k-nearest neighbor classification: Topics by Science.gov

Sample records for k-nearest neighbor classification

K-Nearest Neighbor Algorithm Optimization in Text Categorization

NASA Astrophysics Data System (ADS)

Chen, Shufeng

2018-01-01

K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings such as large amount of sample computation and strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on CURE algorithm is proposed. On the basis of this, presenting a quick algorithm QKNN (Quick k-nearest neighbor) to find the nearest k neighbor samples, which greatly reduces the similarity calculation. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples to improve the performance of the algorithm.
Frog sound identification using extended k-nearest neighbor classifier

NASA Astrophysics Data System (ADS)

Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

2017-09-01

Frog sound identification based on the vocalization becomes important for biological research and environmental monitoring. As a result, different types of feature extractions and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents a frog sound identification with Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aims of improving the classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbors. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifier, k -Nearest Neighbor (KNN), Fuzzy k -Nearest Neighbor (FKNN) k - General Nearest Neighbor (KGNN)and Mutual k -Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysia forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR) while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results have shown that the EKNCN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, GKNN and MKNN for all cases.
An Improvement To The k-Nearest Neighbor Classifier For ECG Database

NASA Astrophysics Data System (ADS)

Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

2018-03-01

The k nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed among them. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN is regarding the weighting issues in assigning the class label before classification. Thus, to solve these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of samples distribition. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distributions of training samples and its distance to the query point. Consequently, the fuzzy membership function is employed to assign the query point to the class label which is frequently represented by the nearest centroid neighbor Experimental studies from electrocardiogram (ECG) signal is applied in this study. The classification performances are evaluated in two experimental steps i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of kNN, kNCN, FkNN and MFkCNN classifier is conducted to evaluate the performances of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds the kNN, kNCN and FkNN with the best classification rates of 96.5%.
Finger vein identification using fuzzy-based k-nearest centroid neighbor classifier

NASA Astrophysics Data System (ADS)

Rosdi, Bakhtiar Affendi; Jaafar, Haryati; Ramli, Dzati Athiar

2015-02-01

In this paper, a new approach for personal identification using finger vein image is presented. Finger vein is an emerging type of biometrics that attracts attention of researchers in biometrics area. As compared to other biometric traits such as face, fingerprint and iris, finger vein is more secured and hard to counterfeit since the features are inside the human body. So far, most of the researchers focus on how to extract robust features from the captured vein images. Not much research was conducted on the classification of the extracted features. In this paper, a new classifier called fuzzy-based k-nearest centroid neighbor (FkNCN) is applied to classify the finger vein image. The proposed FkNCN employs a surrounding rule to obtain the k-nearest centroid neighbors based on the spatial distributions of the training images and their distance to the test image. Then, the fuzzy membership function is utilized to assign the test image to the class which is frequently represented by the k-nearest centroid neighbors. Experimental evaluation using our own database which was collected from 492 fingers shows that the proposed FkNCN has better performance than the k-nearest neighbor, k-nearest-centroid neighbor and fuzzy-based-k-nearest neighbor classifiers. This shows that the proposed classifier is able to identify the finger vein image effectively.
Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance

NASA Astrophysics Data System (ADS)

Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi

2017-11-01

K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we presented a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing Hamming distance between testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog for classical KNN algorithm by setting a distance threshold value t to select k - n e a r e s t neighbors. As a result, QKNN achieves O( n 3) performance which is only relevant to the dimension of feature vectors and high classification accuracy, outperforms Llyod's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
The distance function effect on k-nearest neighbor classification for medical datasets.

PubMed

Hu, Li-Yu; Huang, Min-Wei; Ke, Shih-Wen; Tsai, Chih-Fong

2016-01-01

K-nearest neighbor (k-NN) classification is conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Since the Euclidean distance function is the most widely used distance metric in k-NN, no study examines the classification performance of k-NN by different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect the k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data and four different distance functions including Euclidean, cosine, Chi square, and Minkowsky are used during k-NN classification individually. The experimental results show that using the Chi square distance function is the best choice for the three different types of datasets. However, using the cosine and Euclidean (and Minkowsky) distance function perform the worst over the mixed type of datasets. In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, K-NN based on the Chi square distance function performs the best.
K-Nearest Neighbor Estimation of Forest Attributes: Improving Mapping Efficiency

Treesearch

Andrew O. Finley; Alan R. Ek; Yun Bai; Marvin E. Bauer

2005-01-01

This paper describes our efforts in refining k-nearest neighbor forest attributes classification using U.S. Department of Agriculture Forest Service Forest Inventory and Analysis plot data and Landsat 7 Enhanced Thematic Mapper Plus imagery. The analysis focuses on FIA-defined forest type classification across St. Louis County in northeastern Minnesota. We outline...
The Application of Determining Students’ Graduation Status of STMIK Palangkaraya Using K-Nearest Neighbors Method

NASA Astrophysics Data System (ADS)

Rusdiana, Lili; Marfuah

2017-12-01

K-Nearest Neighbors method is one of methods used for classification which calculate a value to find out the closest in distance. It is used to group a set of data such as students’ graduation status that are got from the amount of course credits taken by them, the grade point average (AVG), and the mini-thesis grade. The study is conducted to know the results of using K-Nearest Neighbors method on the application of determining students’ graduation status, so it can be analyzed from the method used, the data, and the application constructed. The aim of this study is to find out the application results by using K-Nearest Neighbors concept to determine students’ graduation status using the data of STMIK Palangkaraya students. The development of the software used Extreme Programming, since it was appropriate and precise for this study which was to quickly finish the project. The application was created using Microsoft Office Excel 2007 for the training data and Matlab 7 to implement the application. The result of K-Nearest Neighbors method on the application of determining students’ graduation status was 92.5%. It could determine the predicate graduation of 94 data used from the initial data before the processing as many as 136 data which the maximal training data was 50data. The K-Nearest Neighbors method is one of methods used to group a set of data based on the closest value, so that using K-Nearest Neighbors method agreed with this study. The results of K-Nearest Neighbors method on the application of determining students’ graduation status was 92.5% could determine the predicate graduation which is the maximal training data. The K-Nearest Neighbors method is one of methods used to group a set of data based on the closest value, so that using K-Nearest Neighbors method agreed with this study.
Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio

NASA Astrophysics Data System (ADS)

Nababan, A. A.; Sitompul, O. S.; Tulus

2018-04-01

K- Nearest Neighbor (KNN) is a good classifier, but from several studies, the result performance accuracy of KNN still lower than other methods. One of the causes of the low accuracy produced, because each attribute has the same effect on the classification process, while some less relevant characteristics lead to miss-classification of the class assignment for new data. In this research, we proposed Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio as a parameter to see the correlation between each attribute in the data and the Gain Ratio also will be used as the basis for weighting each attribute of the dataset. The accuracy of results is compared to the accuracy acquired from the original KNN method using 10-fold Cross-Validation with several datasets from the UCI Machine Learning repository and KEEL-Dataset Repository, such as abalone, glass identification, haberman, hayes-roth and water quality status. Based on the result of the test, the proposed method was able to increase the classification accuracy of KNN, where the highest difference of accuracy obtained hayes-roth dataset is worth 12.73%, and the lowest difference of accuracy obtained in the abalone dataset of 0.07%. The average result of the accuracy of all dataset increases the accuracy by 5.33%.
Emotion recognition from multichannel EEG signals using K-nearest neighbor classification.

PubMed

Li, Mi; Xu, Hongpei; Liu, Xingwang; Lu, Shengfu

2018-04-27

Many studies have been done on the emotion recognition based on multi-channel electroencephalogram (EEG) signals. This paper explores the influence of the emotion recognition accuracy of EEG signals in different frequency bands and different number of channels. We classified the emotional states in the valence and arousal dimensions using different combinations of EEG channels. Firstly, DEAP default preprocessed data were normalized. Next, EEG signals were divided into four frequency bands using discrete wavelet transform, and entropy and energy were calculated as features of K-nearest neighbor Classifier. The classification accuracies of the 10, 14, 18 and 32 EEG channels based on the Gamma frequency band were 89.54%, 92.28%, 93.72% and 95.70% in the valence dimension and 89.81%, 92.24%, 93.69% and 95.69% in the arousal dimension. As the number of channels increases, the classification accuracy of emotional states also increases, the classification accuracy of the gamma frequency band is greater than that of the beta frequency band followed by the alpha and theta frequency bands. This paper provided better frequency bands and channels reference for emotion recognition based on EEG.
Nearest Neighbor Algorithms for Pattern Classification

NASA Technical Reports Server (NTRS)

Barrios, J. O.

1972-01-01

A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x sub i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required categorize The Bayes risk serves merely as a reference-the limit of excellence beyond which it is not possible to go. The NN rule is bounded below by the Bayes risk and above by twice the Bayes risk.
Privacy Preserving Nearest Neighbor Search

NASA Astrophysics Data System (ADS)

Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation
Multi-spectral brain tissue segmentation using automatically trained k-Nearest-Neighbor classification.

PubMed

Vrooman, Henri A; Cocosco, Chris A; van der Lijn, Fedde; Stokking, Rik; Ikram, M Arfan; Vernooij, Meike W; Breteler, Monique M B; Niessen, Wiro J

2007-08-01

Conventional k-Nearest-Neighbor (kNN) classification, which has been successfully applied to classify brain tissue in MR data, requires training on manually labeled subjects. This manual labeling is a laborious and time-consuming procedure. In this work, a new fully automated brain tissue classification procedure is presented, in which kNN training is automated. This is achieved by non-rigidly registering the MR data with a tissue probability atlas to automatically select training samples, followed by a post-processing step to keep the most reliable samples. The accuracy of the new method was compared to rigid registration-based training and to conventional kNN-based segmentation using training on manually labeled subjects for segmenting gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) in 12 data sets. Furthermore, for all classification methods, the performance was assessed when varying the free parameters. Finally, the robustness of the fully automated procedure was evaluated on 59 subjects. The automated training method using non-rigid registration with a tissue probability atlas was significantly more accurate than rigid registration. For both automated training using non-rigid registration and for the manually trained kNN classifier, the difference with the manual labeling by observers was not significantly larger than inter-observer variability for all tissue types. From the robustness study, it was clear that, given an appropriate brain atlas and optimal parameters, our new fully automated, non-rigid registration-based method gives accurate and robust segmentation results. A similarity index was used for comparison with manually trained kNN. The similarity indices were 0.93, 0.92 and 0.92, for CSF, GM and WM, respectively. It can be concluded that our fully automated method using non-rigid registration may replace manual segmentation, and thus that automated brain tissue segmentation without laborious manual training is feasible.
A Novel Hybrid Classification Model of Genetic Algorithms, Modified k-Nearest Neighbor and Developed Backpropagation Neural Network

PubMed Central

Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy

2014-01-01

Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered as the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is benefitting from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model was benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the
[Galaxy/quasar classification based on nearest neighbor method].

PubMed

Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun

2011-09-01

With the wide application of high-quality CCD in celestial spectrum imagery and the implementation of many large sky survey programs (e. g., Sloan Digital Sky Survey (SDSS), Two-degree-Field Galaxy Redshift Survey (2dF), Spectroscopic Survey Telescope (SST), Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and Large Synoptic Survey Telescope (LSST) program, etc.), celestial observational data are coming into the world like torrential rain. Therefore, to utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognizing galaxies and quasars from spectra based on nearest neighbor method. Galaxies and quasars are extragalactic objects, they are far away from earth, and their spectra are usually contaminated by various noise. Therefore, it is a typical problem to recognize these two types of spectra in automatic spectra classification. Furthermore, the utilized method, nearest neighbor, is one of the most typical, classic, mature algorithms in pattern recognition and data mining, and often is used as a benchmark in developing novel algorithm. For applicability in practice, it is shown that the recognition ratio of nearest neighbor method (NN) is comparable to the best results reported in the literature based on more complicated methods, and the superiority of NN is that this method does not need to be trained, which is useful in incremental learning and parallel computation in mass spectral data processing. In conclusion, the results in this work are helpful for studying galaxies and quasars spectra classification.
Improving the accuracy of k-nearest neighbor using local mean based and distance weight

NASA Astrophysics Data System (ADS)

Syaliman, K. U.; Nababan, E. B.; Sitompul, O. S.

2018-03-01

In k-nearest neighbor (kNN), the determination of classes for new data is normally performed by a simple majority vote system, which may ignore the similarities among data, as well as allowing the occurrence of a double majority class that can lead to misclassification. In this research, we propose an approach to resolve the majority vote issues by calculating the distance weight using a combination of local mean based k-nearest neighbor (LMKNN) and distance weight k-nearest neighbor (DWKNN). The accuracy of results is compared to the accuracy acquired from the original k-NN method using several datasets from the UCI Machine Learning repository, Kaggle and Keel, such as ionosphare, iris, voice genre, lower back pain, and thyroid. In addition, the proposed method is also tested using real data from a public senior high school in city of Tualang, Indonesia. Results shows that the combination of LMKNN and DWKNN was able to increase the classification accuracy of kNN, whereby the average accuracy on test data is 2.45% with the highest increase in accuracy of 3.71% occurring on the lower back pain symptoms dataset. For the real data, the increase in accuracy is obtained as high as 5.16%.
RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

NASA Astrophysics Data System (ADS)

Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

2017-03-01

Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
Diagnosis of diabetes diseases using an Artificial Immune Recognition System2 (AIRS2) with fuzzy K-nearest neighbor.

PubMed

Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma

2012-10-01

The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2 where we replace the K- nearest neighbors algorithm with the fuzzy K-nearest neighbors to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work is retrieved from UCI machine learning repository. The performances of the AIRS2 and MAIRS2 are evaluated regarding classification accuracy, sensitivity and specificity values. The highest classification accuracy obtained when applying the AIRS2 and MAIRS2 using 10-fold cross-validation was, respectively 82.69% and 89.10%.
Large margin nearest neighbor classifiers.

PubMed

Domeniconi, Carlotta; Gunopulos, Dimitrios; Peng, Jing

2005-07-01

The nearest neighbor technique is a simple and appealing approach to addressing classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. The employment of a locally adaptive metric becomes crucial in order to keep class conditional probabilities close to uniform, thereby minimizing the bias of estimates. We propose a technique that computes a locally flexible metric by means of support vector machines (SVMs). The decision function constructed by SVMs is used to determine the most discriminant direction in a neighborhood around the query. Such a direction provides a local feature weighting scheme. We formally show that our method increases the margin in the weighted space where classification takes place. Moreover, our method has the important advantage of online computational efficiency over competing locally adaptive techniques for nearest neighbor classification. We demonstrate the efficacy of our method using both real and simulated data.
A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

PubMed

Wang, Xueyi

2012-02-08

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.

Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization

NASA Astrophysics Data System (ADS)

Jamaluddin; Siringoringo, Rimbun

2017-12-01

Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawbackof FKNN is that it is difficult to determine the parameters. These parameters are the number of neighbors (k) and fuzzy strength (m). Both parameters are very sensitive. This makes it difficult to determine the values of ‘m’ and ‘k’, thus making FKNN difficult to control because no theories or guides can deduce how proper ‘m’ and ‘k’ should be. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best value of ‘k’ and ‘m’. MPSO is focused on the Constriction Factor Method. Constriction Factor Method is an improvement of PSO in order to avoid local circumstances optima. The model proposed in this study was tested on the German Credit Dataset. The test of the data/The data test has been standardized by UCI Machine Learning Repository which is widely applied to classification problems. The application of MPSO to the determination of FKNN parameters is expected to increase the value of classification performance. Based on the experiments that have been done indicating that the model offered in this research results in a better classification performance compared to the Fk-NN model only. The model offered in this study has an accuracy rate of 81%, while. With using Fk-NN model, it has the accuracy of 70%. At the end is done comparison of research model superiority with 2 other classification models;such as Naive Bayes and Decision Tree. This research model has a better performance level, where Naive Bayes has accuracy 75%, and the decision tree model has 70%
Nearest neighbors by neighborhood counting.

PubMed

Wang, Hui

2006-06-01

Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.
K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

NASA Astrophysics Data System (ADS)

Wang, Dong

2016-03-01

features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
K-nearest neighbor imputation of forest inventory variables in New Hampshire

Treesearch

Andrew Lister; Michael Hoppus; Raymond L. Czaplewski

2005-01-01

The k-nearest neighbor (kNN) method was used to map stand volume for a mosaic of 4 Landsat scenes covering the state of New Hampshire. Data for gross cubic foot volume and trees per acre were summarized from USDA Forest Service Forest Inventory and Analysis (FIA) plots and used as training for kNN. Six bands of...
Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

PubMed Central

Thanh Noi, Phan; Kappas, Martin

2017-01-01

In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909
Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

PubMed

Thanh Noi, Phan; Kappas, Martin

2017-12-22

In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Minimum Expected Risk Estimation for Near-neighbor Classification

DTIC Science & Technology

2006-04-01

We consider the problems of class probability estimation and classification when using near-neighbor classifiers, such as k-nearest neighbors ( kNN ...estimate for weighted kNN classifiers with different prior information, for a broad class of risk functions. Theory and simulations show how significant...the difference is compared to the standard maximum likelihood weighted kNN estimates. Comparisons are made with uniform weights, symmetric weights
The nearest neighbor and the bayes error rates.

PubMed

Loizou, G; Maybank, S J

1987-02-01

The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities Ek,l + 1 ¿ E*(¿) ¿ Ek,l dE*(¿), where d is a function of k, l, and the number of pattern classes, and ¿ is the reject threshold for the Bayes method. An explicit expression for d is given which is optimal in the sense that for some probability distributions Ek,l and dE* (¿) are equal.
Using K-Nearest Neighbor Classification to Diagnose Abnormal Lung Sounds

PubMed Central

Chen, Chin-Hsing; Huang, Wen-Tzeng; Tan, Tan-Hsu; Chang, Cheng-Chun; Chang, Yuan-Jen

2015-01-01

A reported 30% of people worldwide have abnormal lung sounds, including crackles, rhonchi, and wheezes. To date, the traditional stethoscope remains the most popular tool used by physicians to diagnose such abnormal lung sounds, however, many problems arise with the use of a stethoscope, including the effects of environmental noise, the inability to record and store lung sounds for follow-up or tracking, and the physician’s subjective diagnostic experience. This study has developed a digital stethoscope to help physicians overcome these problems when diagnosing abnormal lung sounds. In this digital system, mel-frequency cepstral coefficients (MFCCs) were used to extract the features of lung sounds, and then the K-means algorithm was used for feature clustering, to reduce the amount of data for computation. Finally, the K-nearest neighbor method was used to classify the lung sounds. The proposed system can also be used for home care: if the percentage of abnormal lung sound frames is > 30% of the whole test signal, the system can automatically warn the user to visit a physician for diagnosis. We also used bend sensors together with an amplification circuit, Bluetooth, and a microcontroller to implement a respiration detector. The respiratory signal extracted by the bend sensors can be transmitted to the computer via Bluetooth to calculate the respiratory cycle, for real-time assessment. If an abnormal status is detected, the device will warn the user automatically. Experimental results indicated that the error in respiratory cycles between measured and actual values was only 6.8%, illustrating the potential of our detector for home care applications. PMID:26053756
The nearest neighbor and next nearest neighbor effects on the thermodynamic and kinetic properties of RNA base pair

NASA Astrophysics Data System (ADS)

Wang, Yujie; Wang, Zhen; Wang, Yanli; Liu, Taigang; Zhang, Wenbing

2018-01-01

The thermodynamic and kinetic parameters of an RNA base pair with different nearest and next nearest neighbors were obtained through long-time molecular dynamics simulation of the opening-closing switch process of the base pair near its melting temperature. The results indicate that thermodynamic parameters of GC base pair are dependent on the nearest neighbor base pair, and the next nearest neighbor base pair has little effect, which validated the nearest-neighbor model. The closing and opening rates of the GC base pair also showed nearest neighbor dependences. At certain temperature, the closing and opening rates of the GC pair with nearest neighbor AU is larger than that with the nearest neighbor GC, and the next nearest neighbor plays little role. The free energy landscape of the GC base pair with the nearest neighbor GC is rougher than that with nearest neighbor AU.
Scalable Nearest Neighbor Algorithms for High Dimensional Data.

PubMed

Muja, Marius; Lowe, David G

2014-11-01

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting

NASA Astrophysics Data System (ADS)

Zhang, Ningning; Lin, Aijing; Shang, Pengjian

2017-07-01

In this paper, we propose a new two-stage methodology that combines the ensemble empirical mode decomposition (EEMD) with multidimensional k-nearest neighbor model (MKNN) in order to forecast the closing price and high price of the stocks simultaneously. The modified algorithm of k-nearest neighbors (KNN) has an increasingly wide application in the prediction of all fields. Empirical mode decomposition (EMD) decomposes a nonlinear and non-stationary signal into a series of intrinsic mode functions (IMFs), however, it cannot reveal characteristic information of the signal with much accuracy as a result of mode mixing. So ensemble empirical mode decomposition (EEMD), an improved method of EMD, is presented to resolve the weaknesses of EMD by adding white noise to the original data. With EEMD, the components with true physical meaning can be extracted from the time series. Utilizing the advantage of EEMD and MKNN, the new proposed ensemble empirical mode decomposition combined with multidimensional k-nearest neighbor model (EEMD-MKNN) has high predictive precision for short-term forecasting. Moreover, we extend this methodology to the case of two-dimensions to forecast the closing price and high price of the four stocks (NAS, S&P500, DJI and STI stock indices) at the same time. The results indicate that the proposed EEMD-MKNN model has a higher forecast precision than EMD-KNN, KNN method and ARIMA.
Applying an efficient K-nearest neighbor search to forest attribute imputation

Treesearch

Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

2006-01-01

This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby, decreasing the time needed to discover the NN subset. Results of five trials show gains...
Credit scoring analysis using weighted k nearest neighbor

NASA Astrophysics Data System (ADS)

Mukid, M. A.; Widiharih, T.; Rusgiyono, A.; Prahutama, A.

2018-05-01

Credit scoring is a quatitative method to evaluate the credit risk of loan applications. Both statistical methods and artificial intelligence are often used by credit analysts to help them decide whether the applicants are worthy of credit. These methods aim to predict future behavior in terms of credit risk based on past experience of customers with similar characteristics. This paper reviews the weighted k nearest neighbor (WKNN) method for credit assessment by considering the use of some kernels. We use credit data from a private bank in Indonesia. The result shows that the Gaussian kernel and rectangular kernel have a better performance based on the value of percentage corrected classified whose value is 82.4% respectively.
Streamflow variability and classification using false nearest neighbor method

NASA Astrophysics Data System (ADS)

Vignesh, R.; Jothiprakash, V.; Sivakumar, B.

2015-12-01

Understanding regional streamflow dynamics and patterns continues to be a challenging problem. The present study introduces the false nearest neighbor (FNN) algorithm, a nonlinear dynamic-based method, to examine the spatial variability of streamflow over a region. The FNN method is a dimensionality-based approach, where the dimension of the time series represents its variability. The method uses phase space reconstruction and nearest neighbor concepts, and identifies false neighbors in the reconstructed phase space. The FNN method is applied to monthly streamflow data monitored over a period of 53 years (1950-2002) in an extensive network of 639 stations in the contiguous United States (US). Since selection of delay time in phase space reconstruction may influence the FNN outcomes, analysis is carried out for five different delay time values: monthly, seasonal, and annual separation of data as well as delay time values obtained using autocorrelation function (ACF) and average mutual information (AMI) methods. The FNN dimensions for the 639 streamflow series are generally identified to range from 4 to 12 (with very few exceptional cases), indicating a wide range of variability in the dynamics of streamflow across the contiguous US. However, the FNN dimensions for a majority of the streamflow series are found to be low (less than or equal to 6), suggesting low level of complexity in streamflow dynamics in most of the individual stations and over many sub-regions. The FNN dimension estimates also reveal that streamflow dynamics in the western parts of the US (including far west, northwestern, and southwestern parts) generally exhibit much greater variability compared to that in the eastern parts of the US (including far east, northeastern, and southeastern parts), although there are also differences among 'pockets' within these regions. These results are useful for identification of appropriate model complexity at individual stations, patterns across regions and sub
Nearest neighbor-density-based clustering methods for large hyperspectral images

NASA Astrophysics Data System (ADS)

Cariou, Claude; Chehdi, Kacem

2017-10-01

We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and showed their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extent the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize of the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral
Classification of matrix-product ground states corresponding to one-dimensional chains of two-state sites of nearest neighbor interactions

NASA Astrophysics Data System (ADS)

Fatollahi, Amir H.; Khorrami, Mohammad; Shariati, Ahmad; Aghamohammadi, Amir

2011-04-01

A complete classification is given for one-dimensional chains with nearest-neighbor interactions having two states in each site, for which a matrix product ground state exists. The Hamiltonians and their corresponding matrix product ground states are explicitly obtained.
Colorectal Cancer and Colitis Diagnosis Using Fourier Transform Infrared Spectroscopy and an Improved K-Nearest-Neighbour Classifier.

PubMed

Li, Qingbo; Hao, Can; Kang, Xue; Zhang, Jialin; Sun, Xuejun; Wang, Wenbo; Zeng, Haishan

2017-11-27

Combining Fourier transform infrared spectroscopy (FTIR) with endoscopy, it is expected that noninvasive, rapid detection of colorectal cancer can be performed in vivo in the future. In this study, Fourier transform infrared spectra were collected from 88 endoscopic biopsy colorectal tissue samples (41 colitis and 47 cancers). A new method, viz., entropy weight local-hyperplane k-nearest-neighbor (EWHK), which is an improved version of K-local hyperplane distance nearest-neighbor (HKNN), is proposed for tissue classification. In order to avoid limiting high dimensions and small values of the nearest neighbor, the new EWHK method calculates feature weights based on information entropy. The average results of the random classification showed that the EWHK classifier for differentiating cancer from colitis samples produced a sensitivity of 81.38% and a specificity of 92.69%.
Using genetic algorithms to optimize k-Nearest Neighbors configurations for use with airborne laser scanning data

Treesearch

Ronald E. McRoberts; Grant M. Domke; Qi Chen; Erik Næsset; Terje Gobakken

2016-01-01

The relatively small sampling intensities used by national forest inventories are often insufficient to produce the desired precision for estimates of population parameters unless the estimation process is augmented with auxiliary information, usually in the form of remotely sensed data. The k-Nearest Neighbors (k-NN) technique is a non-parametric,multivariate approach...
A Comparison of the Spatial Linear Model to Nearest Neighbor (k-NN) Methods for Forestry Applications

Treesearch

Jay M. Ver Hoef; Hailemariam Temesgen; Sergio Gómez

2013-01-01

Forest surveys provide critical information for many diverse interests. Data are often collected from samples, and from these samples, maps of resources and estimates of aerial totals or averages are required. In this paper, two approaches for mapping and estimating totals; the spatial linear model (SLM) and k-NN (k-Nearest Neighbor) are compared, theoretically,...

Ising lattices with +/-J second-nearest-neighbor interactions

NASA Astrophysics Data System (ADS)

Ramírez-Pastor, A. J.; Nieto, F.; Vogel, E. E.

1997-06-01

Second-nearest-neighbor interactions are added to the usual nearest-neighbor Ising Hamiltonian for square lattices in different ways. The starting point is a square lattice where half the nearest-neighbor interactions are ferromagnetic and the other half of the bonds are antiferromagnetic. Then, second-nearest-neighbor interactions can also be assigned randomly or in a variety of causal manners determined by the nearest-neighbor interactions. In the present paper we consider three causal and three random ways of assigning second-nearest-neighbor exchange interactions. Several ground-state properties are then calculated for each of these lattices:energy per bond ɛg, site correlation parameter pg, maximal magnetization μg, and fraction of unfrustrated bonds hg. A set of 500 samples is considered for each size N (number of spins) and array (way of distributing the N spins). The properties of the original lattices with only nearest-neighbor interactions are already known, which allows realizing the effect of the additional interactions. We also include cubic lattices to discuss the distinction between coordination number and dimensionality. Comparison with results for triangular and honeycomb lattices is done at specific points.
Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery

Treesearch

Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha

2007-01-01

The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...
Comparison of Neural Networks and Tabular Nearest Neighbor Encoding for Hyperspectral Signature Classification in Unresolved Object Detection

NASA Astrophysics Data System (ADS)

Schmalz, M.; Ritter, G.; Key, R.

Accurate and computationally efficient spectral signature classification is a crucial step in the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications using linear mixing models, signature classification accuracy depends on accurate spectral endmember discrimination [1]. If the endmembers cannot be classified correctly, then the signatures cannot be classified correctly, and object recognition from hyperspectral data will be inaccurate. In practice, the number of endmembers accurately classified often depends linearly on the number of inputs. This can lead to potentially severe classification errors in the presence of noise or densely interleaved signatures. In this paper, we present an comparison of emerging technologies for nonimaging spectral signature classfication based on a highly accurate, efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3,4] and a neural network technology called Morphological Neural Networks (MNNs) [5]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. The open architecture and programmability of TNE's agreement map processing allows a TNE programmer or user to determine classification accuracy, as well as characterize in detail the signatures for which TNE did not obtain classification matches, and why such mis-matches occurred. In this study, we will compare TNE and MNN based endmember classification, using performance metrics such as probability of correct classification (Pd) and rate of false
A Novel Graph Constructor for Semisupervised Discriminant Analysis: Combined Low-Rank and k-Nearest Neighbor Graph

PubMed Central

Pan, Yongke; Niu, Wenjia

2017-01-01

Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Relative works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Different from these relative works, the regularized graph construction is researched here, which is important in the graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then the kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.

PubMed

Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott

2009-03-01

Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
Nearest neighbor, bilinear interpolation and bicubic interpolation geographic correction effects on LANDSAT imagery

NASA Technical Reports Server (NTRS)

Jayroe, R. R., Jr.

1976-01-01

Geographical correction effects on LANDSAT image data are identified, using the nearest neighbor, bilinear interpolation and bicubic interpolation techniques. Potential impacts of registration on image compression and classification are explored.
Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

PubMed

Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

2016-04-01

Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generates an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer are imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
Categorizing document by fuzzy C-Means and K-nearest neighbors approach

NASA Astrophysics Data System (ADS)

Priandini, Novita; Zaman, Badrus; Purwanti, Endah

2017-08-01

Increasing of technology had made categorizing documents become important. It caused by increasing of number of documents itself. Managing some documents by categorizing is one of Information Retrieval application, because it involve text mining on its process. Whereas, categorization technique could be done both Fuzzy C-Means (FCM) and K-Nearest Neighbors (KNN) method. This experiment would consolidate both methods. The aim of the experiment is increasing performance of document categorize. First, FCM is in order to clustering training documents. Second, KNN is in order to categorize testing document until the output of categorization is shown. Result of the experiment is 14 testing documents retrieve relevantly to its category. Meanwhile 6 of 20 testing documents retrieve irrelevant to its category. Result of system evaluation shows that both precision and recall are 0,7.
Diagnostic tools for nearest neighbors techniques when used with satellite imagery

Treesearch

Ronald E. McRoberts

2009-01-01

Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....
Secure Nearest Neighbor Query on Crowd-Sensing Data

PubMed Central

Cheng, Ke; Wang, Liangmin; Zhong, Hong

2016-01-01

Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operation due to computation and storage capability constraints. In light of they Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes. PMID:27669253
Secure Nearest Neighbor Query on Crowd-Sensing Data.

PubMed

Cheng, Ke; Wang, Liangmin; Zhong, Hong

2016-09-22

Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operation due to computation and storage capability constraints. In light of they Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.
Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

NASA Astrophysics Data System (ADS)

Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

2018-06-01

In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Landscape-scale parameterization of a tree-level forest growth model: a k-nearest neighbor imputation approach incorporating LiDAR data

Treesearch

Michael J. Falkowski; Andrew T. Hudak; Nicholas L. Crookston; Paul E. Gessler; Edward H. Uebler; Alistair M. S. Smith

2010-01-01

Sustainable forest management requires timely, detailed forest inventory data across large areas, which is difficult to obtain via traditional forest inventory techniques. This study evaluated k-nearest neighbor imputation models incorporating LiDAR data to predict tree-level inventory data (individual tree height, diameter at breast height, and...
Conformal Prediction Based on K-Nearest Neighbors for Discrimination of Ginsengs by a Home-Made Electronic Nose

PubMed Central

Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang

2017-01-01

An estimate on the reliability of prediction in the applications of electronic nose is essential, which has not been paid enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. Nonconformity measure based on k-nearest neighbors (KNN) is implemented separately as underlying algorithm of conformal prediction. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, which is better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by a user. The potential of this framework for detecting borderline examples and outliers in the application of E-nose is also investigated. The result shows that conformal prediction is a promising framework for the application of electronic nose to make predictions with reliability and validity. PMID:28805721
[Classification of Children with Attention-Deficit/Hyperactivity Disorder and Typically Developing Children Based on Electroencephalogram Principal Component Analysis and k-Nearest Neighbor].

PubMed

Yang, Jiaojiao; Guo, Qian; Li, Wenjie; Wang, Suhong; Zou, Ling

2016-04-01

This paper aims to assist the individual clinical diagnosis of children with attention-deficit/hyperactivity disorder using electroencephalogram signal detection method.Firstly,in our experiments,we obtained and studied the electroencephalogram signals from fourteen attention-deficit/hyperactivity disorder children and sixteen typically developing children during the classic interference control task of Simon-spatial Stroop,and we completed electroencephalogram data preprocessing including filtering,segmentation,removal of artifacts and so on.Secondly,we selected the subset electroencephalogram electrodes using principal component analysis(PCA)method,and we collected the common channels of the optimal electrodes which occurrence rates were more than 90%in each kind of stimulation.We then extracted the latency(200~450ms)mean amplitude features of the common electrodes.Finally,we used the k-nearest neighbor(KNN)classifier based on Euclidean distance and the support vector machine(SVM)classifier based on radial basis kernel function to classify.From the experiment,at the same kind of interference control task,the attention-deficit/hyperactivity disorder children showed lower correct response rates and longer reaction time.The N2 emerged in prefrontal cortex while P2 presented in the inferior parietal area when all kinds of stimuli demonstrated.Meanwhile,the children with attention-deficit/hyperactivity disorder exhibited markedly reduced N2 and P2amplitude compared to typically developing children.KNN resulted in better classification accuracy than SVM classifier,and the best classification rate was 89.29%in StI task.The results showed that the electroencephalogram signals were different in the brain regions of prefrontal cortex and inferior parietal cortex between attention-deficit/hyperactivity disorder and typically developing children during the interference control task,which provided a scientific basis for the clinical diagnosis of attention
Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

PubMed

Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

2016-11-16

The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
Earthquake Declustering via a Nearest-Neighbor Approach in Space-Time-Magnitude Domain

NASA Astrophysics Data System (ADS)

Zaliapin, I. V.; Ben-Zion, Y.

2016-12-01

We propose a new method for earthquake declustering based on nearest-neighbor analysis of earthquakes in space-time-magnitude domain. The nearest-neighbor approach was recently applied to a variety of seismological problems that validate the general utility of the technique and reveal the existence of several different robust types of earthquake clusters. Notably, it was demonstrated that clustering associated with the largest earthquakes is statistically different from that of small-to-medium events. In particular, the characteristic bimodality of the nearest-neighbor distances that helps separating clustered and background events is often violated after the largest earthquakes in their vicinity, which is dominated by triggered events. This prevents using a simple threshold between the two modes of the nearest-neighbor distance distribution for declustering. The current study resolves this problem hence extending the nearest-neighbor approach to the problem of earthquake declustering. The proposed technique is applied to seismicity of different areas in California (San Jacinto, Coso, Salton Sea, Parkfield, Ventura, Mojave, etc.), as well as to the global seismicity, to demonstrate its stability and efficiency in treating various clustering types. The results are compared with those of alternative declustering methods.
Nearest Neighbor Interactions Affect the Conformational Distribution in the Unfolded State of Peptides

NASA Astrophysics Data System (ADS)

Toal, Siobhan; Schweitzer-Stenner, Reinhard; Rybka, Karin; Schwalbe, Hardol

2013-03-01

In order to enable structural predictions of intrinsically disordered proteins (IDPs) the intrinsic conformational propensities of amino acids must be complimented by information on nearest-neighbor interactions. To explore the influence of nearest-neighbors on conformational distributions, we preformed a joint vibrational (Infrared, Vibrational Circular Dichroism (VCD), polarized Raman) and 2D-NMR study of selected GxyG host-guest peptides: GDyG, GSyG, GxLG, GxVG, where x/y ={A,K,LV}. D and S (L and V) were chosen at the x (y) position due to their observance to drastically change the distribution of alanine in xAy tripeptide sequences in truncated coil libraries. The conformationally sensitive amide' profiles of the respective spectra were analyzed in terms of a statistical ensemble described as a superposition of 2D-Gaussian functions in Ramachandran space representing sub-ensembles of pPII-, β-strand-, helical-, and turn-like conformations. Our analysis and simulation of the amide I' band profiles exploits excitonic coupling between the local amide I' vibrational modes in the tetra-peptides. The resulting distributions reveal that D and S, which themselves have high propensities for turn-structures, strongly affect the conformational distribution of their downstream neighbor. Taken together, our results indicate that Dx and Sx motifs might act as conformational randomizers in proteins, attenuating intrinsic propensities of neighboring residues. Overall, our results show that nearest neighbor interactions contribute significantly to the Gibbs energy landscape of disordered peptides and proteins.
Quantum realization of the nearest neighbor value interpolation method for INEQR

NASA Astrophysics Data System (ADS)

Zhou, RiGui; Hu, WenWen; Luo, GaoFeng; Liu, XingAo; Fan, Ping

2018-07-01

This paper presents the nearest neighbor value (NNV) interpolation algorithm for the improved novel enhanced quantum representation of digital images (INEQR). It is necessary to use interpolation in image scaling because there is an increase or a decrease in the number of pixels. The difference between the proposed scheme and nearest neighbor interpolation is that the concept applied, to estimate the missing pixel value, is guided by the nearest value rather than the distance. Firstly, a sequence of quantum operations is predefined, such as cyclic shift transformations and the basic arithmetic operations. Then, the feasibility of the nearest neighbor value interpolation method for quantum image of INEQR is proven using the previously designed quantum operations. Furthermore, quantum image scaling algorithm in the form of circuits of the NNV interpolation for INEQR is constructed for the first time. The merit of the proposed INEQR circuit lies in their low complexity, which is achieved by utilizing the unique properties of quantum superposition and entanglement. Finally, simulation-based experimental results involving different classical images and ratios (i.e., conventional or non-quantum) are simulated based on the classical computer's MATLAB 2014b software, which demonstrates that the proposed interpolation method has higher performances in terms of high resolution compared to the nearest neighbor and bilinear interpolation.
Detection of acute lymphocyte leukemia using k-nearest neighbor algorithm based on shape and histogram features

NASA Astrophysics Data System (ADS)

Purwanti, Endah; Calista, Evelyn

2017-05-01

Leukemia is a type of cancer which is caused by malignant neoplasms in leukocyte cells. Leukemia disease which can cause death quickly enough for the sufferer is a type of acute lymphocyte leukemia (ALL). In this study, we propose automatic detection of lymphocyte leukemia through classification of lymphocyte cell images obtained from peripheral blood smear single cell. There are two main objectives in this study. The first is to extract featuring cells. The second objective is to classify the lymphocyte cells into two classes, namely normal and abnormal lymphocytes. In conducting this study, we use combination of shape feature and histogram feature, and the classification algorithm is k-nearest Neighbour with k variation is 1, 3, 5, 7, 9, 11, 13, and 15. The best level of accuracy, sensitivity, and specificity in this study are 90%, 90%, and 90%, and they were obtained from combined features of area-perimeter-mean-standard deviation with k=7.

Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

DTIC Science & Technology

1999-05-17

Experimental Results In this section, we compare kNN -mut which uses the weight vector obtained using mutual information as the fi- nal weight vector and...WAKNN against kNN , C4.5 [Qui93], RIPPER [Coh95], PEBLS [CS93], Rainbow [McC96], VSM [Low95] on several synthetic and real data sets. VSM is another k...obtained without this option. 3 C4.5 RIPPER PEBLS Rainbow kNN WAKNN Syn-1 100.0 100.0 100.0 100.0 77.3 100.0 Syn-2 67.5 69.5 62.0 50.0 66.0 68.8 Syn
Quantum realization of the nearest-neighbor interpolation method for FRQI and NEQR

NASA Astrophysics Data System (ADS)

Sang, Jianzhi; Wang, Shen; Niu, Xiamu

2016-01-01

This paper is concerned with the feasibility of the classical nearest-neighbor interpolation based on flexible representation of quantum images (FRQI) and novel enhanced quantum representation (NEQR). Firstly, the feasibility of the classical image nearest-neighbor interpolation for quantum images of FRQI and NEQR is proven. Then, by defining the halving operation and by making use of quantum rotation gates, the concrete quantum circuit of the nearest-neighbor interpolation for FRQI is designed for the first time. Furthermore, quantum circuit of the nearest-neighbor interpolation for NEQR is given. The merit of the proposed NEQR circuit lies in their low complexity, which is achieved by utilizing the halving operation and the quantum oracle operator. Finally, in order to further improve the performance of the former circuits, new interpolation circuits for FRQI and NEQR are presented by using Control-NOT gates instead of a halving operation. Simulation results show the effectiveness of the proposed circuits.
Geometric k-nearest neighbor estimation of entropy and mutual information

NASA Astrophysics Data System (ADS)

Lord, Warren M.; Sun, Jie; Bollt, Erik M.

2018-03-01

Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for a large sample size. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators that we call geometric knn estimators (g-knn), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical system attractors. A series of numerical examples in which the thickness of the underlying distribution and the sample sizes are varied suggest that local geometry is a source of problems for knn methods such as the Kraskov-Stögbauer-Grassberger estimator when local geometric effects cannot be removed by global preprocessing of the data. The g-knn method performs well despite the manipulation of the local geometry. In addition, the examples suggest that the g-knn estimators can be of particular relevance to applications in which the system is large, but the data size is limited.
Classification Features of US Images Liver Extracted with Co-occurrence Matrix Using the Nearest Neighbor Algorithm

NASA Astrophysics Data System (ADS)

Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen

2011-12-01

Co-occurrence matrix has been applied successfully for echographic images characterization because it contains information about spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of an US image of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extract with co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (or fatty) liver. These two sets of echographic images of the liver build a database that includes only histological confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects help to compute four textural indices and as well as control dataset. We chose to study these diseases because the steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and they can be used to characterize irregularity of tissues. The goal is to extract the information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool to classify features textures by means of grouping in a training set using healthy liver, on the one hand, and in a holdout set using the features textures of steatosis liver, on the other hand. The results could be used to quantify the texture information and will allow a clear detection between health and steatosis liver.
Reverse Nearest Neighbor Search on a Protein-Protein Interaction Network to Infer Protein-Disease Associations.

PubMed

Suratanee, Apichat; Plaimas, Kitiporn

2017-01-01

The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network integrating an effective network search, namely, the reverse k -nearest neighbor (R k NN) search. The R k NN search was used to identify an impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the R k NN search yielded a much higher precision than a random selection, standard nearest neighbor search, or when applying the method to a random protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

PubMed Central

Diaz, Naryttza N; Krause, Lutz; Goesmann, Alexander; Niehaus, Karsten; Nattkemper, Tim W

2009-01-01

Background Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay solid basis on our understanding of the entire living world. However, the taxonomic classification is an essential task in the analysis of metagenomics data sets that it is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy until rank class. For longer fragments ≥ 3 Kbp accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, thus classifying such fragments to its known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparable high specificity results up to rank class and low false negative rates are also obtained. Conclusion An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, accurate and the reference
Nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes

PubMed Central

Watkins, Norman E.; SantaLucia, John

2005-01-01

Nearest-neighbor thermodynamic parameters of the ‘universal pairing base’ deoxyinosine were determined for the pairs I·C, I·A, I·T, I·G and I·I adjacent to G·C and A·T pairs. Ultraviolet absorbance melting curves were measured and non-linear regression performed on 84 oligonucleotide duplexes with 9 or 12 bp lengths. These data were combined with data for 13 inosine containing duplexes from the literature. Multiple linear regression was used to solve for the 32 nearest-neighbor unknowns. The parameters predict the Tm for all sequences within 1.2°C on average. The general trend in decreasing stability is I·C > I·A > I·T ≈ I· G > I·I. The stability trend for the base pair 5′ of the I·X pair is G·C > C·G > A·T > T·A. The stability trend for the base pair 3′ of I·X is the same. These trends indicate a complex interplay between H-bonding, nearest-neighbor stacking, and mismatch geometry. A survey of 14 tandem inosine pairs and 8 tandem self-complementary inosine pairs is also provided. These results may be used in the design of degenerate PCR primers and for degenerate microarray probes. PMID:16264087
The Effective Resistance of the -Cycle Graph with Four Nearest Neighbors

NASA Astrophysics Data System (ADS)

Chair, Noureddine

2014-02-01

The exact expression for the effective resistance between any two vertices of the -cycle graph with four nearest neighbors , is given. It turns out that this expression is written in terms of the effective resistance of the -cycle graph , the square of the Fibonacci numbers, and the bisected Fibonacci numbers. As a consequence closed form formulas for the total effective resistance, the first passage time, and the mean first passage time for the simple random walk on the the -cycle graph with four nearest neighbors are obtained. Finally, a closed form formula for the effective resistance of with all first neighbors removed is obtained.
A Coupled k-Nearest Neighbor Algorithm for Multi-Label Classification

DTIC Science & Technology

2015-05-22

classification, an image may contain several concepts simultaneously, such as beach, sunset and kangaroo . Such tasks are usually denoted as multi-label...informatics, a gene can belong to both metabolism and transcription classes; and in music categorization, a song may labeled as Mozart and sad. In the
Nearest unlike neighbor (NUN): an aid to decision confidence estimation

NASA Astrophysics Data System (ADS)

Dasarathy, Belur V.

1995-09-01

The concept of nearest unlike neighbor (NUN), proposed and explored previously in the design of nearest neighbor (NN) based decision systems, is further exploited in this study to develop a measure of confidence in the decisions made by NN-based decision systems. This measure of confidence, on the basis of comparison with a user-defined threshold, may be used to determine the acceptability of the decision provided by the NN-based decision system. The concepts, associated methodology, and some illustrative numerical examples using the now classical Iris data to bring out the ease of implementation and effectiveness of the proposed innovations are presented.
Missing value imputation for gene expression data by tailored nearest neighbors.

PubMed

Faisal, Shahla; Tutz, Gerhard

2017-04-25

High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
A Nearest Neighbor Classifier Employing Critical Boundary Vectors for Efficient On-Chip Template Reduction.

PubMed

Xia, Wenjun; Mita, Yoshio; Shibata, Tadashi

2016-05-01

Aiming at efficient data condensation and improving accuracy, this paper presents a hardware-friendly template reduction (TR) method for the nearest neighbor (NN) classifiers by introducing the concept of critical boundary vectors. A hardware system is also implemented to demonstrate the feasibility of using an field-programmable gate array (FPGA) to accelerate the proposed method. Initially, k -means centers are used as substitutes for the entire template set. Then, to enhance the classification performance, critical boundary vectors are selected by a novel learning algorithm, which is completed within a single iteration. Moreover, to remove noisy boundary vectors that can mislead the classification in a generalized manner, a global categorization scheme has been explored and applied to the algorithm. The global characterization automatically categorizes each classification problem and rapidly selects the boundary vectors according to the nature of the problem. Finally, only critical boundary vectors and k -means centers are used as the new template set for classification. Experimental results for 24 data sets show that the proposed algorithm can effectively reduce the number of template vectors for classification with a high learning speed. At the same time, it improves the accuracy by an average of 2.17% compared with the traditional NN classifiers and also shows greater accuracy than seven other TR methods. We have shown the feasibility of using a proof-of-concept FPGA system of 256 64-D vectors to accelerate the proposed method on hardware. At a 50-MHz clock frequency, the proposed system achieves a 3.86 times higher learning speed than on a 3.4-GHz PC, while consuming only 1% of the power of that used by the PC.
Automated analysis of long-term grooming behavior in Drosophila using a k-nearest neighbors classifier

PubMed Central

Allen, Victoria W; Shirasu-Hiza, Mimi

2018-01-01

Despite being pervasive, the control of programmed grooming is poorly understood. We addressed this gap by developing a high-throughput platform that allows long-term detection of grooming in Drosophila melanogaster. In our method, a k-nearest neighbors algorithm automatically classifies fly behavior and finds grooming events with over 90% accuracy in diverse genotypes. Our data show that flies spend ~13% of their waking time grooming, driven largely by two major internal programs. One of these programs regulates the timing of grooming and involves the core circadian clock components cycle, clock, and period. The second program regulates the duration of grooming and, while dependent on cycle and clock, appears to be independent of period. This emerging dual control model in which one program controls timing and another controls duration, resembles the two-process regulatory model of sleep. Together, our quantitative approach presents the opportunity for further dissection of mechanisms controlling long-term grooming in Drosophila. PMID:29485401
A Sensor Data Fusion System Based on k-Nearest Neighbor Pattern Classification for Structural Health Monitoring Applications

PubMed Central

Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A.; Anaya, Maribel

2017-01-01

Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated a great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally; (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed. PMID:28230796
A Sensor Data Fusion System Based on k-Nearest Neighbor Pattern Classification for Structural Health Monitoring Applications.

PubMed

Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A; Anaya, Maribel

2017-02-21

Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated a great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally; (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed.
On the consistency between nearest-neighbor peridynamic discretizations and discretized classical elasticity models

DOE PAGES

Seleson, Pablo; Du, Qiang; Parks, Michael L.

2016-08-16

The peridynamic theory of solid mechanics is a nonlocal reformulation of the classical continuum mechanics theory. At the continuum level, it has been demonstrated that classical (local) elasticity is a special case of peridynamics. Such a connection between these theories has not been extensively explored at the discrete level. This paper investigates the consistency between nearest-neighbor discretizations of linear elastic peridynamic models and finite difference discretizations of the Navier–Cauchy equation of classical elasticity. While nearest-neighbor discretizations in peridynamics have been numerically observed to present grid-dependent crack paths or spurious microcracks, this paper focuses on a different, analytical aspect of suchmore » discretizations. We demonstrate that, even in the absence of cracks, such discretizations may be problematic unless a proper selection of weights is used. Specifically, we demonstrate that using the standard meshfree approach in peridynamics, nearest-neighbor discretizations do not reduce, in general, to discretizations of corresponding classical models. We study nodal-based quadratures for the discretization of peridynamic models, and we derive quadrature weights that result in consistency between nearest-neighbor discretizations of peridynamic models and discretized classical models. The quadrature weights that lead to such consistency are, however, model-/discretization-dependent. We motivate the choice of those quadrature weights through a quadratic approximation of displacement fields. The stability of nearest-neighbor peridynamic schemes is demonstrated through a Fourier mode analysis. Finally, an approach based on a normalization of peridynamic constitutive constants at the discrete level is explored. This approach results in the desired consistency for one-dimensional models, but does not work in higher dimensions. The results of the work presented in this paper suggest that even though nearest-neighbor
Improving RNA nearest neighbor parameters for helices by going beyond the two-state model.

PubMed

Spasic, Aleksandar; Berger, Kyle D; Chen, Jonathan L; Seetin, Matthew G; Turner, Douglas H; Mathews, David H

2018-06-01

RNA folding free energy change nearest neighbor parameters are widely used to predict folding stabilities of secondary structures. They were determined by linear regression to datasets of optical melting experiments on small model systems. Traditionally, the optical melting experiments are analyzed assuming a two-state model, i.e. a structure is either complete or denatured. Experimental evidence, however, shows that structures exist in an ensemble of conformations. Partition functions calculated with existing nearest neighbor parameters predict that secondary structures can be partially denatured, which also directly conflicts with the two-state model. Here, a new approach for determining RNA nearest neighbor parameters is presented. Available optical melting data for 34 Watson-Crick helices were fit directly to a partition function model that allows an ensemble of conformations. Fitting parameters were the enthalpy and entropy changes for helix initiation, terminal AU pairs, stacks of Watson-Crick pairs and disordered internal loops. The resulting set of nearest neighbor parameters shows a 38.5% improvement in the sum of residuals in fitting the experimental melting curves compared to the current literature set.
An RFID Indoor Positioning Algorithm Based on Bayesian Probability and K-Nearest Neighbor.

PubMed

Xu, He; Ding, Ye; Li, Peng; Wang, Ruchuan; Li, Yizhu

2017-08-05

The Global Positioning System (GPS) is widely used in outdoor environmental positioning. However, GPS cannot support indoor positioning because there is no signal for positioning in an indoor environment. Nowadays, there are many situations which require indoor positioning, such as searching for a book in a library, looking for luggage in an airport, emergence navigation for fire alarms, robot location, etc. Many technologies, such as ultrasonic, sensors, Bluetooth, WiFi, magnetic field, Radio Frequency Identification (RFID), etc., are used to perform indoor positioning. Compared with other technologies, RFID used in indoor positioning is more cost and energy efficient. The Traditional RFID indoor positioning algorithm LANDMARC utilizes a Received Signal Strength (RSS) indicator to track objects. However, the RSS value is easily affected by environmental noise and other interference. In this paper, our purpose is to reduce the location fluctuation and error caused by multipath and environmental interference in LANDMARC. We propose a novel indoor positioning algorithm based on Bayesian probability and K -Nearest Neighbor (BKNN). The experimental results show that the Gaussian filter can filter some abnormal RSS values. The proposed BKNN algorithm has the smallest location error compared with the Gaussian-based algorithm, LANDMARC and an improved KNN algorithm. The average error in location estimation is about 15 cm using our method.
Estimating forest attribute parameters for small areas using nearest neighbors techniques

Treesearch

Ronald E. McRoberts

2012-01-01

Nearest neighbors techniques have become extremely popular, particularly for use with forest inventory data. With these techniques, a population unit prediction is calculated as a linear combination of observations for a selected number of population units in a sample that are most similar, or nearest, in a space of ancillary variables to the population unit requiring...
OCR enhancement through neighbor embedding and fast approximate nearest neighbors

NASA Astrophysics Data System (ADS)

Smith, D. C.

2012-10-01

Generic optical character recognition (OCR) engines often perform very poorly in transcribing scanned low resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speed up of our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (pixels per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more that 6 times lower on average than after BI preprocessing.

Implementation of Nearest Neighbor using HSV to Identify Skin Disease

NASA Astrophysics Data System (ADS)

Gerhana, Y. A.; Zulfikar, W. B.; Ramdani, A. H.; Ramdhani, M. A.

2018-01-01

Today, Android is one of the most widely used operating system in the world. Most of android device has a camera that could capture an image, this feature could be optimized to identify skin disease. The disease is one of health problem caused by bacterium, fungi, and virus. The symptoms of skin disease usually visible. In this work, the symptoms that captured as image contains HSV in every pixel of the image. HSV can extracted and then calculate to earn euclidean value. The value compared using nearest neighbor algorithm to discover closer value between image testing and image training to get highest value that decide class label or type of skin disease. The testing result show that 166 of 200 or about 80% is accurate. There are some reasons that influence the result of classification model like number of image training and quality of android device’s camera.
Nearest-neighbor Kitaev exchange blocked by charge order in electron-doped α -RuCl3

NASA Astrophysics Data System (ADS)

Koitzsch, A.; Habenicht, C.; Müller, E.; Knupfer, M.; Büchner, B.; Kretschmer, S.; Richter, M.; van den Brink, J.; Börrnert, F.; Nowak, D.; Isaeva, A.; Doert, Th.

2017-10-01

A quantum spin liquid might be realized in α -RuCl3 , a honeycomb-lattice magnetic material with substantial spin-orbit coupling. Moreover, α -RuCl3 is a Mott insulator, which implies the possibility that novel exotic phases occur upon doping. Here, we study the electronic structure of this material when intercalated with potassium by photoemission spectroscopy, electron energy loss spectroscopy, and density functional theory calculations. We obtain a stable stoichiometry at K0.5RuCl3 . This gives rise to a peculiar charge disproportionation into formally Ru2 + (4 d6 ) and Ru3 + (4 d5 ). Every Ru 4 d5 site with one hole in the t2 g shell is surrounded by nearest neighbors of 4 d6 character, where the t2 g level is full and magnetically inert. Thus, each type of Ru site forms a triangular lattice, and nearest-neighbor interactions of the original honeycomb are blocked.
Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies

EPA Science Inventory

Previously a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity for large, noncongeneric data sets. In this study these approaches applie...
Nearest Neighbor Classification of Stationary Time Series: An Application to Anesthesia Level Classification by EEG Analysis.

DTIC Science & Technology

1980-12-05

classification procedures that are common in speech processing. The anesthesia level classification by EEG time series population screening problem example is in...formance. The use of the KL number type metric in NN rule classification, in a delete-one subj ect ’s EE-at-a-time KL-NN and KL- kNN classification of the...17 individual labeled EEG sample population using KL-NN and KL- kNN rules. The results obtained are shown in Table 1. The entries in the table indicate
Collective Behaviors of Mobile Robots Beyond the Nearest Neighbor Rules With Switching Topology.

PubMed

Ning, Boda; Han, Qing-Long; Zuo, Zongyu; Jin, Jiong; Zheng, Jinchuan

2018-05-01

This paper is concerned with the collective behaviors of robots beyond the nearest neighbor rules, i.e., dispersion and flocking, when robots interact with others by applying an acute angle test (AAT)-based interaction rule. Different from a conventional nearest neighbor rule or its variations, the AAT-based interaction rule allows interactions with some far-neighbors and excludes unnecessary nearest neighbors. The resulting dispersion and flocking hold the advantages of scalability, connectivity, robustness, and effective area coverage. For the dispersion, a spring-like controller is proposed to achieve collision-free coordination. With switching topology, a new fixed-time consensus-based energy function is developed to guarantee the system stability. An upper bound of settling time for energy consensus is obtained, and a uniform time interval is accordingly set so that energy distribution is conducted in a fair manner. For the flocking, based on a class of generalized potential functions taking nonsmooth switching into account, a new controller is proposed to ensure that the same velocity for all robots is eventually reached. A co-optimizing problem is further investigated to accomplish additional tasks, such as enhancing communication performance, while maintaining the collective behaviors of mobile robots. Simulation results are presented to show the effectiveness of the theoretical results.
Competing growth processes induced by next-nearest-neighbor interactions: Effects on meandering wavelength and stiffness

NASA Astrophysics Data System (ADS)

Blel, Sonia; Hamouda, Ajmi BH.; Mahjoub, B.; Einstein, T. L.

2017-02-01

In this paper we explore the meandering instability of vicinal steps with a kinetic Monte Carlo simulations (kMC) model including the attractive next-nearest-neighbor (NNN) interactions. kMC simulations show that increase of the NNN interaction strength leads to considerable reduction of the meandering wavelength and to weaker dependence of the wavelength on the deposition rate F. The dependences of the meandering wavelength on the temperature and the deposition rate obtained with simulations are in good quantitative agreement with the experimental result on the meandering instability of Cu(0 2 24) [T. Maroutian et al., Phys. Rev. B 64, 165401 (2001), 10.1103/PhysRevB.64.165401]. The effective step stiffness is found to depend not only on the strength of NNN interactions and the Ehrlich-Schwoebel barrier, but also on F. We argue that attractive NNN interactions intensify the incorporation of adatoms at step edges and enhance step roughening. Competition between NNN and nearest-neighbor interactions results in an alternative form of meandering instability which we call "roughening-limited" growth, rather than attachment-detachment-limited growth that governs the Bales-Zangwill instability. The computed effective wavelength and the effective stiffness behave as λeff˜F-q and β˜eff˜F-p , respectively, with q ≈p /2 .
Multi-strategy based quantum cost reduction of linear nearest-neighbor quantum circuit

NASA Astrophysics Data System (ADS)

Tan, Ying-ying; Cheng, Xue-yun; Guan, Zhi-jin; Liu, Yang; Ma, Haiying

2018-03-01

With the development of reversible and quantum computing, study of reversible and quantum circuits has also developed rapidly. Due to physical constraints, most quantum circuits require quantum gates to interact on adjacent quantum bits. However, many existing quantum circuits nearest-neighbor have large quantum cost. Therefore, how to effectively reduce quantum cost is becoming a popular research topic. In this paper, we proposed multiple optimization strategies to reduce the quantum cost of the circuit, that is, we reduce quantum cost from MCT gates decomposition, nearest neighbor and circuit simplification, respectively. The experimental results show that the proposed strategies can effectively reduce the quantum cost, and the maximum optimization rate is 30.61% compared to the corresponding results.
A nearest neighbor approach for automated transporter prediction and categorization from protein sequences.

PubMed

Li, Haiquan; Dai, Xinbin; Zhao, Xuechun

2008-05-01

Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.
Efficiency of encounter-controlled reaction between diffusing reactants in a finite lattice: Non-nearest-neighbor effects

NASA Astrophysics Data System (ADS)

Bentz, Jonathan L.; Kozak, John J.; Nicolis, Gregoire

2005-08-01

The influence of non-nearest-neighbor displacements on the efficiency of diffusion-reaction processes involving one and two mobile diffusing reactants is studied. An exact analytic result is given for dimension d=1 from which, for large lattices, one can recover the asymptotic estimate reported 30 years ago by Lakatos-Lindenberg and Shuler. For dimensions d=2,3 we present numerically exact values for the mean time to reaction, as gauged by the mean walklength before reactive encounter, obtained via the theory of finite Markov processes and supported by Monte Carlo simulations. Qualitatively different results are found between processes occurring on d=1 versus d>1 lattices, and between results obtained assuming nearest-neighbor (only) versus non-nearest-neighbor displacements.
Spectral properties near the Mott transition in the two-dimensional t-J model with next-nearest-neighbor hopping

NASA Astrophysics Data System (ADS)

Kohno, Masanori

2018-05-01

The single-particle spectral properties of the two-dimensional t-J model with next-nearest-neighbor hopping are investigated near the Mott transition by using cluster perturbation theory. The spectral features are interpreted by considering the effects of the next-nearest-neighbor hopping on the shift of the spectral-weight distribution of the two-dimensional t-J model. Various anomalous features observed in hole-doped and electron-doped high-temperature cuprate superconductors are collectively explained in the two-dimensional t-J model with next-nearest-neighbor hopping near the Mott transition.
Thermal rectification in mass-graded next-nearest-neighbor Fermi-Pasta-Ulam lattices

NASA Astrophysics Data System (ADS)

Romero-Bastida, M.; Miranda-Peña, Jorge-Orlando; López, Juan M.

2017-03-01

We study the thermal rectification efficiency, i.e., quantification of asymmetric heat flow, of a one-dimensional mass-graded anharmonic oscillator Fermi-Pasta-Ulam lattice both with nearest-neighbor (NN) and next-nearest-neighbor (NNN) interactions. The system presents a maximum rectification efficiency for a very precise value of the parameter that controls the coupling strength of the NNN interactions, which also optimizes the rectification figure when its dependence on mass asymmetry and temperature differences is considered. The origin of the enhanced rectification is the asymmetric local heat flow response as the heat reservoirs are swapped when a finely tuned NNN contribution is taken into account. A simple theoretical analysis gives an estimate of the optimal NNN coupling in excellent agreement with our simulation results.
Seismic clusters analysis in Northeastern Italy by the nearest-neighbor approach

NASA Astrophysics Data System (ADS)

Peresan, Antonella; Gentili, Stefania

2018-01-01

The main features of earthquake clusters in Northeastern Italy are explored, with the aim to get new insights on local scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters, which are identified by a statistical method, based on nearest-neighbor distances of events in the space-time-energy domain. The method permits us to highlight and investigate the internal structure of earthquake sequences, and to differentiate the spatial properties of seismicity according to the different topological features of the clusters structure. To analyze seismicity of Northeastern Italy, we use information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. A preliminary reappraisal of the earthquake bulletins is carried out and the area of sufficient completeness is outlined. Various techniques are considered to estimate the scaling parameters that characterize earthquakes occurrence in the region, namely the b-value and the fractal dimension of epicenters distribution, required for the application of the nearest-neighbor technique. Specifically, average robust estimates of the parameters of the Unified Scaling Law for Earthquakes, USLE, are assessed for the whole outlined region and are used to compute the nearest-neighbor distances. Clusters identification by the nearest-neighbor method turn out quite reliable and robust with respect to the minimum magnitude cutoff of the input catalog; the identified clusters are well consistent with those obtained from manual aftershocks identification of selected sequences. We demonstrate that the earthquake clusters have distinct preferred geographic locations, and we identify two areas that differ substantially in the examined clustering properties. Specifically, burst-like sequences are associated with the north-western part and swarm-like sequences with the south-eastern part of the study
A two-step nearest neighbors algorithm using satellite imagery for predicting forest structure within species composition classes

Treesearch

Ronald E. McRoberts

2009-01-01

Nearest neighbors techniques have been shown to be useful for predicting multiple forest attributes from forest inventory and Landsat satellite image data. However, in regions lacking good digital land cover information, nearest neighbors selected to predict continuous variables such as tree volume must be selected without regard to relevant categorical variables such...
AVNM: A Voting based Novel Mathematical Rule for Image Classification.

PubMed

Vidyarthi, Ankit; Mittal, Namita

2016-12-01

In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. Copyright © 2016 Elsevier
Predicting acute contact toxicity of pesticides in honeybees (Apis mellifera) through a k-nearest neighbor model.

PubMed

Como, F; Carnesecchi, E; Volani, S; Dorne, J L; Richardson, J; Bassan, A; Pavan, M; Benfenati, E

2017-01-01

Ecological risk assessment of plant protection products (PPPs) requires an understanding of both the toxicity and the extent of exposure to assess risks for a range of taxa of ecological importance including target and non-target species. Non-target species such as honey bees (Apis mellifera), solitary bees and bumble bees are of utmost importance because of their vital ecological services as pollinators of wild plants and crops. To improve risk assessment of PPPs in bee species, computational models predicting the acute and chronic toxicity of a range of PPPs and contaminants can play a major role in providing structural and physico-chemical properties for the prioritisation of compounds of concern and future risk assessments. Over the last three decades, scientific advisory bodies and the research community have developed toxicological databases and quantitative structure-activity relationship (QSAR) models that are proving invaluable to predict toxicity using historical data and reduce animal testing. This paper describes the development and validation of a k-Nearest Neighbor (k-NN) model using in-house software for the prediction of acute contact toxicity of pesticides on honey bees. Acute contact toxicity data were collected from different sources for 256 pesticides, which were divided into training and test sets. The k-NN models were validated with good prediction, with an accuracy of 70% for all compounds and of 65% for highly toxic compounds, suggesting that they might reliably predict the toxicity of structurally diverse pesticides and could be used to screen and prioritise new pesticides. Copyright © 2016 Elsevier Ltd. All rights reserved.
Collective coherence in nearest neighbor coupled metamaterials: A metasurface ruler equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Ningning; Zhang, Weili, E-mail: weili.zhang@okstate.edu; Singh, Ranjan, E-mail: ranjans@ntu.edu.sg

The collective coherent interactions in a meta-atom lattice are the key to myriad applications and functionalities offered by metasurfaces. We demonstrate a collective coherent response of the nearest neighbor coupled split-ring resonators whose resonance shift decays exponentially in the strong near-field coupled regime. This occurs due to the dominant magnetic coupling between the nearest neighbors which leads to the decay of the electromagnetic near fields. Based on the size scaling behavior of the different periodicity metasurfaces, we identified a collective coherent metasurface ruler equation. From the coherent behavior, we also show that the near-field coupling in a metasurface lattice existsmore » even when the periodicity exceeds the resonator size. The identification of a universal coherence in metasurfaces and their scaling behavior would enable the design of novel metadevices whose spectral tuning response based on near-field effects could be calibrated across microwave, terahertz, infrared, and the optical parts of the electromagnetic spectrum.« less
Thermodynamics of alternating spin chains with competing nearest- and next-nearest-neighbor interactions: Ising model

NASA Astrophysics Data System (ADS)

Pini, Maria Gloria; Rettori, Angelo

1993-08-01

The thermodynamical properties of an alternating spin (S,s) one-dimensional (1D) Ising model with competing nearest- and next-nearest-neighbor interactions are exactly calculated using a transfer-matrix technique. In contrast to the case S=s=1/2, previously investigated by Harada, the alternation of different spins (S≠s) along the chain is found to give rise to two-peaked static structure factors, signaling the coexistence of different short-range-order configurations. The relevance of our calculations with regard to recent experimental data by Gatteschi et al. in quasi-1D molecular magnetic materials, R (hfac)3 NITEt (R=Gd, Tb, Dy, Ho, Er, . . .), is discussed; hfac is hexafluoro-acetylacetonate and NlTEt is 2-Ethyl-4,4,5,5-tetramethyl-4,5-dihydro-1H-imidazolyl-1-oxyl-3-oxide.
Identification of jasmine flower (Jasminum sp.) based on the shape of the flower using sobel edge and k-nearest neighbour

NASA Astrophysics Data System (ADS)

Qur’ania, A.; Sarinah, I.

2018-03-01

People often wrong in knowing the type of jasmine by just looking at the white color of the jasmine, while not all white flowers including jasmine and not all jasmine flowers have white. There is a jasmine that is yellow and there is a jasmine that is white and purple.The aim of this research is to identify Jasmine flower (Jasminum sp.) based on the shape of the flower image-based using Sobel edge detection and k-Nearest Neighbor. Edge detection is used to detect the type of flower from the flower shape. Edge detection aims to improve the appearance of the border of a digital image. While k-Nearest Neighbor method is used to classify the classification of test objects into classes that have neighbouring properties closest to the object of training. The data used in this study are three types of jasmine namely jasmine white (Jasminum sambac), jasmine gambir (Jasminum pubescens), and jasmine japan (Pseuderanthemum reticulatum). Testing of jasmine flower image resized 50 × 50 pixels, 100 × 100 pixels, 150 × 150 pixels yields an accuracy of 84%. Tests on distance values of the k-NN method with spacing 5, 10 and 15 resulted in different accuracy rates for 5 and 10 closest distances yielding the same accuracy rate of 84%, for the 15 shortest distance resulted in a small accuracy of 65.2%.
α-K2AgF4: Ferromagnetism induced by the weak superexchange of different eg orbitals from the nearest neighbor Ag ions

NASA Astrophysics Data System (ADS)

Zhang, Xiaoli; Zhang, Guoren; Jia, Ting; Zeng, Zhi; Lin, H. Q.

2016-05-01

We study the abnormal ferromagnetism in α-K2AgF4, which is very similar to high-TC parent material La2CuO4 in structure. We find out that the electron correlation is very important in determining the insulating property of α-K2AgF4. The Ag(II) 4d9 in the octahedron crystal field has the t2 g 6 eg 3 electron occupation with eg x2-y2 orbital fully occupied and 3z2-r2 orbital partially occupied. The two eg orbitals are very extended indicating both of them are active in superexchange. Using the Hubbard model combined with Nth-order muffin-tin orbital (NMTO) downfolding technique, it is concluded that the exchange interaction between eg 3z2-r2 and x2-y2 from the first nearest neighbor Ag ions leads to the anomalous ferromagnetism in α-K2AgF4.
Elliptic Painlevé equations from next-nearest-neighbor translations on the E_8^{(1)} lattice

NASA Astrophysics Data System (ADS)

Joshi, Nalini; Nakazono, Nobutaka

2017-07-01

The well known elliptic discrete Painlevé equation of Sakai is constructed by a standard translation on the E_8(1) lattice, given by nearest neighbor vectors. In this paper, we give a new elliptic discrete Painlevé equation obtained by translations along next-nearest-neighbor vectors. This equation is a generic (8-parameter) version of a 2-parameter elliptic difference equation found by reduction from Adler’s partial difference equation, the so-called Q4 equation. We also provide a projective reduction of the well known equation of Sakai.

Generative Models for Similarity-based Classification

DTIC Science & Technology

2007-01-01

NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed
Fuzzy-Rough Nearest Neighbour Classification

NASA Astrophysics Data System (ADS)

Jensen, Richard; Cornelis, Chris

A new fuzzy-rough nearest neighbour (FRNN) classification algorithm is presented in this paper, as an alternative to Sarkar's fuzzy-rough ownership function (FRNN-O) approach. By contrast to the latter, our method uses the nearest neighbours to construct lower and upper approximations of decision classes, and classifies test instances based on their membership to these approximations. In the experimental analysis, we evaluate our approach with both classical fuzzy-rough approximations (based on an implicator and a t-norm), as well as with the recently introduced vaguely quantified rough sets. Preliminary results are very good, and in general FRNN outperforms FRNN-O, as well as the traditional fuzzy nearest neighbour (FNN) algorithm.
Phase transitions in the antiferromagnetic Ising model on a body-centered cubic lattice with interactions between next-to-nearest neighbors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murtazaev, A. K.; Ramazanov, M. K., E-mail: sheikh77@mail.ru; Kassan-Ogly, F. A.

2015-01-15

Phase transitions in the antiferromagnetic Ising model on a body-centered cubic lattice are studied on the basis of the replica algorithm by the Monte Carlo method and histogram analysis taking into account the interaction of next-to-nearest neighbors. The phase diagram of the dependence of the critical temperature on the intensity of interaction of the next-to-nearest neighbors is constructed. It is found that a second-order phase transition is realized in this model in the investigated interval of the intensities of interaction of next-to-nearest neighbors.
Self-Organizing Map Neural Network-Based Nearest Neighbor Position Estimation Scheme for Continuous Crystal PET Detectors

NASA Astrophysics Data System (ADS)

Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei

2014-10-01

Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative for current high-resolution pixelated PET detectors if the issues of high performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate array (FPGA). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only at providing high performance, but also at being realistic for FPGA implementation. Benefitting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest neighbor searching for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The spatial resolutions of full-width-at-half-maximum (FWHM) of both methods averaged over the center axis of the detector were obtained as 1.87 ±0.17 mm and 1.92 ±0.09 mm, respectively. The test results show that the SOM-NN scheme has an equivalent positioning performance with the smoothed k-NN method, but the amount of computation is only about one-tenth of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on FPGA. It has the potential to realize real-time position estimation on an FPGA with a high-event processing throughput.
Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion.

PubMed

Wang, Jian-Gang; Sung, Eric; Yau, Wei-Yun

2011-07-01

Facial age classification is an approach to classify face images into one of several predefined age groups. One of the difficulties in applying learning techniques to the age classification problem is the large amount of labeled training data required. Acquiring such training data is very costly in terms of age progress, privacy, human time, and effort. Although unlabeled face images can be obtained easily, it would be expensive to manually label them on a large scale and getting the ground truth. The frugal selection of the unlabeled data for labeling to quickly reach high classification performance with minimal labeling efforts is a challenging problem. In this paper, we present an active learning approach based on an online incremental bilateral two-dimension linear discriminant analysis (IB2DLDA) which initially learns from a small pool of labeled data and then iteratively selects the most informative samples from the unlabeled set to increasingly improve the classifier. Specifically, we propose a novel data selection criterion called the furthest nearest-neighbor (FNN) that generalizes the margin-based uncertainty to the multiclass case and which is easy to compute, so that the proposed active learning algorithm can handle a large number of classes and large data sizes efficiently. Empirical experiments on FG-NET and Morph databases together with a large unlabeled data set for age categorization problems show that the proposed approach can achieve results comparable or even outperform a conventionally trained active classifier that requires much more labeling effort. Our IB2DLDA-FNN algorithm can achieve similar results much faster than random selection and with fewer samples for age categorization. It also can achieve comparable results with active SVM but is much faster than active SVM in terms of training because kernel methods are not needed. The results on the face recognition database and palmprint/palm vein database showed that our approach can handle
Parametric, bootstrap, and jackknife variance estimators for the k-Nearest Neighbors technique with illustrations using forest inventory and satellite image data

Treesearch

Ronald E. McRoberts; Steen Magnussen; Erkki O. Tomppo; Gherardo Chirici

2011-01-01

Nearest neighbors techniques have been shown to be useful for estimating forest attributes, particularly when used with forest inventory and satellite image data. Published reports of positive results have been truly international in scope. However, for these techniques to be more useful, they must be able to contribute to scientific inference which, for sample-based...
Phase transition and monopole densities in a nearest neighbor two-dimensional spin ice model

NASA Astrophysics Data System (ADS)

Morais, C. W.; de Freitas, D. N.; Mota, A. L.; Bastone, E. C.

2017-12-01

In this work, we show that, due to the alternating orientation of the spins in the ground state of the artificial square spin ice, the influence of a set of spins at a certain distance of a reference spin decreases faster than the expected result for the long range dipolar interaction, justifying the use of the nearest neighbor two-dimensional square spin ice model as an effective model. Using an extension of the model presented in Y. L. Xie et al., Sci. Rep. 5, 15875 (2015), considering the influence of the eight nearest neighbors of each spin on the lattice, we analyze the thermodynamics of the model and study the dependence of monopoles and string densities as a function of the temperature.
Error minimizing algorithms for nearest eighbor classifiers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Porter, Reid B; Hush, Don; Zimmer, G. Beate

2011-01-03

Stack Filters define a large class of discrete nonlinear filter first introd uced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. Wemore » use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.« less
Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM

PubMed Central

Zhao, Zhizhen; Singer, Amit

2014-01-01

We introduce a new rotationally invariant viewing angle classification method for identifying, among a large number of cryo-EM projection images, similar views without prior knowledge of the molecule. Our rotationally invariant features are based on the bispectrum. Each image is denoised and compressed using steerable principal component analysis (PCA) such that rotating an image is equivalent to phase shifting the expansion coefficients. Thus we are able to extend the theory of bispectrum of 1D periodic signals to 2D images. The randomized PCA algorithm is then used to efficiently reduce the dimensionality of the bispectrum coefficients, enabling fast computation of the similarity between any pair of images. The nearest neighbors provide an initial classification of similar viewing angles. In this way, rotational alignment is only performed for images with their nearest neighbors. The initial nearest neighbor classification and alignment are further improved by a new classification method called vector diffusion maps. Our pipeline for viewing angle classification and alignment is experimentally shown to be faster and more accurate than reference-free alignment with rotationally invariant K-means clustering, MSA/MRA 2D classification, and their modern approximations. PMID:24631969
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.

ERIC Educational Resources Information Center

Stewart, Mark; Willett, Peter

1987-01-01

Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Localization in one-dimensional lattices with non-nearest-neighbor hopping: Generalized Anderson and Aubry-Andre models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Biddle, J.; Priour, D. J. Jr.; Wang, B.

We study the quantum localization phenomena of noninteracting particles in one-dimensional lattices based on tight-binding models with various forms of hopping terms beyond the nearest neighbor, which are generalizations of the famous Aubry-Andre and noninteracting Anderson models. For the case with deterministic disordered potential induced by a secondary incommensurate lattice (i.e., the Aubry-Andre model), we identify a class of self-dual models, for which the boundary between localized and extended eigenstates are determined analytically by employing a generalized Aubry-Andre transformation. We also numerically investigate the localization properties of nondual models with next-nearest-neighbor hopping, Gaussian, and power-law decay hopping terms. We findmore » that even for these nondual models, the numerically obtained mobility edges can be well approximated by the analytically obtained condition for localization transition in the self-dual models, as long as the decay of the hopping rate with respect to distance is sufficiently fast. For the disordered potential with genuinely random character, we examine scenarios with next-nearest-neighbor hopping, exponential, Gaussian, and power-law decay hopping terms numerically. We find that the higher-order hopping terms can remove the symmetry in the localization length about the energy band center compared to the Anderson model. Furthermore, our results demonstrate that for the power-law decay case, there exists a critical exponent below which mobility edges can be found. Our theoretical results could, in principle, be directly tested in shallow atomic optical lattice systems enabling non-nearest-neighbor hopping.« less
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.

PubMed

Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong

2018-05-24

This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
Polymers with nearest- and next nearest-neighbor interactions on the Husimi lattice

NASA Astrophysics Data System (ADS)

Oliveira, Tiago J.

2016-04-01

The exact grand-canonical solution of a generalized interacting self-avoid walk (ISAW) model, placed on a Husimi lattice built with squares, is presented. In this model, beyond the traditional interaction {ω }1={{{e}}}{ɛ 1/{k}BT} between (nonconsecutive) monomers on nearest-neighbor (NN) sites, an additional energy {ɛ }2 is associated to next-NN (NNN) monomers. Three definitions of NNN sites/interactions are considered, where each monomer can have, effectively, at most two, four, or six NNN monomers on the Husimi lattice. The phase diagrams found in all cases have (qualitatively) the same thermodynamic properties: a non-polymerized (NP) and a polymerized (P) phase separated by a critical and a coexistence surface that meet at a tricritical (θ-) line. This θ-line is found even when one of the interactions is repulsive, existing for {ω }1 in the range [0,∞ ), i.e., for {ɛ }1/{k}BT in the range [-∞ ,∞ ). Thus, counterintuitively, a θ-point exists even for an infinite repulsion between NN monomers ({ω }1=0), being associated to a coil-‘soft globule’ transition. In the limit of an infinite repulsive force between NNN monomers, however, the coil-globule transition disappears, and only NP-P continuous transition is observed. This particular case, with {ω }2=0, is also solved exactly on the square lattice, using a transfer matrix calculation where a discontinuous NP-P transition is found. For attractive and repulsive forces between NN and NNN monomers, respectively, the model becomes quite similar to the semiflexible-ISAW one, whose crystalline phase is not observed here, as a consequence of the frustration due to competing NN and NNN forces. The mapping of the phase diagrams in canonical ones is discussed and compared with recent results from Monte Carlo simulations on the square lattice.
Modeling Gas and Gas Hydrate Accumulation in Marine Sediments Using a K-Nearest Neighbor Machine-Learning Technique

NASA Astrophysics Data System (ADS)

Wood, W. T.; Runyan, T. E.; Palmsten, M.; Dale, J.; Crawford, C.

2016-12-01

Natural Gas (primarily methane) and gas hydrate accumulations require certain bio-geochemical, as well as physical conditions, some of which are poorly sampled and/or poorly understood. We exploit recent advances in the prediction of seafloor porosity and heat flux via machine learning techniques (e.g. Random forests and Bayesian networks) to predict the occurrence of gas and subsequently gas hydrate in marine sediments. The prediction (actually guided interpolation) of key parameters we use in this study is a K-nearest neighbor technique. KNN requires only minimal pre-processing of the data and predictors, and requires minimal run-time input so the results are almost entirely data-driven. Specifically we use new estimates of sedimentation rate and sediment type, along with recently derived compaction modeling to estimate profiles of porosity and age. We combined the compaction with seafloor heat flux to estimate temperature with depth and geologic age, which, with estimates of organic carbon, and models of methanogenesis yield limits on the production of methane. Results include geospatial predictions of gas (and gas hydrate) accumulations, with quantitative estimates of uncertainty. The Generic Earth Modeling System (GEMS) we have developed to derive the machine learning estimates is modular and easily updated with new algorithms or data.
Monte Carlo study of a ferrimagnetic mixed-spin (2, 5/2) system with the nearest and next-nearest neighbors exchange couplings

NASA Astrophysics Data System (ADS)

Bi, Jiang-lin; Wang, Wei; Li, Qi

2017-07-01

In this paper, the effects of the next-nearest neighbors exchange couplings on the magnetic and thermal properties of the ferrimagnetic mixed-spin (2, 5/2) Ising model on a 3D honeycomb lattice have been investigated by the use of Monte Carlo simulation. In particular, the influences of exchange couplings (Ja, Jb, Jan) and the single-ion anisotropy(Da) on the phase diagrams, the total magnetization, the sublattice magnetization, the total susceptibility, the internal energy and the specific heat have been discussed in detail. The results clearly show that the system can express the critical and compensation behavior within the next-nearest neighbors exchange coupling. Great deals of the M curves such as N-, Q-, P- and L-types have been discovered, owing to the competition between the exchange coupling and the temperature. Compared with other theoretical and experimental works, our results have an excellent consistency with theirs.
Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

PubMed

Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

2016-02-01

Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, rare work studies the unified approach to constructing multiple informative hash tables using any type of hashing algorithms. Meanwhile, for multiple table search, it also lacks of a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both
Aftershock identification problem via the nearest-neighbor analysis for marked point processes

NASA Astrophysics Data System (ADS)

Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.

2007-12-01

The centennial observations on the world seismicity have revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide most reliable information about the earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time- oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
Remaining Useful Life Estimation of Insulated Gate Biploar Transistors (IGBTs) Based on a Novel Volterra k-Nearest Neighbor Optimally Pruned Extreme Learning Machine (VKOPP) Model Using Degradation Data

PubMed Central

Mei, Wenjuan; Zeng, Xianping; Yang, Chenglin; Zhou, Xiuyun

2017-01-01

The insulated gate bipolar transistor (IGBT) is a kind of excellent performance switching device used widely in power electronic systems. How to estimate the remaining useful life (RUL) of an IGBT to ensure the safety and reliability of the power electronics system is currently a challenging issue in the field of IGBT reliability. The aim of this paper is to develop a prognostic technique for estimating IGBTs’ RUL. There is a need for an efficient prognostic algorithm that is able to support in-situ decision-making. In this paper, a novel prediction model with a complete structure based on optimally pruned extreme learning machine (OPELM) and Volterra series is proposed to track the IGBT’s degradation trace and estimate its RUL; we refer to this model as Volterra k-nearest neighbor OPELM prediction (VKOPP) model. This model uses the minimum entropy rate method and Volterra series to reconstruct phase space for IGBTs’ ageing samples, and a new weight update algorithm, which can effectively reduce the influence of the outliers and noises, is utilized to establish the VKOPP network; then a combination of the k-nearest neighbor method (KNN) and least squares estimation (LSE) method is used to calculate the output weights of OPELM and predict the RUL of the IGBT. The prognostic results show that the proposed approach can predict the RUL of IGBT modules with small error and achieve higher prediction precision and lower time cost than some classic prediction approaches. PMID:29099811
Dynamical phases in a one-dimensional chain of heterospecies Rydberg atoms with next-nearest-neighbor interactions

NASA Astrophysics Data System (ADS)

Qian, Jing; Zhang, Lu; Zhai, Jingjing; Zhang, Weiping

2015-12-01

We theoretically investigate the dynamical phase diagram of a one-dimensional chain of laser-excited two-species Rydberg atoms. The existence of a variety of unique dynamical phases in the experimentally achievable parameter region is predicted under the mean-field approximation, and the change in those phases when the effect of the next-nearest-neighbor interaction is included is further discussed. In particular, we find that the com-petition of the strong Rydberg-Rydberg interactions and the optical excitation imbalance can lead to the presence of complex multiple chaotic phases, which are highly sensitive to the initial Rydberg-state population and the strength of the next-nearest-neighbor interactions.
A dynamical mean-field study of orbital-selective Mott phase enhanced by next-nearest neighbor hopping

NASA Astrophysics Data System (ADS)

Niu, Yuekun; Sun, Jian; Ni, Yu; Song, Yun

2018-06-01

The dynamical mean-field theory is employed to study the orbital-selective Mott transition (OSMT) of the two-orbital Hubbard model with nearest neighbor hopping and next-nearest neighbor (NNN) hopping. The NNN hopping breaks the particle-hole symmetry at half filling and gives rise to an asymmetric density of states (DOS). Our calculations show that the broken symmetry of DOS benefits the OSMT, where the region of the orbital-selective Mott phase significantly extends with the increasing NNN hopping integral. We also find that Hund's rule coupling promotes OSMT by blocking the orbital fluctuations, but the influence of NNN hopping is more remarkable.

Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.

PubMed

Chen, Shizhi; Yang, Xiaodong; Tian, Yingli

2015-09-01

A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.
Next nearest neighbors sites and the reactivity of the CO NO surface reaction

NASA Astrophysics Data System (ADS)

Cortés, Joaquín.; Valencia, Eliana

1998-04-01

Using Monte Carlo experiments of the reduction of NO by CO, a study is made of the effect on reactivity due to the formation of N 2O and to the increased coordination of the sites considering the next nearest neighbors sites (nnn) in a square lattice of superficial sites.
On the classification techniques in data mining for microarray data classification

NASA Astrophysics Data System (ADS)

Aydadenta, Husna; Adiwijaya

2018-03-01

Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.
Distributed Adaptive Binary Quantization for Fast Nearest Neighbor Search.

PubMed

Xianglong Liu; Zhujin Li; Cheng Deng; Dacheng Tao

2017-11-01

Hashing has been proved an attractive technique for fast nearest neighbor search over big data. Compared with the projection based hashing methods, prototype-based ones own stronger power to generate discriminative binary codes for the data with complex intrinsic structure. However, existing prototype-based methods, such as spherical hashing and K-means hashing, still suffer from the ineffective coding that utilizes the complete binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization (ABQ) method that learns a discriminative hash function with prototypes associated with small unique binary codes. Our alternating optimization adaptively discovers the prototype set and the code set of a varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes, and enjoys the fast training linear to the number of the training data. We further devise a distributed framework for the large-scale learning, which can significantly speed up the training of ABQ in the distributed environment that has been widely deployed in many areas nowadays. The extensive experiments on four large-scale (up to 80 million) data sets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with up to 58.84% performance gains relatively.
A novel method for the detection of R-peaks in ECG based on K-Nearest Neighbors and Particle Swarm Optimization

NASA Astrophysics Data System (ADS)

He, Runnan; Wang, Kuanquan; Li, Qince; Yuan, Yongfeng; Zhao, Na; Liu, Yang; Zhang, Henggui

2017-12-01

Cardiovascular diseases are associated with high morbidity and mortality. However, it is still a challenge to diagnose them accurately and efficiently. Electrocardiogram (ECG), a bioelectrical signal of the heart, provides crucial information about the dynamical functions of the heart, playing an important role in cardiac diagnosis. As the QRS complex in ECG is associated with ventricular depolarization, therefore, accurate QRS detection is vital for interpreting ECG features. In this paper, we proposed a real-time, accurate, and effective algorithm for QRS detection. In the algorithm, a proposed preprocessor with a band-pass filter was first applied to remove baseline wander and power-line interference from the signal. After denoising, a method combining K-Nearest Neighbor (KNN) and Particle Swarm Optimization (PSO) was used for accurate QRS detection in ECGs with different morphologies. The proposed algorithm was tested and validated using 48 ECG records from MIT-BIH arrhythmia database (MITDB), achieved a high averaged detection accuracy, sensitivity and positive predictivity of 99.43, 99.69, and 99.72%, respectively, indicating a notable improvement to extant algorithms as reported in literatures.
Fast Query-Optimized Kernel-Machine Classification

NASA Technical Reports Server (NTRS)

Mazzoni, Dominic; DeCoste, Dennis

2004-01-01

A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered
A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

PubMed

Rivas, Elena; Lang, Raymond; Eddy, Sean R

2012-02-01

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
Realization of the axial next-nearest-neighbor Ising model in U 3 Al 2 Ge 3

DOE PAGES

Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.; ...

2017-11-09

Inmore » this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U 3 Al 2 Ge 3 . Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T c = 48 K. Within the sinusoidally modulated magnetic phase (T c < T < T N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U 3 Al 2 Ge 3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.« less
Realization of the axial next-nearest-neighbor Ising model in U 3 Al 2 Ge 3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.

Inmore » this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U 3 Al 2 Ge 3 . Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T c = 48 K. Within the sinusoidally modulated magnetic phase (T c < T < T N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U 3 Al 2 Ge 3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.« less
Mapping change of older forest with nearest-neighbor imputation and Landsat time-series

Treesearch

Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang Yang

2012-01-01

The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...
Terahertz metasurfaces with a high refractive index enhanced by the strong nearest neighbor coupling.

PubMed

Tan, Siyu; Yan, Fengping; Singh, Leena; Cao, Wei; Xu, Ningning; Hu, Xiang; Singh, Ranjan; Wang, Mingwei; Zhang, Weili

2015-11-02

The realization of high refractive index is of significant interest in optical imaging with enhanced resolution. Strongly coupled subwavelength resonators were proposed and demonstrated at both optical and terahertz frequencies to enhance the refractive index due to large induced dipole moment in meta-atoms. Here, we report an alternative design for flexible free-standing terahertz metasurface in the strong coupling regime where we experimentally achieve a peak refractive index value of 14.36. We also investigate the impact of the nearest neighbor coupling in the form of frequency tuning and enhancement of the peak refractive index. We provide an analytical circuit model to explain the impact of geometrical parameters and coupling on the effective refractive index of the metasurface. The proposed meta-atom structure enables tailoring of the peak refractive index based on nearest neighbor coupling and this property offers tremendous design flexibility for transformation optics and other index-gradient devices at terahertz frequencies.
Integrating instance selection, instance weighting, and feature weighting for nearest neighbor classifiers by coevolutionary algorithms.

PubMed

Derrac, Joaquín; Triguero, Isaac; Garcia, Salvador; Herrera, Francisco

2012-10-01

Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.
Phase transitions and thermodynamic properties of antiferromagnetic Ising model with next-nearest-neighbor interactions on the Kagomé lattice

NASA Astrophysics Data System (ADS)

Ramazanov, M. K.; Murtazaev, A. K.; Magomedov, M. A.; Badiev, M. K.

2018-06-01

We study phase transitions and thermodynamic properties in the two-dimensional antiferromagnetic Ising model with next-nearest-neighbor interaction on a Kagomé lattice by Monte Carlo simulations. A histogram data analysis shows that a second-order transition occurs in the model. From the analysis of obtained data, we can assume that next-nearest-neighbor ferromagnetic interactions in two-dimensional antiferromagnetic Ising model on a Kagomé lattice excite the occurrence of a second-order transition and unusual behavior of thermodynamic properties on the temperature dependence.
A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

PubMed Central

Rivas, Elena; Lang, Raymond; Eddy, Sean R.

2012-01-01

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308
Moderate-resolution data and gradient nearest neighbor imputation for regional-national risk assessment

Treesearch

Kenneth B. Jr. Pierce; C. Kenneth Brewer; Janet L. Ohmann

2010-01-01

This study was designed to test the feasibility of combining a method designed to populate pixels with inventory plot data at the 30-m scale with a new national predictor data set. The new national predictor data set was developed by the USDA Forest Service Remote Sensing Applications Center (hereafter RSAC) at the 250-m scale. Gradient Nearest Neighbor (GNN)...
Finite element computation on nearest neighbor connected machines

NASA Technical Reports Server (NTRS)

Mcaulay, A. D.

1984-01-01

Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Evaluation of nearest-neighbor methods for detection of chimeric small-subunit rRNA sequences

NASA Technical Reports Server (NTRS)

Robison-Cox, J. F.; Bateson, M. M.; Ward, D. M.

1995-01-01

Detection of chimeric artifacts formed when PCR is used to retrieve naturally occurring small-subunit (SSU) rRNA sequences may rely on demonstrating that different sequence domains have different phylogenetic affiliations. We evaluated the CHECK_CHIMERA method of the Ribosomal Database Project and another method which we developed, both based on determining nearest neighbors of different sequence domains, for their ability to discern artificially generated SSU rRNA chimeras from authentic Ribosomal Database Project sequences. The reliability of both methods decreases when the parental sequences which contribute to chimera formation are more than 82 to 84% similar. Detection is also complicated by the occurrence of authentic SSU rRNA sequences that behave like chimeras. We developed a naive statistical test based on CHECK_CHIMERA output and used it to evaluate previously reported SSU rRNA chimeras. Application of this test also suggests that chimeras might be formed by retrieving SSU rRNAs as cDNA. The amount of uncertainty associated with nearest-neighbor analyses indicates that such tests alone are insufficient and that better methods are needed.
Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation.

PubMed

Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R

2018-01-01

Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominate and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and the non- k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The Imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, furthers studies need to be realized to improve the accuracy prediction of TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.
Spatio-temporal distribution of Oklahoma earthquakes: Exploring relationships using a nearest-neighbor approach: Nearest-neighbor analysis of Oklahoma

DOE PAGES

Vasylkivska, Veronika S.; Huerta, Nicolas J.

2017-06-24

Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog’s inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable withmore » respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.« less
Spatio-temporal distribution of Oklahoma earthquakes: Exploring relationships using a nearest-neighbor approach: Nearest-neighbor analysis of Oklahoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vasylkivska, Veronika S.; Huerta, Nicolas J.

Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog’s inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable withmore » respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.« less

Quantum phase transitions of the one-dimensional Peierls-Hubbard model with next-nearest-neighbor hopping integrals

NASA Astrophysics Data System (ADS)

Otsuka, Hiromi

1998-06-01

We investigate two kinds of quantum phase transitions observed in the one-dimensional half-filled Peierls-Hubbard model with the next-nearest-neighbor hopping integral in the strong-coupling region U>>t, t' [t (t'), nearest- (next-nearest-) neighbor hopping; U, on-site Coulomb repulsion]. In the uniform case, with the help of the conformal field theory prediction, we numerically determine a phase boundary t'c(U/t) between the spin-fluid and the dimer states, where a bare coupling of the marginal operator vanishes and the low-energy and long-distance behaviors of the spin part are described by a free-boson model. To exhibit the conformal invariance of the systems on the phase boundary, a multiplet structure of the excitation spectrum of finite-size systems and a value of the central charge are also examined. The critical phenomenological aspect of the spin-Peierls transitions accompanied by the lattice dimerization is then argued for the systems on the phase boundary; the existence of logarithmic corrections to the power-law behaviors of the energy gain and the spin gap (i.e., the Cross-Fisher scaling law) are discussed.
Efficient Fingercode Classification

NASA Astrophysics Data System (ADS)

Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
Enhanced Approximate Nearest Neighbor via Local Area Focused Search.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gonzales, Antonio; Blazier, Nicholas Paul

Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space- partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Fo- cused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses onmore » a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.« less
Rapid and Robust Cross-Correlation-Based Seismic Signal Identification Using an Approximate Nearest Neighbor Method

DOE PAGES

Tibi, Rigobert; Young, Christopher; Gonzales, Antonio; ...

2017-07-04

The matched filtering technique that uses the cross correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this paper, we introduce an approximate nearest neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation. Our method begins with a projection into a reduced dimensionality space, based on correlation with a randomized subset ofmore » the full template archive. Searching for a specified number of nearest neighbors for a query waveform is accomplished by iteratively comparing it with the neighbors of its immediate neighbors. We used the approach to search for matches to each of ~2300 analyst-reviewed signal detections reported in May 2010 for the International Monitoring System station MKAR. The template library in this case consists of a data set of more than 200,000 analyst-reviewed signal detections for the same station from February 2002 to July 2016 (excluding May 2010). Of these signal detections, 73% are teleseismic first P and 17% regional phases (Pn, Pg, Sn, and Lg). Finally, the analyses performed on a standard desktop computer show that the proposed ANN approach performs a search of the large template libraries about 25 times faster than the standard full linear search and achieves recall rates greater than 80%, with the recall rate increasing for higher correlation thresholds.« less
Rapid and Robust Cross-Correlation-Based Seismic Signal Identification Using an Approximate Nearest Neighbor Method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tibi, Rigobert; Young, Christopher; Gonzales, Antonio

The matched filtering technique that uses the cross correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this paper, we introduce an approximate nearest neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation. Our method begins with a projection into a reduced dimensionality space, based on correlation with a randomized subset ofmore » the full template archive. Searching for a specified number of nearest neighbors for a query waveform is accomplished by iteratively comparing it with the neighbors of its immediate neighbors. We used the approach to search for matches to each of ~2300 analyst-reviewed signal detections reported in May 2010 for the International Monitoring System station MKAR. The template library in this case consists of a data set of more than 200,000 analyst-reviewed signal detections for the same station from February 2002 to July 2016 (excluding May 2010). Of these signal detections, 73% are teleseismic first P and 17% regional phases (Pn, Pg, Sn, and Lg). Finally, the analyses performed on a standard desktop computer show that the proposed ANN approach performs a search of the large template libraries about 25 times faster than the standard full linear search and achieves recall rates greater than 80%, with the recall rate increasing for higher correlation thresholds.« less
Rapid and Robust Cross-Correlation-Based Seismic Phase Identification Using an Approximate Nearest Neighbor Method

NASA Astrophysics Data System (ADS)

Tibi, R.; Young, C. J.; Gonzales, A.; Ballard, S.; Encarnacao, A. V.

2016-12-01

The matched filtering technique involving the cross-correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive, and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this study, we introduce an Approximate Nearest Neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation without requiring a complex distributed computing system. Our method begins with a projection into a reduced dimensionality space based on correlation with a randomized subset of the full template archive. Searching for a specified number of nearest neighbors is accomplished by using randomized K-dimensional trees. We used the approach to search for matches to each of 2700 analyst-reviewed signal detections reported for May 2010 for the IMS station MKAR. The template library in this case consists of a dataset of more than 200,000 analyst-reviewed signal detections for the same station from 2002-2014 (excluding May 2010). Of these signal detections, 60% are teleseismic first P, and 15% regional phases (Pn, Pg, Sn, and Lg). The analyses performed on a standard desktop computer shows that the proposed approach performs the search of the large template libraries about 20 times faster than the standard full linear search, while achieving recall rates greater than 80%, with the recall rate increasing for higher correlation values. To decide whether to confirm a match, we use a hybrid method involving a cluster approach for queries with two or more matches, and correlation score for single matches. Of the signal detections that passed our confirmation process, 52% were teleseismic first P, and 30% were regional phases.
Weak doping dependence of the antiferromagnetic coupling between nearest-neighbor Mn2 + spins in (Ba1 -xKx) (Zn1-yMny) 2As2

NASA Astrophysics Data System (ADS)

Surmach, M. A.; Chen, B. J.; Deng, Z.; Jin, C. Q.; Glasbrenner, J. K.; Mazin, I. I.; Ivanov, A.; Inosov, D. S.

2018-03-01

Dilute magnetic semiconductors (DMS) are nonmagnetic semiconductors doped with magnetic transition metals. The recently discovered DMS material (Ba1 -xKx) (Zn1-yMny) 2As2 offers a unique and versatile control of the Curie temperature TC by decoupling the spin (Mn2 +, S =5 /2 ) and charge (K+) doping in different crystallographic layers. In an attempt to describe from first-principles calculations the role of hole doping in stabilizing ferromagnetic order, it was recently suggested that the antiferromagnetic exchange coupling J between the nearest-neighbor Mn ions would experience a nearly twofold suppression upon doping 20% of holes by potassium substitution. At the same time, further-neighbor interactions become increasingly ferromagnetic upon doping, leading to a rapid increase of TC. Using inelastic neutron scattering, we have observed a localized magnetic excitation at about 13 meV associated with the destruction of the nearest-neighbor Mn-Mn singlet ground state. Hole doping results in a notable broadening of this peak, evidencing significant particle-hole damping, but with only a minor change in the peak position. We argue that this unexpected result can be explained by a combined effect of superexchange and double-exchange interactions.
ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

PubMed Central

McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

2013-01-01

Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main
Nearest neighbor 3D segmentation with context features

NASA Astrophysics Data System (ADS)

Hristova, Evelin; Schulz, Heinrich; Brosch, Tom; Heinrich, Mattias P.; Nickisch, Hannes

2018-03-01

Automated and fast multi-label segmentation of medical images is challenging and clinically important. This paper builds upon a supervised machine learning framework that uses training data sets with dense organ annotations and vantage point trees to classify voxels in unseen images based on similarity of binary feature vectors extracted from the data. Without explicit model knowledge, the algorithm is applicable to different modalities and organs, and achieves high accuracy. The method is successfully tested on 70 abdominal CT and 42 pelvic MR images. With respect to ground truth, an average Dice overlap score of 0.76 for the CT segmentation of liver, spleen and kidneys is achieved. The mean score for the MR delineation of bladder, bones, prostate and rectum is 0.65. Additionally, we benchmark several variations of the main components of the method and reduce the computation time by up to 47% without significant loss of accuracy. The segmentation results are - for a nearest neighbor method - surprisingly accurate, robust as well as data and time efficient.
False-nearest-neighbors algorithm and noise-corrupted time series

NASA Astrophysics Data System (ADS)

Rhodes, Carl; Morari, Manfred

1997-05-01

The false-nearest-neighbors (FNN) algorithm was originally developed to determine the embedding dimension for autonomous time series. For noise-free computer-generated time series, the algorithm does a good job in predicting the embedding dimension. However, the problem of predicting the embedding dimension when the time-series data are corrupted by noise was not fully examined in the original studies of the FNN algorithm. Here it is shown that with large data sets, even small amounts of noise can lead to incorrect prediction of the embedding dimension. Surprisingly, as the length of the time series analyzed by FNN grows larger, the cause of incorrect prediction becomes more pronounced. An analysis of the effect of noise on the FNN algorithm and a solution for dealing with the effects of noise are given here. Some results on the theoretically correct choice of the FNN threshold are also presented.
Control of coherence among the spins of a single electron and the three nearest neighbor {sup 13}C nuclei of a nitrogen-vacancy center in diamond

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shimo-Oka, T.; Miwa, S.; Suzuki, Y.

2015-04-13

Individual nuclear spins in diamond can be optically detected through hyperfine couplings with the electron spin of a single nitrogen-vacancy (NV) center; such nuclear spins have outstandingly long coherence times. Among the hyperfine couplings in the NV center, the nearest neighbor {sup 13}C nuclear spins have the largest coupling strength. Nearest neighbor {sup 13}C nuclear spins have the potential to perform fastest gate operations, providing highest fidelity in quantum computing. Herein, we report on the control of coherences in the NV center where all three nearest neighbor carbons are of the {sup 13}C isotope. Coherence among the three and fourmore » qubits are generated and analyzed at room temperature.« less
Heterogeneity and nearest-neighbor coupling can explain small-worldness and wave properties in pancreatic islets

NASA Astrophysics Data System (ADS)

Cappon, Giacomo; Pedersen, Morten Gram

2016-05-01

Many multicellular systems consist of coupled cells that work as a syncytium. The pancreatic islet of Langerhans is a well-studied example of such a microorgan. The islets are responsible for secretion of glucose-regulating hormones, mainly glucagon and insulin, which are released in distinct pulses. In order to observe pulsatile insulin secretion from the β-cells within the islets, the cellular responses must be synchronized. It is now well established that gap junctions provide the electrical nearest-neighbor coupling that allows excitation waves to spread across islets to synchronize the β-cell population. Surprisingly, functional coupling analysis of calcium responses in β-cells shows small-world properties, i.e., a high degree of local coupling with a few long-range "short-cut" connections that reduce the average path-length greatly. Here, we investigate how such long-range functional coupling can appear as a result of heterogeneity, nearest-neighbor coupling, and wave propagation. Heterogeneity is also able to explain a set of experimentally observed synchronization and wave properties without introducing all-or-none cell coupling and percolation theory. Our theoretical results highlight how local biological coupling can give rise to functional small-world properties via heterogeneity and wave propagation.
Metric learning for automatic sleep stage classification.

PubMed

Phan, Huy; Do, Quan; Do, The-Luan; Vu, Duc-Lung

2013-01-01

We introduce in this paper a metric learning approach for automatic sleep stage classification based on single-channel EEG data. We show that learning a global metric from training data instead of using the default Euclidean metric, the k-nearest neighbor classification rule outperforms state-of-the-art methods on Sleep-EDF dataset with various classification settings. The overall accuracy for Awake/Sleep and 4-class classification setting are 98.32% and 94.49% respectively. Furthermore, the superior accuracy is achieved by performing classification on a low-dimensional feature space derived from time and frequency domains and without the need for artifact removal as a preprocessing step.
The roles of nearest neighbor methods in imputing missing data in forest inventory and monitoring databases

Treesearch

Bianca N. I. Eskelson; Hailemariam Temesgen; Valerie Lemay; Tara M. Barrett; Nicholas L. Crookston; Andrew T. Hudak

2009-01-01

Almost universally, forest inventory and monitoring databases are incomplete, ranging from missing data for only a few records and a few variables, common for small land areas, to missing data for many observations and many variables, common for large land areas. For a wide variety of applications, nearest neighbor (NN) imputation methods have been developed to fill in...
Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

Treesearch

Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

2009-01-01

Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Kinetic Models for Topological Nearest-Neighbor Interactions

NASA Astrophysics Data System (ADS)

Blanchet, Adrien; Degond, Pierre

2017-12-01

We consider systems of agents interacting through topological interactions. These have been shown to play an important part in animal and human behavior. Precisely, the system consists of a finite number of particles characterized by their positions and velocities. At random times a randomly chosen particle, the follower, adopts the velocity of its closest neighbor, the leader. We study the limit of a system size going to infinity and, under the assumption of propagation of chaos, show that the limit kinetic equation is a non-standard spatial diffusion equation for the particle distribution function. We also study the case wherein the particles interact with their K closest neighbors and show that the corresponding kinetic equation is the same. Finally, we prove that these models can be seen as a singular limit of the smooth rank-based model previously studied in Blanchet and Degond (J Stat Phys 163:41-60, 2016). The proofs are based on a combinatorial interpretation of the rank as well as some concentration of measure arguments.
nth-Nearest-neighbor distribution functions of an interacting fluid from the pair correlation function: a hierarchical approach.

PubMed

Bhattacharjee, Biplab

2003-04-01

The paper presents a general formalism for the nth-nearest-neighbor distribution (NND) of identical interacting particles in a fluid confined in a nu-dimensional space. The nth-NND functions, W(n,r) (for n=1,2,3, em leader) in a fluid are obtained hierarchically in terms of the pair correlation function and W(n-1,r) alone. The radial distribution function (RDF) profiles obtained from the molecular dynamics (MD) simulation of Lennard-Jones (LJ) fluid is used to illustrate the results. It is demonstrated that the collective structural information contained in the maxima and minima of the RDF profiles being resolved in terms of individual NND functions may provide more insights about the microscopic neighborhood structure around a reference particle in a fluid. Representative comparison between the results obtained from the formalism and the MD simulation data shows good agreement. Apart from the quantities such as nth-NND functions and nth-nearest-neighbor distances, the average neighbor population number is defined. These quantities are evaluated for the LJ model system and interesting density dependence of the microscopic neighborhood shell structures are discussed in terms of them. The relevance of the NND functions in various phenomena is also pointed out.
nth-nearest-neighbor distribution functions of an interacting fluid from the pair correlation function: A hierarchical approach

NASA Astrophysics Data System (ADS)

Bhattacharjee, Biplab

2003-04-01

The paper presents a general formalism for the nth-nearest-neighbor distribution (NND) of identical interacting particles in a fluid confined in a ν-dimensional space. The nth-NND functions, W(n,r¯) (for n=1,2,3,…) in a fluid are obtained hierarchically in terms of the pair correlation function and W(n-1,r¯) alone. The radial distribution function (RDF) profiles obtained from the molecular dynamics (MD) simulation of Lennard-Jones (LJ) fluid is used to illustrate the results. It is demonstrated that the collective structural information contained in the maxima and minima of the RDF profiles being resolved in terms of individual NND functions may provide more insights about the microscopic neighborhood structure around a reference particle in a fluid. Representative comparison between the results obtained from the formalism and the MD simulation data shows good agreement. Apart from the quantities such as nth-NND functions and nth-nearest-neighbor distances, the average neighbor population number is defined. These quantities are evaluated for the LJ model system and interesting density dependence of the microscopic neighborhood shell structures are discussed in terms of them. The relevance of the NND functions in various phenomena is also pointed out.
A Novel Quantum Solution to Privacy-Preserving Nearest Neighbor Query in Location-Based Services

NASA Astrophysics Data System (ADS)

Luo, Zhen-yu; Shi, Run-hua; Xu, Min; Zhang, Shun

2018-04-01

We present a cheating-sensitive quantum protocol for Privacy-Preserving Nearest Neighbor Query based on Oblivious Quantum Key Distribution and Quantum Encryption. Compared with the classical related protocols, our proposed protocol has higher security, because the security of our protocol is based on basic physical principles of quantum mechanics, instead of difficulty assumptions. Especially, our protocol takes single photons as quantum resources and only needs to perform single-photon projective measurement. Therefore, it is feasible to implement this protocol with the present technologies.
Second-Nearest-Neighbor Effects upon N NMR Shieldings in Models for Solid Si 3N 4and C 3N 4

NASA Astrophysics Data System (ADS)

Tossell, J. A.

1997-07-01

NMR shifts are generally determined mainly by the nearest-neighbor environment of an atom, with fairly small changes in the shift arising from differences in the second-nearest-neighbor environment. Previous calculations on the (SiH3)3N molecule used as a model for the local environment of N in crystalline α- and β-Si3N4gave N NMR shieldings much larger than those measured in the solids and gave the wrong order for the shifts of the inequivalent N sites (e.g., N1 and N2 in β-Si3N4). We have now calculated the N NMR shieldings in larger molecular models for the N2 site of β-Si3N4and have found that the N2 shielding is greatly reduced when additional N1 atoms (second-nearest-neighbors to the central N2) are included. The calculated N2 shieldings (using the GIAO method with the 6-31G* basis set and 6-31G* SCF optimized geometries) are 288.1, 244.7, and 206.0 ppm for the molecules (SiH3)3N, Si6N5H15, and Si9N9H21(central N2), respectively, while the experimental shielding of N2 in β-Si3N4is about 155 ppm. Second-nearest-neighbor effects of only slightly smaller magnitude are calculated for the analog C molecules. At the same time, the effects of molecule size upon Si NMR shieldings and N electric field gradients are small. The local geometries at the N2-like Ns in C6N5H15and C9N9H21are calculated to be planar, consistent with the planar local geometry recently calculated for N in crystalline C3N4using density functional theory.

d -wave superconductivity in the presence of nearest-neighbor Coulomb repulsion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, M.; Hahner, U. R.; Schulthess, T. C.

Dynamic cluster quantum Monte Carlo calculations for a doped two-dimensional extended Hubbard model are used to study the stability and dynamics of d-wave pairing when a nearest-neighbor Coulomb repulsion V is present in addition to the on-site Coulomb repulsion U. We find that d-wave pairing and the superconducting transition temperature Tc are only weakly suppressed as long as V does not exceed U/2. This stability is traced to the strongly retarded nature of pairing that allows the d-wave pairs to minimize the repulsive effect of V. When V approaches U/2, large momentum charge fluctuations are found to become important andmore » to give rise to a more rapid suppression of d-wave pairing and T c than for smaller V.« less
Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features

PubMed Central

Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

2018-01-01

The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet’s effects on fish skin. PMID:29596375
Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features.

PubMed

Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

2018-03-29

The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout ( Oncorhynchus mykiss ) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k -Nearest neighbours ( k -NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k -NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.
Hash Bit Selection for Nearest Neighbor Search.

PubMed

Xianglong Liu; Junfeng He; Shih-Fu Chang

2017-11-01

To overcome the barrier of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate the nearest neighbor search. Despite the recent advances, critical design issues remain open in how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these by posing an optimal hash bit selection problem, in which an optimal subset of hash bits are selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt the bit reliability and their complementarity as the selection criteria that can be carefully tailored for hashing performance in different tasks. Then, the bit selection solution is discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationship among hash bits to approximate the high-order independence property, and formulate it as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hash techniques, where our bit selection framework can achieve superior performance over both the naive selection methods and the state-of-the-art hashing algorithms, with significant accuracy gains ranging from 10% to 50%, relatively.
Weighted Parzen Windows for Pattern Classification

DTIC Science & Technology

1994-05-01

Nearest-Neighbor Rule The k-Nearest-Neighbor ( kNN ) technique is nonparametric, assuming nothing about the distribution of the data. Stated succinctly...probabilities P(wj I x) from samples." Raudys and Jain [20:255] advance this interpretation by pointing out that the kNN technique can be viewed as the...34Parzen window classifier with a hyper- rectangular window function." As with the Parzen-window technique, the kNN classifier is more accurate as the
Effect of nearest-neighbor ions on excited ionic states, emission spectra, and line profiles in hot and dense plasmas

NASA Technical Reports Server (NTRS)

Salzmann, D.; Stein, J.; Goldberg, I. B.; Pratt, R. H.

1991-01-01

The effect of the cylindrical symmetry imposed by the nearest-neighbor ions on the ionic levels and the emission spectra of a Li-like Kr ion immersed in hot and dense plasmas is investigated using the Stein et al. (1989) two-centered model extended to include computations of the line profiles, shifts, and widths, as well as the energy-level mixing and the forbidden transition probabilities. It is shown that the cylindrical symmetry mixes states with different orbital quantum numbers l, particularly for highly excited states, and, thereby, gives rise to forbidden transitions in the emission spectrum. Results are obtained for the variation of the ionic level shifts and mixing coefficients with the distance to the nearest neighbor. Also obtained are representative computed spectra that show the density effects on the spectral line profiles, shifts, and widths, and the forbidden components in the spectrum.
Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.

PubMed

Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M

2013-01-01

This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible [Formula: see text]-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
Study of parameters of the nearest neighbour shared algorithm on clustering documents

NASA Astrophysics Data System (ADS)

Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

2018-03-01

Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well
Comparing Forest/Nonforest Classifications of Landsat TM Imagery for Stratifying FIA Estimates of Forest Land Area

Treesearch

Mark D. Nelson; Ronald E. McRoberts; Greg C. Liknes; Geoffrey R. Holden

2005-01-01

Landsat Thematic Mapper (TM) satellite imagery and Forest Inventory and Analysis (FIA) plot data were used to construct forest/nonforest maps of Mapping Zone 41, National Land Cover Dataset 2000 (NLCD 2000). Stratification approaches resulting from Maximum Likelihood, Fuzzy Convolution, Logistic Regression, and k-Nearest Neighbors classification/prediction methods were...
An improved coupled-states approximation including the nearest neighbor Coriolis couplings for diatom-diatom inelastic collision

NASA Astrophysics Data System (ADS)

Yang, Dongzheng; Hu, Xixi; Zhang, Dong H.; Xie, Daiqian

2018-02-01

Solving the time-independent close coupling equations of a diatom-diatom inelastic collision system by using the rigorous close-coupling approach is numerically difficult because of its expensive matrix manipulation. The coupled-states approximation decouples the centrifugal matrix by neglecting the important Coriolis couplings completely. In this work, a new approximation method based on the coupled-states approximation is presented and applied to time-independent quantum dynamic calculations. This approach only considers the most important Coriolis coupling with the nearest neighbors and ignores weaker Coriolis couplings with farther K channels. As a result, it reduces the computational costs without a significant loss of accuracy. Numerical tests for para-H2+ortho-H2 and para-H2+HD inelastic collision were carried out and the results showed that the improved method dramatically reduces the errors due to the neglect of the Coriolis couplings in the coupled-states approximation. This strategy should be useful in quantum dynamics of other systems.
Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

Treesearch

Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

2009-01-01

Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...
A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA

PubMed Central

Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David; Bishop, Thomas C.; Case, David A.; Cheatham, Thomas; Dixit, Surjit; Jayaram, B.; Lankas, Filip; Laughton, Charles; Maddocks, John H.; Michon, Alexis; Osman, Roman; Orozco, Modesto; Perez, Alberto; Singh, Tanya; Spackova, Nada; Sponer, Jiri

2010-01-01

It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA. PMID:19850719
Classification of multispectral image data by the Binary Diamond neural network and by nonparametric, pixel-by-pixel methods

NASA Technical Reports Server (NTRS)

Salu, Yehuda; Tilton, James

1993-01-01

The classification of multispectral image data obtained from satellites has become an important tool for generating ground cover maps. This study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A new neural network, the Binary Diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The Binary Diamond is a multilayer, feed-forward neural network, which learns from examples in unsupervised, 'one-shot' mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done by using a realistic data base, consisting of approximately 90,000 Landsat 4 Thematic Mapper pixels. The Binary Diamond and the nearest neighbor performances were close, with some advantages to the Binary Diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways for improving the performances, such as merging categories, and analyzing nonboundary pixels, are addressed and evaluated.
Neural Network and Nearest Neighbor Algorithms for Enhancing Sampling of Molecular Dynamics.

PubMed

Galvelis, Raimondas; Sugita, Yuji

2017-06-13

The free energy calculations of complex chemical and biological systems with molecular dynamics (MD) are inefficient due to multiple local minima separated by high-energy barriers. The minima can be escaped using an enhanced sampling method such as metadynamics, which apply bias (i.e., importance sampling) along a set of collective variables (CV), but the maximum number of CVs (or dimensions) is severely limited. We propose a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation. The bias potential is constructed iteratively from short biased MD simulations accounting for correlation among CVs. Our method is capable of achieving ergodic sampling and calculating free energy of polypeptides with up to 8-dimensional bias potential.
Ground state of a Heisenberg chain with next-nearest-neighbor bond alternation

NASA Astrophysics Data System (ADS)

Capriotti, Luca; Becca, Federico; Sorella, Sandro; Parola, Alberto

2003-05-01

We investigate the ground-state properties of the spin-half J1-J2 Heisenberg chain with a next-nearest-neighbor spin-Peierls dimerization using conformal field theory and Lanczos exact diagonalizations. In agreement with the results of a recent bosonization analysis by Sarkar and Sen [Phys. Rev. B 65, 172408 (2002)], we find that for small frustration (J2/J1) the system is in a Luttinger spin-fluid phase, with gapless excitations, and a finite spin-wave velocity. In the regime of strong frustration the ground state is spontaneously dimerized and the bond alternation reduces the triplet gap, leading to a slight enhancement of the critical point separating the Luttinger phase from the gapped one. An accurate determination of the phase boundary is obtained numerically from the study of the excitation spectrum.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

PubMed

Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E

2016-01-01

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

PubMed Central

Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

2018-01-01

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777
The Use of Fuzzy Set Classification for Pattern Recognition of the Polygraph

DTIC Science & Technology

1993-12-01

actual feature extraction was done, It was decided to use the K-nearest neighbor ( KNN ) the data was preprocessed. The electrocardiogram classifier in...showing heart pulse, and a low frequency not known beforehand, and the KNN classifier does not component showing blood volume. The derivative of...the characteristics of the conventional KNN these six derived signals were detrended and filtered, classification method is that it assigns each
Evaluation of normalization methods for cDNA microarray data by k-NN classification

PubMed Central

Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J

2005-01-01

Background Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Results Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Conclusion Using LOOCV error of k-NNs as the evaluation criterion, three
Evaluation of normalization methods for cDNA microarray data by k-NN classification.

PubMed

Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J

2005-07-26

Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Using LOOCV error of k-NNs as the evaluation criterion, three double

Comparison of four approaches to a rock facies classification problem

USGS Publications Warehouse

Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.

2007-01-01

In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. ?? 2006 Elsevier Ltd. All rights reserved.
Analysis of miRNA expression profile based on SVM algorithm

NASA Astrophysics Data System (ADS)

Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian

2018-05-01

Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Nearest-Neighbor Distances and Aggregative Effects in Turbulence

NASA Astrophysics Data System (ADS)

Lanerolle, Lyon W. J.; Rothschild, B. J.; Yeung, P. K.

2000-11-01

The dispersive nature of turbulence which causes fluid elements to move apart (on average) is well known. Here we study another facet of turbulent mixing relevant to marine population dynamics - on how small organisms (approximated by fluid particles) are brought close to each other and allowed to interact. The crucial role played by the small scales in this process allows us to use direct numerical simulations of stationary isotropic turbulence, here with Taylor-scale Reynolds numbers (R_λ) from 38 to 91. We study the evolution of the Nearest-Neighbor Distances (NND) for collections of fluid particles initially located randomly in space satisfying Poisson-type distributions with mean values from 0.5 to 2.0 Kolmogorov length scales. Our results show that as particles begin to disperse on average, some also begin to aggregate in space. In particular, we find that (i) a significant proportion of particles are closer to each other than if their NNDs were randomly distributed, (ii) aggregative effects become stronger with R_λ, and (iii) although the mean value of NND grows monotonically with time in Kolmogorov variables, the growth rates are slower at higher R_λ. These results may assist in explaining the ``patchiness'' in plankton distributions observed in biological oceanography. Further details are given in B. J. Rothschild et al., The Biophysical Interpretation of Spatial Effects of Small-scale Turbulent Flow in the Ocean (paper in prep.).
Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

PubMed

Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

2018-06-07

Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
Local Subspace Classifier with Transform-Invariance for Image Classification

NASA Astrophysics Data System (ADS)

Hotta, Seiji

A family of linear subspace classifiers called local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers very high sensitivity to image transformations because it uses projection and the Euclidean distances for classification. In this paper, I present a combination of a local subspace classifier (LSC) and a tangent distance (TD) for improving accuracy of handwritten digit recognition. In this classification rule, we can deal with transform-invariance easily because we are able to use tangent vectors for approximation of transformations. However, we cannot use tangent vectors in other type of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with the experiments on handwritten digit and color image classification.
GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm.

PubMed

Tae Joon Jun; Hyun Ji Park; Hyuk Yoo; Young-Hak Kim; Daeyoung Kim

2016-08-01

In this paper, we propose an GPU based Cloud system for high-performance arrhythmia detection. Pan-Tompkins algorithm is used for QRS detection and we optimized beat classification algorithm with K-Nearest Neighbor (K-NN). To support high performance beat classification on the system, we parallelized beat classification algorithm with CUDA to execute the algorithm on virtualized GPU devices on the Cloud system. MIT-BIH Arrhythmia database is used for validation of the algorithm. The system achieved about 93.5% of detection rate which is comparable to previous researches while our algorithm shows 2.5 times faster execution time compared to CPU only detection algorithm.
Impact of nearest-neighbor repulsion on superconducting pairing in 2D extended Hubbard model

NASA Astrophysics Data System (ADS)

Jiang, Mi; Hahner, U. R.; Maier, T. A.; Schulthess, T. C.

Using dynamical cluster approximation (DCA) with an continuous-time QMC solver for the two-dimensional extended Hubbard model, we studied the impact of nearest-neighbor Coulomb repulsion V on d-wave superconducting pairing dynamics. By solving Bethe-Salpeter equation for particle-particle superconducting channel, we focused on the evolution of leading d-wave eigenvalue with V and the momentum and frequency dependence of the corresponding eigenfunction. The comparison with the evolution of both spin and charge susceptibilities versus V is presented showing the competition between spin and charge fluctuations. This research received generous support from the MARVEL NCCR and used resources of the Swiss National Supercomputing Center, as well as (INCITE) program in Oak Ridge Leadership Computing Facility.
Predictive mapping of forest composition and structure with direct gradient analysis and nearest neighbor imputation in coastal Oregon, U.S.A.

Treesearch

Janet L. Ohmann; Matthew J. Gregory

2002-01-01

Spatially explicit information on the species composition and structure of forest vegetation is needed at broad spatial scales for natural resource policy analysis and ecological research. We present a method for predictive vegetation mapping that applies direct gradient analysis and nearest-neighbor imputation to ascribe detailed ground attributes of vegetation to...
Spin canting in a Dy-based single-chain magnet with dominant next-nearest-neighbor antiferromagnetic interactions

NASA Astrophysics Data System (ADS)

Bernot, K.; Luzon, J.; Caneschi, A.; Gatteschi, D.; Sessoli, R.; Bogani, L.; Vindigni, A.; Rettori, A.; Pini, M. G.

2009-04-01

We investigate theoretically and experimentally the static magnetic properties of single crystals of the molecular-based single-chain magnet of formula [Dy(hfac)3NIT(C6H4OPh)]∞ comprising alternating Dy3+ and organic radicals. The magnetic molar susceptibility χM displays a strong angular variation for sample rotations around two directions perpendicular to the chain axis. A peculiar inversion between maxima and minima in the angular dependence of χM occurs on increasing temperature. Using information regarding the monomeric building block as well as an ab initio estimation of the magnetic anisotropy of the Dy3+ ion, this “anisotropy-inversion” phenomenon can be assigned to weak one-dimensional ferromagnetism along the chain axis. This indicates that antiferromagnetic next-nearest-neighbor interactions between Dy3+ ions dominate, despite the large Dy-Dy separation, over the nearest-neighbor interactions between the radicals and the Dy3+ ions. Measurements of the field dependence of the magnetization, both along and perpendicularly to the chain, and of the angular dependence of χM in a strong magnetic field confirm such an interpretation. Transfer-matrix simulations of the experimental measurements are performed using a classical one-dimensional spin model with antiferromagnetic Heisenberg exchange interaction and noncollinear uniaxial single-ion anisotropies favoring a canted antiferromagnetic spin arrangement, with a net magnetic moment along the chain axis. The fine agreement obtained with experimental data provides estimates of the Hamiltonian parameters, essential for further study of the dynamics of rare-earth-based molecular chains.
Reentrant behavior in the nearest-neighbor Ising antiferromagnet in a magnetic field

NASA Astrophysics Data System (ADS)

Neto, Minos A.; de Sousa, J. Ricardo

2004-12-01

Motived by the H-T phase diagram in the bcc Ising antiferromagnetic with nearest-neighbor interactions obtained by Monte Carlo simulation [Landau, Phys. Rev. B 16, 4164 (1977)] that shows a reentrant behavior at low temperature, with two critical temperatures in magnetic field about 2% greater than the critical value Hc=8J , we apply the effective field renormalization group (EFRG) approach in this model on three-dimensional lattices (simple cubic-sc and body centered cubic-bcc). We find that the critical curve TN(H) exhibits a maximum point around of H≃Hc only in the bcc lattice case. We also discuss the critical behavior by the effective field theory in clusters with one (EFT-1) and two (EFT-2) spins, and a reentrant behavior is observed for the sc and bcc lattices. We have compared our results of EFRG in the bcc lattice with Monte Carlo and series expansion, and we observe a good accordance between the methods.
Heat perturbation spreading in the Fermi-Pasta-Ulam-β system with next-nearest-neighbor coupling: Competition between phonon dispersion and nonlinearity

NASA Astrophysics Data System (ADS)

Xiong, Daxing

2017-06-01

We employ the heat perturbation correlation function to study thermal transport in the one-dimensional Fermi-Pasta-Ulam-β lattice with both nearest-neighbor and next-nearest-neighbor couplings. We find that such a system bears a peculiar phonon dispersion relation, and thus there exists a competition between phonon dispersion and nonlinearity that can strongly affect the heat correlation function's shape and scaling property. Specifically, for small and large anharmoncities, the scaling laws are ballistic and superdiffusive types, respectively, which are in good agreement with the recent theoretical predictions; whereas in the intermediate range of the nonlinearity, we observe an unusual multiscaling property characterized by a nonmonotonic delocalization process of the central peak of the heat correlation function. To understand these multiscaling laws, we also examine the momentum perturbation correlation function and find a transition process with the same turning point of the anharmonicity as that shown in the heat correlation function. This suggests coupling between the momentum transport and the heat transport, in agreement with the theoretical arguments of mode cascade theory.
The classification of hunger behaviour of Lates Calcarifer through the integration of image processing technique and k-Nearest Neighbour learning algorithm

NASA Astrophysics Data System (ADS)

Taha, Z.; Razman, M. A. M.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Adnan, F. A.; Sallehudin, M. F.; Mukai, Y.

2018-04-01

Fish Hunger behaviour is essential in determining the fish feeding routine, particularly for fish farmers. The inability to provide accurate feeding routines (under-feeding or over-feeding) may lead the death of the fish and consequently inhibits the quantity of the fish produced. Moreover, the excessive food that is not consumed by the fish will be dissolved in the water and accordingly reduce the water quality through the reduction of oxygen quantity. This problem also leads the death of the fish or even spur fish diseases. In the present study, a correlation of Barramundi fish-school behaviour with hunger condition through the hybrid data integration of image processing technique is established. The behaviour is clustered with respect to the position of the school size as well as the school density of the fish before feeding, during feeding and after feeding. The clustered fish behaviour is then classified through k-Nearest Neighbour (k-NN) learning algorithm. Three different variations of the algorithm namely cosine, cubic and weighted are assessed on its ability to classify the aforementioned fish hunger behaviour. It was found from the study that the weighted k-NN variation provides the best classification with an accuracy of 86.5%. Therefore, it could be concluded that the proposed integration technique may assist fish farmers in ascertaining fish feeding routine.
Transportation Modes Classification Using Sensors on Smartphones.

PubMed

Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu

2016-08-19

This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user's transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes.
Transportation Modes Classification Using Sensors on Smartphones

PubMed Central

Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu

2016-01-01

This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user’s transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes. PMID:27548182
pKa shifting in double-stranded RNA is highly dependent upon nearest neighbors and bulge positioning.

PubMed

Wilcox, Jennifer L; Bevilacqua, Philip C

2013-10-22

Shifting of pKa's in RNA is important for many biological processes; however, the driving forces responsible for shifting are not well understood. Herein, we determine how structural environments surrounding protonated bases affect pKa shifting in double-stranded RNA (dsRNA). Using (31)P NMR, we determined the pKa of the adenine in an A(+)·C base pair in various sequence and structural environments. We found a significant dependence of pKa on the base pairing strength of nearest neighbors and the location of a nearby bulge. Increasing nearest neighbor base pairing strength shifted the pKa of the adenine in an A(+)·C base pair higher by an additional 1.6 pKa units, from 6.5 to 8.1, which is well above neutrality. The addition of a bulge two base pairs away from a protonated A(+)·C base pair shifted the pKa by only ~0.5 units less than a perfectly base paired hairpin; however, positioning the bulge just one base pair away from the A(+)·C base pair prohibited formation of the protonated base pair as well as several flanking base pairs. Comparison of data collected at 25 °C and 100 mM KCl to biological temperature and Mg(2+) concentration revealed only slight pKa changes, suggesting that similar sequence contexts in biological systems have the potential to be protonated at biological pH. We present a general model to aid in the determination of the roles protonated bases may play in various dsRNA-mediated processes including ADAR editing, miRNA processing, programmed ribosomal frameshifting, and general acid-base catalysis in ribozymes.
Phase transitions and critical properties in the antiferromagnetic Ising model on a layered triangular lattice with allowance for intralayer next-nearest-neighbor interactions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Badiev, M. K., E-mail: m-zagir@mail.ru; Murtazaev, A. K.; Ramazanov, M. K.

2016-10-15

The phase transitions (PTs) and critical properties of the antiferromagnetic Ising model on a layered (stacked) triangular lattice have been studied by the Monte Carlo method using a replica algorithm with allowance for the next-nearest-neighbor interactions. The character of PTs is analyzed using the histogram technique and the method of Binder cumulants. It is established that the transition from the disordered to paramagnetic phase in the adopted model is a second-order PT. Static critical exponents of the heat capacity (α), susceptibility (γ), order parameter (β), and correlation radius (ν) and the Fischer exponent η are calculated using the finite-size scalingmore » theory. It is shown that (i) the antiferromagnetic Ising model on a layered triangular lattice belongs to the XY universality class of critical behavior and (ii) allowance for the intralayer interactions of next-nearest neighbors in the adopted model leads to a change in the universality class of critical behavior.« less
Nearest-neighbor guided evaluation of data reliability and its applications.

PubMed

Boongoen, Tossapon; Shen, Qiang

2010-12-01

The intuition of data reliability has recently been incorporated into the main stream of research on ordered weighted averaging (OWA) operators. Instead of relying on human-guided variables, the aggregation behavior is determined in accordance with the underlying characteristics of the data being aggregated. Data-oriented operators such as the dependent OWA (DOWA) utilize centralized data structures to generate reliable weights, however. Despite their simplicity, the approach taken by these operators neglects entirely any local data structure that represents a strong agreement or consensus. To address this issue, the cluster-based OWA (Clus-DOWA) operator has been proposed. It employs a cluster-based reliability measure that is effective to differentiate the accountability of different input arguments. Yet, its actual application is constrained by the high computational requirement. This paper presents a more efficient nearest-neighbor-based reliability assessment for which an expensive clustering process is not required. The proposed measure can be perceived as a stress function, from which the OWA weights and associated decision-support explanations can be generated. To illustrate the potential of this measure, it is applied to both the problem of information aggregation for alias detection and the problem of unsupervised feature selection (in which unreliable features are excluded from an actual learning process). Experimental results demonstrate that these techniques usually outperform their conventional state-of-the-art counterparts.
Exact density functional theory for ideal polymer fluids with nearest neighbor bonding constraints.

PubMed

Woodward, Clifford E; Forsman, Jan

2008-08-07

We present a new density functional theory of ideal polymer fluids, assuming nearest-neighbor bonding constraints. The free energy functional is expressed in terms of end site densities of chain segments and thus has a simpler mathematical structure than previously used expressions using multipoint distributions. This work is based on a formalism proposed by Tripathi and Chapman [Phys. Rev. Lett. 94, 087801 (2005)]. Those authors obtain an approximate free energy functional for ideal polymers in terms of monomer site densities. Calculations on both repulsive and attractive surfaces show that their theory is reasonably accurate in some cases, but does differ significantly from the exact result for longer polymers with attractive surfaces. We suggest that segment end site densities, rather than monomer site densities, are the preferred choice of "site functions" for expressing the free energy functional of polymer fluids. We illustrate the application of our theory to derive an expression for the free energy of an ideal fluid of infinitely long polymers.
Microscopic theory of the nearest-neighbor valence bond sector of the spin-1/2 kagome antiferromagnet

NASA Astrophysics Data System (ADS)

Ralko, Arnaud; Mila, Frédéric; Rousochatzakis, Ioannis

2018-03-01

The spin-1/2 Heisenberg model on the kagome lattice, which is closely realized in layered Mott insulators such as ZnCu3(OH) 6Cl2 , is one of the oldest and most enigmatic spin-1/2 lattice models. While the numerical evidence has accumulated in favor of a quantum spin liquid, the debate is still open as to whether it is a Z2 spin liquid with very short-range correlations (some kind of resonating valence bond spin liquid), or an algebraic spin liquid with power-law correlations. To address this issue, we have pushed the program started by Rokhsar and Kivelson in their derivation of the effective quantum dimer model description of Heisenberg models to unprecedented accuracy for the spin-1/2 kagome, by including all the most important virtual singlet contributions on top of the orthogonalization of the nearest-neighbor valence bond singlet basis. Quite remarkably, the resulting picture is a competition between a Z2 spin liquid and a diamond valence bond crystal with a 12-site unit cell, as in the density-matrix renormalization group simulations of Yan et al. Furthermore, we found that, on cylinders of finite diameter d , there is a transition between the Z2 spin liquid at small d and the diamond valence bond crystal at large d , the prediction of the present microscopic description for the two-dimensional lattice. These results show that, if the ground state of the spin-1/2 kagome antiferromagnet can be described by nearest-neighbor singlet dimers, it is a diamond valence bond crystal, and, a contrario, that, if the system is a quantum spin liquid, it has to involve long-range singlets, consistent with the algebraic spin liquid scenario.
Spectral identification of melon seeds variety based on k-nearest neighbor and Fisher discriminant analysis

NASA Astrophysics Data System (ADS)

Li, Cuiling; Jiang, Kai; Zhao, Xueguan; Fan, Pengfei; Wang, Xiu; Liu, Chuan

2017-10-01

Impurity of melon seeds variety will cause reductions of melon production and economic benefits of farmers, this research aimed to adopt spectral technology combined with chemometrics methods to identify melon seeds variety. Melon seeds whose varieties were "Yi Te Bai", "Yi Te Jin", "Jing Mi NO.7", "Jing Mi NO.11" and " Yi Li Sha Bai "were used as research samples. A simple spectral system was developed to collect reflective spectral data of melon seeds, including a light source unit, a spectral data acquisition unit and a data processing unit, the detection wavelength range of this system was 200-1100nm with spectral resolution of 0.14 7.7nm. The original reflective spectral data was pre-treated with de-trend (DT), multiple scattering correction (MSC), first derivative (FD), normalization (NOR) and Savitzky-Golay (SG) convolution smoothing methods. Principal Component Analysis (PCA) method was adopted to reduce the dimensions of reflective spectral data and extract principal components. K-nearest neighbour (KNN) and Fisher discriminant analysis (FDA) methods were used to develop discriminant models of melon seeds variety based on PCA. Spectral data pretreatments improved the discriminant effects of KNN and FDA, FDA generated better discriminant results than KNN, both KNN and FDA methods produced discriminant accuracies reaching to 90.0% for validation set. Research results showed that using spectral technology in combination with KNN and FDA modelling methods to identify melon seeds variety was feasible.

Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations

PubMed Central

Ceyhan, Elvan

2014-01-01

We consider two types of spatial symmetry, namely, symmetry in the mixed or shared nearest neighbor (NN) structures. We use Pielou's and Dixon's symmetry tests which are defined using contingency tables based on the NN relationships between the data points. We generalize these tests to multiple classes and demonstrate that both the asymptotic and exact versions of Pielou's first type of symmetry test are extremely conservative in rejecting symmetry in the mixed NN structure and hence should be avoided or only the Monte Carlo randomized version should be used. Under RL, we derive the asymptotic distribution for Dixon's symmetry test and also observe that the usual independence test seems to be appropriate for Pielou's second type of test. Moreover, we apply variants of Fisher's exact test on the shared NN contingency table for Pielou's second test and determine the most appropriate version for our setting. We also consider pairwise and one-versus-rest type tests in post hoc analysis after a significant overall symmetry test. We investigate the asymptotic properties of the tests, prove their consistency under appropriate null hypotheses, and investigate finite sample performance of them by extensive Monte Carlo simulations. The methods are illustrated on a real-life ecological data set. PMID:24605061
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

DOE Office of Scientific and Technical Information (OSTI.GOV)

Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.

In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

DOE PAGES

Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...

2015-10-09

In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Consistent latent position estimation and vertex classification for random dot product graphs.

PubMed

Sussman, Daniel L; Tang, Minh; Priebe, Carey E

2014-01-01

In this work, we show that using the eigen-decomposition of the adjacency matrix, we can consistently estimate latent positions for random dot product graphs provided the latent positions are i.i.d. from some distribution. If class labels are observed for a number of vertices tending to infinity, then we show that the remaining vertices can be classified with error converging to Bayes optimal using the $(k)$-nearest-neighbors classification rule. We evaluate the proposed methods on simulated data and a graph derived from Wikipedia.
Activity recognition in planetary navigation field tests using classification algorithms applied to accelerometer data.

PubMed

Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve

2012-01-01

Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
Anomaly Detection Based on Local Nearest Neighbor Distance Descriptor in Crowded Scenes

PubMed Central

Hu, Shiqiang; Zhang, Huanlong; Luo, Lingkun

2014-01-01

We propose a novel local nearest neighbor distance (LNND) descriptor for anomaly detection in crowded scenes. Comparing with the commonly used low-level feature descriptors in previous works, LNND descriptor has two major advantages. First, LNND descriptor efficiently incorporates spatial and temporal contextual information around the video event that is important for detecting anomalous interaction among multiple events, while most existing feature descriptors only contain the information of single event. Second, LNND descriptor is a compact representation and its dimensionality is typically much lower than the low-level feature descriptor. Therefore, not only the computation time and storage requirement can be accordingly saved by using LNND descriptor for the anomaly detection method with offline training fashion, but also the negative aspects caused by using high-dimensional feature descriptor can be avoided. We validate the effectiveness of LNND descriptor by conducting extensive experiments on different benchmark datasets. Experimental results show the promising performance of LNND-based method against the state-of-the-art methods. It is worthwhile to notice that the LNND-based approach requires less intermediate processing steps without any subsequent processing such as smoothing but achieves comparable event better performance. PMID:25105164
Spatiotemporal distribution of Oklahoma earthquakes: Exploring relationships using a nearest-neighbor approach

NASA Astrophysics Data System (ADS)

Vasylkivska, Veronika S.; Huerta, Nicolas J.

2017-07-01

Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog's inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.
Floating phase in the one-dimensional transverse axial next-nearest-neighbor Ising model.

PubMed

Chandra, Anjan Kumar; Dasgupta, Subinay

2007-02-01

To study the ground state of an axial next-nearest-neighbor Ising chain under transverse field as a function of frustration parameter kappa and field strength Gamma, we present here two different perturbative analyses. In one, we consider the (known) ground state at kappa=0.5 and Gamma=0 as the unperturbed state and treat an increase of the field from 0 to Gamma coupled with an increase of kappa from 0.5 to 0.5+rGamma/J as perturbation. The first-order perturbation correction to eigenvalue can be calculated exactly and we could conclude that there are only two phase-transition lines emanating from the point kappa=0.5, Gamma=0. In the second perturbation scheme, we consider the number of domains of length 1 as the perturbation and obtain the zeroth-order eigenfunction for the perturbed ground state. From the longitudinal spin-spin correlation, we conclude that floating phase exists for small values of transverse field over the entire region intermediate between the ferromagnetic phase and antiphase.
Magnetization reversal in magnetic dot arrays: Nearest-neighbor interactions and global configurational anisotropy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van de Wiele, Ben; Fin, Samuele; Pancaldi, Matteo

2016-05-28

Various proposals for future magnetic memories, data processing devices, and sensors rely on a precise control of the magnetization ground state and magnetization reversal process in periodically patterned media. In finite dot arrays, such control is hampered by the magnetostatic interactions between the nanomagnets, leading to the non-uniform magnetization state distributions throughout the sample while reversing. In this paper, we evidence how during reversal typical geometric arrangements of dots in an identical magnetization state appear that originate in the dominance of either Global Configurational Anisotropy or Nearest-Neighbor Magnetostatic interactions, which depends on the fields at which the magnetization reversal setsmore » in. Based on our findings, we propose design rules to obtain the uniform magnetization state distributions throughout the array, and also suggest future research directions to achieve non-uniform state distributions of interest, e.g., when aiming at guiding spin wave edge-modes through dot arrays. Our insights are based on the Magneto-Optical Kerr Effect and Magnetic Force Microscopy measurements as well as the extensive micromagnetic simulations.« less
A multiple-point spatially weighted k-NN method for object-based classification

NASA Astrophysics Data System (ADS)

Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.

2016-10-01

Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.
``Glue" approximation for the pairing interaction in the Hubbard model with next nearest neighbor hopping

NASA Astrophysics Data System (ADS)

Khatami, Ehsan; Macridin, Alexandru; Jarrell, Mark

2008-03-01

Recently, several authors have employed the ``glue" approximation for the Cuprates in which the full pairing vertex is approximated by the spin susceptibility. We study this approximation using Quantum Monte Carlo Dynamical Cluster Approximation methods on a 2D Hubbard model. By considering a reasonable finite value for the next nearest neighbor hopping, we find that this ``glue" approximation, in the current form, does not capture the correct pairing symmetry. Here, d-wave is not the leading pairing symmetry while it is the dominant symmetry using the ``exact" QMC results. We argue that the sensitivity of this approximation to the band structure changes leads to this inconsistency and that this form of interaction may not be the appropriate description of the pairing mechanism in Cuprates. We suggest improvements to this approximation which help to capture the the essential features of the QMC data.
Fast-HPLC Fingerprinting to Discriminate Olive Oil from Other Edible Vegetable Oils by Multivariate Classification Methods.

PubMed

Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis

2017-03-01

A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.
Nearest neighbor density ratio estimation for large-scale applications in astronomy

NASA Astrophysics Data System (ADS)

Kremer, J.; Gieseke, F.; Steenstrup Pedersen, K.; Igel, C.

2015-09-01

In astronomical applications of machine learning, the distribution of objects used for building a model is often different from the distribution of the objects the model is later applied to. This is known as sample selection bias, which is a major challenge for statistical inference as one can no longer assume that the labeled training data are representative. To address this issue, one can re-weight the labeled training patterns to match the distribution of unlabeled data that are available already in the training phase. There are many examples in practice where this strategy yielded good results, but estimating the weights reliably from a finite sample is challenging. We consider an efficient nearest neighbor density ratio estimator that can exploit large samples to increase the accuracy of the weight estimates. To solve the problem of choosing the right neighborhood size, we propose to use cross-validation on a model selection criterion that is unbiased under covariate shift. The resulting algorithm is our method of choice for density ratio estimation when the feature space dimensionality is small and sample sizes are large. The approach is simple and, because of the model selection, robust. We empirically find that it is on a par with established kernel-based methods on relatively small regression benchmark datasets. However, when applied to large-scale photometric redshift estimation, our approach outperforms the state-of-the-art.
Fracton topological order from nearest-neighbor two-spin interactions and dualities

NASA Astrophysics Data System (ADS)

Slagle, Kevin; Kim, Yong Baek

2017-10-01

Fracton topological order describes a remarkable phase of matter, which can be characterized by fracton excitations with constrained dynamics and a ground-state degeneracy that increases exponentially with the length of the system on a three-dimensional torus. However, previous models exhibiting this order require many-spin interactions, which may be very difficult to realize in a real material or cold atom system. In this work, we present a more physically realistic model which has the so-called X-cube fracton topological order [Vijay, Haah, and Fu, Phys. Rev. B 94, 235157 (2016), 10.1103/PhysRevB.94.235157] but only requires nearest-neighbor two-spin interactions. The model lives on a three-dimensional honeycomb-based lattice with one to two spin-1/2 degrees of freedom on each site and a unit cell of six sites. The model is constructed from two orthogonal stacks of Z2 topologically ordered Kitaev honeycomb layers [Kitaev, Ann. Phys. 321, 2 (2006), 10.1016/j.aop.2005.10.005], which are coupled together by a two-spin interaction. It is also shown that a four-spin interaction can be included to instead stabilize 3+1D Z2 topological order. We also find dual descriptions of four quantum phase transitions in our model, all of which appear to be discontinuous first-order transitions.
The minimum distance approach to classification

NASA Technical Reports Server (NTRS)

Wacker, A. G.; Landgrebe, D. A.

1971-01-01

The work to advance the state-of-the-art of miminum distance classification is reportd. This is accomplished through a combination of theoretical and comprehensive experimental investigations based on multispectral scanner data. A survey of the literature for suitable distance measures was conducted and the results of this survey are presented. It is shown that minimum distance classification, using density estimators and Kullback-Leibler numbers as the distance measure, is equivalent to a form of maximum likelihood sample classification. It is also shown that for the parametric case, minimum distance classification is equivalent to nearest neighbor classification in the parameter space.
Protein classification based on text document classification techniques.

PubMed

Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

2005-03-01

The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
Natural Language Processing Based Instrument for Classification of Free Text Medical Records

PubMed Central

2016-01-01

According to the Ministry of Labor, Health and Social Affairs of Georgia a new health management system has to be introduced in the nearest future. In this context arises the problem of structuring and classifying documents containing all the history of medical services provided. The present work introduces the instrument for classification of medical records based on the Georgian language. It is the first attempt of such classification of the Georgian language based medical records. On the whole 24.855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with a little supremacy of SVM. In the process of classification a “shrink” method, based on features selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, on the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260
An unbalanced spectra classification method based on entropy

NASA Astrophysics Data System (ADS)

Liu, Zhong-bao; Zhao, Wen-juan

2017-05-01

How to solve the problem of distinguishing the minority spectra from the majority of the spectra is quite important in astronomy. In view of this, an unbalanced spectra classification method based on entropy (USCM) is proposed in this paper to deal with the unbalanced spectra classification problem. USCM greatly improves the performances of the traditional classifiers on distinguishing the minority spectra as it takes the data distribution into consideration in the process of classification. However, its time complexity is exponential with the training size, and therefore, it can only deal with the problem of small- and medium-scale classification. How to solve the large-scale classification problem is quite important to USCM. It can be easily obtained by mathematical computation that the dual form of USCM is equivalent to the minimum enclosing ball (MEB), and core vector machine (CVM) is introduced, USCM based on CVM is proposed to deal with the large-scale classification problem. Several comparative experiments on the 4 subclasses of K-type spectra, 3 subclasses of F-type spectra and 3 subclasses of G-type spectra from Sloan Digital Sky Survey (SDSS) verify USCM and USCM based on CVM perform better than kNN (k nearest neighbor) and SVM (support vector machine) in dealing with the problem of rare spectra mining respectively on the small- and medium-scale datasets and the large-scale datasets.
Relationship between neighbor number and vibrational spectra in disordered colloidal clusters with attractive interactions

NASA Astrophysics Data System (ADS)

Yunker, Peter J.; Zhang, Zexin; Gratale, Matthew; Chen, Ke; Yodh, A. G.

2013-03-01

We study connections between vibrational spectra and average nearest neighbor number in disordered clusters of colloidal particles with attractive interactions. Measurements of displacement covariances between particles in each cluster permit calculation of the stiffness matrix, which contains effective spring constants linking pairs of particles. From the cluster stiffness matrix, we derive vibrational properties of corresponding "shadow" glassy clusters, with the same geometric configuration and interactions as the "source" cluster but without damping. Here, we investigate the stiffness matrix to elucidate the origin of the correlations between the median frequency of cluster vibrational modes and average number of nearest neighbors in the cluster. We find that the mean confining stiffness of particles in a cluster, i.e., the ensemble-averaged sum of nearest neighbor spring constants, correlates strongly with average nearest neighbor number, and even more strongly with median frequency. Further, we find that the average oscillation frequency of an individual particle is set by the total stiffness of its nearest neighbor bonds; this average frequency increases as the square root of the nearest neighbor bond stiffness, in a manner similar to the simple harmonic oscillator.
Histogram Curve Matching Approaches for Object-based Image Classification of Land Cover and Land Use

PubMed Central

Toure, Sory I.; Stow, Douglas A.; Weeks, John R.; Kumar, Sunil

2013-01-01

The classification of image-objects is usually done using parametric statistical measures of central tendency and/or dispersion (e.g., mean or standard deviation). The objectives of this study were to analyze digital number histograms of image objects and evaluate classifications measures exploiting characteristic signatures of such histograms. Two histograms matching classifiers were evaluated and compared to the standard nearest neighbor to mean classifier. An ADS40 airborne multispectral image of San Diego, California was used for assessing the utility of curve matching classifiers in a geographic object-based image analysis (GEOBIA) approach. The classifications were performed with data sets having 0.5 m, 2.5 m, and 5 m spatial resolutions. Results show that histograms are reliable features for characterizing classes. Also, both histogram matching classifiers consistently performed better than the one based on the standard nearest neighbor to mean rule. The highest classification accuracies were produced with images having 2.5 m spatial resolution. PMID:24403648

Liquid li structure and dynamics: A comparison between OFDFT and second nearest-neighbor embedded-atom method

DOE PAGES

Chen, Mohan; Vella, Joseph R.; Panagiotopoulos, Athanassios Z.; ...

2015-04-08

The structure and dynamics of liquid lithium are studied using two simulation methods: orbital-free (OF) first-principles molecular dynamics (MD), which employs OF density functional theory (DFT), and classical MD utilizing a second nearest-neighbor embedded-atom method potential. The properties we studied include the dynamic structure factor, the self-diffusion coefficient, the dispersion relation, the viscosity, and the bond angle distribution function. Our simulation results were compared to available experimental data when possible. Each method has distinct advantages and disadvantages. For example, OFDFT gives better agreement with experimental dynamic structure factors, yet is more computationally demanding than classical simulations. Classical simulations can accessmore » a broader temperature range and longer time scales. The combination of first-principles and classical simulations is a powerful tool for studying properties of liquid lithium.« less
Heterogeneous autoregressive model with structural break using nearest neighbor truncation volatility estimators for DAX.

PubMed

Chin, Wen Cheong; Lee, Min Cherng; Yap, Grace Lee Ching

2016-01-01

High frequency financial data modelling has become one of the important research areas in the field of financial econometrics. However, the possible structural break in volatile financial time series often trigger inconsistency issue in volatility estimation. In this study, we propose a structural break heavy-tailed heterogeneous autoregressive (HAR) volatility econometric model with the enhancement of jump-robust estimators. The breakpoints in the volatility are captured by dummy variables after the detection by Bai-Perron sequential multi breakpoints procedure. In order to further deal with possible abrupt jump in the volatility, the jump-robust volatility estimators are composed by using the nearest neighbor truncation approach, namely the minimum and median realized volatility. Under the structural break improvements in both the models and volatility estimators, the empirical findings show that the modified HAR model provides the best performing in-sample and out-of-sample forecast evaluations as compared with the standard HAR models. Accurate volatility forecasts have direct influential to the application of risk management and investment portfolio analysis.
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

PubMed

Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C

2018-06-01

Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing

DTIC Science & Technology

2012-12-01

5 FLD Fisher’s Linear Discriminant . . . . . . . . . . . . . . . . . . . 6 kNN K-Nearest Neighbor...Neighbor ( kNN ), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in...Expansion and the Dis- crete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech , and Signal Processing (ICASSP00
Ground-state entropy of the potts antiferromagnet with next-nearest-neighbor spin-spin couplings on strips of the square lattice

PubMed

Chang; Shrock

2000-10-01

We present exact calculations of the zero-temperature partition function (chromatic polynomial) and W(q), the exponent of the ground-state entropy, for the q-state Potts antiferromagnet with next-nearest-neighbor spin-spin couplings on square lattice strips, of width L(y)=3 and L(y)=4 vertices and arbitrarily great length Lx vertices, with both free and periodic boundary conditions. The resultant values of W for a range of physical q values are compared with each other and with the values for the full two-dimensional lattice. These results give insight into the effect of such nonnearest-neighbor couplings on the ground-state entropy. We show that the q=2 (Ising) and q=4 Potts antiferromagnets have zero-temperature critical points on the Lx-->infinity limits of the strips that we study. With the generalization of q from Z+ to C, we determine the analytic structure of W(q) in the q plane for the various cases.
Ising model of cardiac thin filament activation with nearest-neighbor cooperative interactions

NASA Technical Reports Server (NTRS)

Rice, John Jeremy; Stolovitzky, Gustavo; Tu, Yuhai; de Tombe, Pieter P.; Bers, D. M. (Principal Investigator)

2003-01-01

We have developed a model of cardiac thin filament activation using an Ising model approach from equilibrium statistical physics. This model explicitly represents nearest-neighbor interactions between 26 troponin/tropomyosin units along a one-dimensional array that represents the cardiac thin filament. With transition rates chosen to match experimental data, the results show that the resulting force-pCa (F-pCa) relations are similar to Hill functions with asymmetries, as seen in experimental data. Specifically, Hill plots showing (log(F/(1-F)) vs. log [Ca]) reveal a steeper slope below the half activation point (Ca(50)) compared with above. Parameter variation studies show interplay of parameters that affect the apparent cooperativity and asymmetry in the F-pCa relations. The model also predicts that Ca binding is uncooperative for low [Ca], becomes steeper near Ca(50), and becomes uncooperative again at higher [Ca]. The steepness near Ca(50) mirrors the steep F-pCa as a result of thermodynamic considerations. The model also predicts that the correlation between troponin/tropomyosin units along the one-dimensional array quickly decays at high and low [Ca], but near Ca(50), high correlation occurs across the whole array. This work provides a simple model that can account for the steepness and shape of F-pCa relations that other models fail to reproduce.
A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis.

PubMed

Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan

2014-01-01

Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Random ensemble learning for EEG classification.

PubMed

Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid

2018-01-01

Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.
Optimization of Classification Strategies of Acetowhite Temporal Patterns towards Improving Diagnostic Performance of Colposcopy

PubMed Central

Acosta-Mesa, Héctor Gabriel; Cruz-Ramírez, Nicandro; Hernández-Jiménez, Rodolfo

2017-01-01

Efforts have been being made to improve the diagnostic performance of colposcopy, trying to help better diagnose cervical cancer, particularly in developing countries. However, improvements in a number of areas are still necessary, such as the time it takes to process the full digital image of the cervix, the performance of the computing systems used to identify different kinds of tissues, and biopsy sampling. In this paper, we explore three different, well-known automatic classification methods (k-Nearest Neighbors, Naïve Bayes, and C4.5), in addition to different data models that take full advantage of this information and improve the diagnostic performance of colposcopy based on acetowhite temporal patterns. Based on the ROC and PRC area scores, the k-Nearest Neighbors and discrete PLA representation performed better than other methods. The values of sensitivity, specificity, and accuracy reached using this method were 60% (95% CI 50–70), 79% (95% CI 71–86), and 70% (95% CI 60–80), respectively. The acetowhitening phenomenon is not exclusive to high-grade lesions, and we have found acetowhite temporal patterns of epithelial changes that are not precancerous lesions but that are similar to positive ones. These findings need to be considered when developing more robust computing systems in the future. PMID:28744318
Classification Model for Damage Localization in a Plate Structure

NASA Astrophysics Data System (ADS)

Janeliukstis, R.; Ruchevskis, S.; Chate, A.

2018-01-01

The present study is devoted to the problem of damage localization by means of data classification. The commercial ANSYS finite-elements program was used to make a model of a cantilevered composite plate equipped with numerous strain sensors. The plate was divided into zones, and, for data classification purposes, each of them housed several points to which a point mass of magnitude 5 and 10% of plate mass was applied. At each of these points, a numerical modal analysis was performed, from which the first few natural frequencies and strain readings were extracted. The strain data for every point were the input for a classification procedure involving k nearest neighbors and decision trees. The classification model was trained and optimized by finetuning the key parameters of both algorithms. Finally, two new query points were simulated and subjected to a classification in terms of assigning a label to one of the zones of the plate, thus localizing these points. Damage localization results were compared for both algorithms and were found to be in good agreement with the actual application positions of point load.
New feature extraction method for classification of agricultural products from x-ray images

NASA Astrophysics Data System (ADS)

Talukder, Ashit; Casasent, David P.; Lee, Ha-Woon; Keagy, Pamela M.; Schatzki, Thomas F.

1999-01-01

Classification of real-time x-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non- invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items. This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discrimination between damaged and clean items. This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work the MRDF is applied to standard features. The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC data.
Emotional State Classification in Virtual Reality Using Wearable Electroencephalography

NASA Astrophysics Data System (ADS)

Suhaimi, N. S.; Teo, J.; Mountstephens, J.

2018-03-01

This paper presents the classification of emotions on EEG signals. One of the key issues in this research is the lack of mental classification using VR as the medium to stimulate emotion. The approach towards this research is by using K-nearest neighbor (KNN) and Support Vector Machine (SVM). Firstly, each of the participant will be required to wear the EEG headset and recording their brainwaves when they are immersed inside the VR. The data points are then marked if they showed any physical signs of emotion or by observing the brainwave pattern. Secondly, the data will then be tested and trained with KNN and SVM algorithms. The accuracy achieved from both methods were approximately 82% throughout the brainwave spectrum (α, β, γ, δ, θ). These methods showed promising results and will be further enhanced using other machine learning approaches in VR stimulus.
The probability of misassociation between neighboring targets

NASA Astrophysics Data System (ADS)

Areta, Javier A.; Bar-Shalom, Yaakov; Rothrock, Ronald

2008-04-01

This paper presents procedures to calculate the probability that the measurement originating from an extraneous target will be (mis)associated with a target of interest for the cases of Nearest Neighbor and Global association. It is shown that these misassociation probabilities depend, under certain assumptions, on a particular - covariance weighted - norm of the difference between the targets' predicted measurements. For the Nearest Neighbor association, the exact solution, obtained for the case of equal innovation covariances, is based on a noncentral chi-square distribution. An approximate solution is also presented for the case of unequal innovation covariances. For the Global case an approximation is presented for the case of "similar" innovation covariances. In the general case of unequal innovation covariances where this approximation fails, an exact method based on the inversion of the characteristic function is presented. The theoretical results, confirmed by Monte Carlo simulations, quantify the benefit of Global vs. Nearest Neighbor association. These results are applied to problems of single sensor as well as centralized fusion architecture multiple sensor tracking.
Predicting Drug-Target Interactions for New Drug Compounds Using a Weighted Nearest Neighbor Profile.

PubMed

van Laarhoven, Twan; Marchiori, Elena

2013-01-01

In silico discovery of interactions between drug compounds and target proteins is of core importance for improving the efficiency of the laborious and costly experimental determination of drug-target interaction. Drug-target interaction data are available for many classes of pharmaceutically useful target proteins including enzymes, ion channels, GPCRs and nuclear receptors. However, current drug-target interaction databases contain a small number of drug-target pairs which are experimentally validated interactions. In particular, for some drug compounds (or targets) there is no available interaction. This motivates the need for developing methods that predict interacting pairs with high accuracy also for these 'new' drug compounds (or targets). We show that a simple weighted nearest neighbor procedure is highly effective for this task. We integrate this procedure into a recent machine learning method for drug-target interaction we developed in previous work. Results of experiments indicate that the resulting method predicts true interactions with high accuracy also for new drug compounds and achieves results comparable or better than those of recent state-of-the-art algorithms. Software is publicly available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2013/.
Rule groupings in expert systems using nearest neighbour decision rules, and convex hulls

NASA Technical Reports Server (NTRS)

Anastasiadis, Stergios

1991-01-01

Expert System shells are lacking in many areas of software engineering. Large rule based systems are not semantically comprehensible, difficult to debug, and impossible to modify or validate. Partitioning a set of rules found in CLIPS (C Language Integrated Production System) into groups of rules which reflect the underlying semantic subdomains of the problem, will address adequately the concerns stated above. Techniques are introduced to structure a CLIPS rule base into groups of rules that inherently have common semantic information. The concepts involved are imported from the field of A.I., Pattern Recognition, and Statistical Inference. Techniques focus on the areas of feature selection, classification, and a criteria of how 'good' the classification technique is, based on Bayesian Decision Theory. A variety of distance metrics are discussed for measuring the 'closeness' of CLIPS rules and various Nearest Neighbor classification algorithms are described based on the above metric.
Text Classification for Intelligent Portfolio Management

DTIC Science & Technology

2002-05-01

years including nearest neighbor classification [15], naive Bayes with EM (Ex- pectation Maximization) [11] [13], Winnow with active learning [10... Active Learning and Expectation Maximization (EM). In particular, active learning is used to actively select documents for labeling, then EM assigns...generalization with active learning . Machine Learning, 15(2):201–221, 1994. [3] I. Dagan and P. Engelson. Committee-based sampling for training
Personalised news filtering and recommendation system using Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model

NASA Astrophysics Data System (ADS)

Adeniyi, D. A.; Wei, Z.; Yang, Y.

2017-10-01

Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.
Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines.

PubMed

Majid, Abdul; Ali, Safdar; Iqbal, Mubashar; Kausar, Nabeela

2014-03-01

This study proposes a novel prediction approach for human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: the preprocessor and the predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is employed to increase the samples of the minority class, thereby balancing the dataset. In the predictor stage, machine-learning approaches of K-nearest neighbor (KNN) and support vector machines (SVM) are used to develop hybrid MTD-SVM and MTD-KNN prediction models. MTD-SVM model has provided the best values of accuracy, G-mean and Matthew's correlation coefficient of 96.71%, 96.70% and 71.98% for cancer/non-cancer dataset, breast/non-breast cancer dataset and colon/non-colon cancer dataset, respectively. We found that hybrid MTD-SVM is the best with respect to prediction performance and computational cost. MTD-KNN model has achieved moderately better prediction as compared to hybrid MTD-NB (Naïve Bayes) but at the expense of higher computing cost. MTD-KNN model is faster than MTD-RF (random forest) but its prediction is not better than MTD-RF. To the best of our knowledge, the reported results are the best results, so far, for these datasets. The proposed scheme indicates that the developed models can be used as a tool for the prediction of cancer. This scheme may be useful for study of any sequential information such as protein sequence or any nucleic acid sequence. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Intra-regional classification of grape seeds produced in Mendoza province (Argentina) by multi-elemental analysis and chemometrics tools.

PubMed

Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G

2018-03-01

The feasibility of the application of chemometric techniques associated with multi-element analysis for the classification of grape seeds according to their provenance vineyard soil was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used for the determination of twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least square discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and Random Forest (RF) were applied to construct classification/discrimination rules. The results indicated that nonlinear methods, RF and SVM, perform best with up to 98% and 93% accuracy rate, respectively, and therefore are excellent tools for classification of grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Mapping from multiple-control Toffoli circuits to linear nearest neighbor quantum circuits

NASA Astrophysics Data System (ADS)

Cheng, Xueyun; Guan, Zhijin; Ding, Weiping

2018-07-01

In recent years, quantum computing research has been attracting more and more attention, but few studies on the limited interaction distance between quantum bits (qubit) are deeply carried out. This paper presents a mapping method for transforming multiple-control Toffoli (MCT) circuits into linear nearest neighbor (LNN) quantum circuits instead of traditional decomposition-based methods. In order to reduce the number of inserted SWAP gates, a novel type of gate with the optimal LNN quantum realization was constructed, namely NNTS gate. The MCT gate with multiple control bits could be better cascaded by the NNTS gates, in which the arrangement of the input lines was LNN arrangement of the MCT gate. Then, the communication overhead measurement model on inserted SWAP gate count from the original arrangement to the new arrangement was put forward, and we selected one of the LNN arrangements with the minimum SWAP gate count. Moreover, the LNN arrangement-based mapping algorithm was given, and it dealt with the MCT gates in turn and mapped each MCT gate into its LNN form by inserting the minimum number of SWAP gates. Finally, some simplification rules were used, which can further reduce the final quantum cost of the LNN quantum circuit. Experiments on some benchmark MCT circuits indicate that the direct mapping algorithm results in fewer additional SWAP gates in about 50%, while the average improvement rate in quantum cost is 16.95% compared to the decomposition-based method. In addition, it has been verified that the proposed method has greater superiority for reversible circuits cascaded by MCT gates with more control bits.

Influence of the number of topologically interacting neighbors on swarm dynamics

PubMed Central

Shang, Yilun; Bouffanais, Roland

2014-01-01

Recent empirical and theoretical works on collective behaviors based on a topological interaction are beginning to offer some explanations as for the physical reasons behind the selection of a particular number of nearest neighbors locally affecting each individual's dynamics. Recently, flocking starlings have been shown to topologically interact with a very specific number of neighbors, between six to eight, while metric-free interactions were found to govern human crowd dynamics. Here, we use network- and graph-theoretic approaches combined with a dynamical model of locally interacting self-propelled particles to study how the consensus reaching process and its dynamics are influenced by the number k of topological neighbors. Specifically, we prove exactly that, in the absence of noise, consensus is always attained with a speed to consensus strictly increasing with k. The analysis of both speed and time to consensus reveals that, irrespective of the swarm size, a value of k ~ 10 speeds up the rate of convergence to consensus to levels close to the one of the optimal all-to-all interaction signaling. Furthermore, this effect is found to be more pronounced in the presence of environmental noise. PMID:24567077
Equilibrium, metastability, and hysteresis in a model spin-crossover material with nearest-neighbor antiferromagnetic-like and long-range ferromagnetic-like interactions

NASA Astrophysics Data System (ADS)

Rikvold, Per Arne; Brown, Gregory; Miyashita, Seiji; Omand, Conor; Nishino, Masamichi

2016-02-01

Phase diagrams and hysteresis loops were obtained by Monte Carlo simulations and a mean-field method for a simplified model of a spin-crossover material with a two-step transition between the high-spin and low-spin states. This model is a mapping onto a square-lattice S =1 /2 Ising model with antiferromagnetic nearest-neighbor and ferromagnetic Husimi-Temperley (equivalent-neighbor) long-range interactions. Phase diagrams obtained by the two methods for weak and strong long-range interactions are found to be similar. However, for intermediate-strength long-range interactions, the Monte Carlo simulations show that tricritical points decompose into pairs of critical end points and mean-field critical points surrounded by horn-shaped regions of metastability. Hysteresis loops along paths traversing the horn regions are strongly reminiscent of thermal two-step transition loops with hysteresis, recently observed experimentally in several spin-crossover materials. We believe analogous phenomena should be observable in experiments and simulations for many systems that exhibit competition between local antiferromagnetic-like interactions and long-range ferromagnetic-like interactions caused by elastic distortions.
Equilibrium, metastability, and hysteresis in a model spin-crossover material with nearest-neighbor antiferromagnetic-like and long-range ferromagnetic-like interactions

DOE PAGES

Rikvold, Per Arne; Brown, Gregory; Miyashita, Seiji; ...

2016-02-16

Phase diagrams and hysteresis loops were obtained by Monte Carlo simulations and a mean- field method for a simplified model of a spin-crossovermaterialwith a two-step transition between the high-spin and low-spin states. This model is a mapping onto a square-lattice S = 1/2 Ising model with antiferromagnetic nearest-neighbor and ferromagnetic Husimi-Temperley ( equivalent-neighbor) long-range interactions. Phase diagrams obtained by the two methods for weak and strong long-range interactions are found to be similar. However, for intermediate-strength long-range interactions, the Monte Carlo simulations show that tricritical points decompose into pairs of critical end points and mean-field critical points surrounded by horn-shapedmore » regions of metastability. Hysteresis loops along paths traversing the horn regions are strongly reminiscent of thermal two-step transition loops with hysteresis, recently observed experimentally in several spin-crossover materials. As a result, we believe analogous phenomena should be observable in experiments and simulations for many systems that exhibit competition between local antiferromagnetic-like interactions and long-range ferromagnetic-like interactions caused by elastic distortions.« less
Symmetrized Nearest Neighbor Regression Estimates.

DTIC Science & Technology

1987-12-01

TELEPHONE NUMBER 22C. OFFICE SYMBO0L (Inetude A me. Code) Major Brian Woodruff 1(202) 767-5026 1 Dr -’ 00 PORN 147,303- APR EDI1TION OF I JAN 73 IS...in tenth of a pence) in 1973. The data come from the Family Ex- penditure Survey, Annual Base Tapes 1968-198S, Department of Employment, Statistics...Statistics, 13, 1465- 1481. Hildenbrand, K. and Hildenbrand, W. (1986). On the mean income effect: a data analysis of the U.K. family expenditure
Comparative Analysis of Document level Text Classification Algorithms using R

NASA Astrophysics Data System (ADS)

Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.

2017-08-01

From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.
Randomized Approaches for Nearest Neighbor Search in Metric Space When Computing the Pairwise Distance Is Extremely Expensive

NASA Astrophysics Data System (ADS)

Wang, Lusheng; Yang, Yong; Lin, Guohui

Finding the closest object for a query in a database is a classical problem in computer science. For some modern biological applications, computing the similarity between two objects might be very time consuming. For example, it takes a long time to compute the edit distance between two whole chromosomes and the alignment cost of two 3D protein structures. In this paper, we study the nearest neighbor search problem in metric space, where the pair-wise distance between two objects in the database is known and we want to minimize the number of distances computed on-line between the query and objects in the database in order to find the closest object. We have designed two randomized approaches for indexing metric space databases, where objects are purely described by their distances with each other. Analysis and experiments show that our approaches only need to compute O(logn) objects in order to find the closest object, where n is the total number of objects in the database.
Comparing K-mer based methods for improved classification of 16S sequences.

PubMed

Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars

2015-07-01

The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison on five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The difference in classification error obtained by the methods seemed to be small, but they were stable and for both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all the five methods reached an error-plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau indicating improved training data is needed to improve classification from here. Classification errors occur most frequent for genera with few sequences present. For improving the taxonomy and testing new classification methods, the need for a better and more universal and robust training data set is crucial.
An integrated classifier for computer-aided diagnosis of colorectal polyps based on random forest and location index strategies

NASA Astrophysics Data System (ADS)

Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong

2016-03-01

Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifier [1], has also a distinguished role in the classification for CADx. Therefore, in this paper, based on the property that the LI will achieve a very high accuracy, we design an algorithm to integrate the LI into RF for improved or higher value of AUC (area under the curve of receiver operating characteristics -- ROC). Experiments were performed by the use of a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers of RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.
A Proposed Methodology to Classify Frontier Capital Markets

DTIC Science & Technology

2011-07-31

but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal
A Proposed Methodology to Classify Frontier Capital Markets

DTIC Science & Technology

2011-07-31

out of charity, but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project...identification, and machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ...Support Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F
Classification enhancement for post-stroke dementia using fuzzy neighborhood preserving analysis with QR-decomposition.

PubMed

Al-Qazzaz, Noor Kamal; Ali, Sawal; Ahmad, Siti Anom; Escudero, Javier

2017-07-01

The aim of the present study was to discriminate the electroencephalogram (EEG) of 5 patients with vascular dementia (VaD), 15 patients with stroke-related mild cognitive impairment (MCI), and 15 control normal subjects during a working memory (WM) task. We used independent component analysis (ICA) and wavelet transform (WT) as a hybrid preprocessing approach for EEG artifact removal. Three different features were extracted from the cleaned EEG signals: spectral entropy (SpecEn), permutation entropy (PerEn) and Tsallis entropy (TsEn). Two classification schemes were applied - support vector machine (SVM) and k-nearest neighbors (kNN) - with fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) as a dimensionality reduction technique. The FNPAQR dimensionality reduction technique increased the SVM classification accuracy from 82.22% to 90.37% and from 82.6% to 86.67% for kNN. These results suggest that FNPAQR consistently improves the discrimination of VaD, MCI patients and control normal subjects and it could be a useful feature selection to help the identification of patients with VaD and MCI.
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

PubMed

Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

2013-01-01

The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
A neural network approach to cloud classification

NASA Technical Reports Server (NTRS)

Lee, Jonathan; Weger, Ronald C.; Sengupta, Sailes K.; Welch, Ronald M.

1990-01-01

It is shown that, using high-spatial-resolution data, very high cloud classification accuracies can be obtained with a neural network approach. A texture-based neural network classifier using only single-channel visible Landsat MSS imagery achieves an overall cloud identification accuracy of 93 percent. Cirrus can be distinguished from boundary layer cloudiness with an accuracy of 96 percent, without the use of an infrared channel. Stratocumulus is retrieved with an accuracy of 92 percent, cumulus at 90 percent. The use of the neural network does not improve cirrus classification accuracy. Rather, its main effect is in the improved separation between stratocumulus and cumulus cloudiness. While most cloud classification algorithms rely on linear parametric schemes, the present study is based on a nonlinear, nonparametric four-layer neural network approach. A three-layer neural network architecture, the nonparametric K-nearest neighbor approach, and the linear stepwise discriminant analysis procedure are compared. A significant finding is that significantly higher accuracies are attained with the nonparametric approaches using only 20 percent of the database as training data, compared to 67 percent of the database in the linear approach.
Object Classification in Semi Structured Enviroment Using Forward-Looking Sonar

PubMed Central

dos Santos, Matheus; Ribeiro, Pedro Otávio; Núñez, Pedro; Botelho, Silvia

2017-01-01

The submarine exploration using robots has been increasing in recent years. The automation of tasks such as monitoring, inspection, and underwater maintenance requires the understanding of the robot’s environment. The object recognition in the scene is becoming a critical issue for these systems. On this work, an underwater object classification pipeline applied in acoustic images acquired by Forward-Looking Sonar (FLS) are studied. The object segmentation combines thresholding, connected pixels searching and peak of intensity analyzing techniques. The object descriptor extract intensity and geometric features of the detected objects. A comparison between the Support Vector Machine, K-Nearest Neighbors, and Random Trees classifiers are presented. An open-source tool was developed to annotate and classify the objects and evaluate their classification performance. The proposed method efficiently segments and classifies the structures in the scene using a real dataset acquired by an underwater vehicle in a harbor area. Experimental results demonstrate the robustness and accuracy of the method described in this paper. PMID:28961163
An application to pulmonary emphysema classification based on model of texton learning by sparse representation

NASA Astrophysics Data System (ADS)

Zhang, Min; Zhou, Xiangrong; Goshima, Satoshi; Chen, Huayue; Muramatsu, Chisako; Hara, Takeshi; Yokoyama, Ryojiro; Kanematsu, Masayuki; Fujita, Hiroshi

2012-03-01

We aim at using a new texton based texture classification method in the classification of pulmonary emphysema in computed tomography (CT) images of the lungs. Different from conventional computer-aided diagnosis (CAD) pulmonary emphysema classification methods, in this paper, firstly, the dictionary of texton is learned via applying sparse representation(SR) to image patches in the training dataset. Then the SR coefficients of the test images over the dictionary are used to construct the histograms for texture presentations. Finally, classification is performed by using a nearest neighbor classifier with a histogram dissimilarity measure as distance. The proposed approach is tested on 3840 annotated regions of interest consisting of normal tissue and mild, moderate and severe pulmonary emphysema of three subtypes. The performance of the proposed system, with an accuracy of about 88%, is comparably higher than state of the art method based on the basic rotation invariant local binary pattern histograms and the texture classification method based on texton learning by k-means, which performs almost the best among other approaches in the literature.
Multispectral imaging burn wound tissue classification system: a comparison of test accuracies between several common machine learning algorithms

NASA Astrophysics Data System (ADS)

Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.

2016-03-01

The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care
Nearest neighbor imputation using spatial–temporal correlations in wireless sensor networks

PubMed Central

Li, YuanYuan; Parker, Lynne E.

2016-01-01

Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network’s performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a kd-tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using traditional mean and variance of each dimension for kd-tree construction, and Euclidean distance for kd-tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. Our experimental results
Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.

PubMed

Hernandez, Troy; Yang, Jie

2016-10-01

The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code is available upon request.
Comparisons and Selections of Features and Classifiers for Short Text Classification

NASA Astrophysics Data System (ADS)

Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi

2017-10-01

Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.
Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine

NASA Astrophysics Data System (ADS)

Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry

2017-10-01

Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha-1 and for live biomass of about 2 t ha-1 over the study area.

An Ant Colony Optimization Based Feature Selection for Web Page Classification

PubMed Central

2014-01-01

The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
An ant colony optimization based feature selection for web page classification.

PubMed

Saraç, Esra; Özel, Selma Ayşe

2014-01-01

The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
Deep feature extraction and combination for synthetic aperture radar target classification

NASA Astrophysics Data System (ADS)

Amrani, Moussa; Jiang, Feng

2017-10-01

Feature extraction has always been a difficult problem in the classification performance of synthetic aperture radar automatic target recognition (SAR-ATR). It is very important to select discriminative features to train a classifier, which is a prerequisite. Inspired by the great success of convolutional neural network (CNN), we address the problem of SAR target classification by proposing a feature extraction method, which takes advantage of exploiting the extracted deep features from CNNs on SAR images to introduce more powerful discriminative features and robust representation ability for them. First, the pretrained VGG-S net is fine-tuned on moving and stationary target acquisition and recognition (MSTAR) public release database. Second, after a simple preprocessing is performed, the fine-tuned network is used as a fixed feature extractor to extract deep features from the processed SAR images. Third, the extracted deep features are fused by using a traditional concatenation and a discriminant correlation analysis algorithm. Finally, for target classification, K-nearest neighbors algorithm based on LogDet divergence-based metric learning triplet constraints is adopted as a baseline classifier. Experiments on MSTAR are conducted, and the classification accuracy results demonstrate that the proposed method outperforms the state-of-the-art methods.
Brain tissue segmentation in 4D CT using voxel classification

NASA Astrophysics Data System (ADS)

van den Boom, R.; Oei, M. T. H.; Lafebre, S.; Oostveen, L. J.; Meijer, F. J. A.; Steens, S. C. A.; Prokop, M.; van Ginneken, B.; Manniesing, R.

2012-02-01

A method is proposed to segment anatomical regions of the brain from 4D computer tomography (CT) patient data. The method consists of a three step voxel classification scheme, each step focusing on structures that are increasingly difficult to segment. The first step classifies air and bone, the second step classifies vessels and the third step classifies white matter, gray matter and cerebrospinal fluid. As features the time averaged intensity value and the temporal intensity change value were used. In each step, a k-Nearest-Neighbor classifier was used to classify the voxels. Training data was obtained by placing regions of interest in reconstructed 3D image data. The method has been applied to ten 4D CT cerebral patient data. A leave-one-out experiment showed consistent and accurate segmentation results.
A semi-supervised classification algorithm using the TAD-derived background as training data

NASA Astrophysics Data System (ADS)

Fan, Lei; Ambeau, Brittany; Messinger, David W.

2013-05-01

In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

PubMed

Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

2017-11-02

Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
Evidence of codon usage in the nearest neighbor spacing distribution of bases in bacterial genomes

NASA Astrophysics Data System (ADS)

Higareda, M. F.; Geiger, O.; Mendoza, L.; Méndez-Sánchez, R. A.

2012-02-01

Statistical analysis of whole genomic sequences usually assumes a homogeneous nucleotide density throughout the genome, an assumption that has been proved incorrect for several organisms since the nucleotide density is only locally homogeneous. To avoid giving a single numerical value to this variable property, we propose the use of spectral statistics, which characterizes the density of nucleotides as a function of its position in the genome. We show that the cumulative density of bases in bacterial genomes can be separated into an average (or secular) plus a fluctuating part. Bacterial genomes can be divided into two groups according to the qualitative description of their secular part: linear and piecewise linear. These two groups of genomes show different properties when their nucleotide spacing distribution is studied. In order to analyze genomes having a variable nucleotide density, statistically, the use of unfolding is necessary, i.e., to get a separation between the secular part and the fluctuations. The unfolding allows an adequate comparison with the statistical properties of other genomes. With this methodology, four genomes were analyzed Burkholderia, Bacillus, Clostridium and Corynebacterium. Interestingly, the nearest neighbor spacing distributions or detrended distance distributions are very similar for species within the same genus but they are very different for species from different genera. This difference can be attributed to the difference in the codon usage.
New wideband radar target classification method based on neural learning and modified Euclidean metric

NASA Astrophysics Data System (ADS)

Jiang, Yicheng; Cheng, Ping; Ou, Yangkui

2001-09-01

A new method for target classification of high-range resolution radar is proposed. It tries to use neural learning to obtain invariant subclass features of training range profiles. A modified Euclidean metric based on the Box-Cox transformation technique is investigated for Nearest Neighbor target classification improvement. The classification experiments using real radar data of three different aircraft have demonstrated that classification error can reduce 8% if this method proposed in this paper is chosen instead of the conventional method. The results of this paper have shown that by choosing an optimized metric, it is indeed possible to reduce the classification error without increasing the number of samples.
yaImpute: An R package for kNN imputation

Treesearch

Nicholas L. Crookston; Andrew O. Finley

2008-01-01

This article introduces yaImpute, an R package for nearest neighbor search and imputation. Although nearest neighbor imputation is used in a host of disciplines, the methods implemented in the yaImpute package are tailored to imputation-based forest attribute estimation and mapping. The impetus to writing the yaImpute is a growing interest in nearest neighbor...
PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer

Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models tomore » curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.« less
Photometric Supernova Classification with Machine Learning

NASA Astrophysics Data System (ADS)

Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

2016-08-01

Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
Effect of the next-nearest-neighbor hopping on the charge collective modes in the paramagnetic phase of the Hubbard model

NASA Astrophysics Data System (ADS)

Dao, Vu Hung; Frésard, Raymond

2017-10-01

The charge dynamical response function of the t-t'-U Hubbard model is investigated on the square lattice in the thermodynamical limit. The correlation function is calculated from Gaussian fluctuations around the paramagnetic saddle-point within the Kotliar and Ruckenstein slave-boson representation. The next-nearest-neighbor hopping only slightly affects the renormalization of the quasiparticle mass. In contrast a negative t'/t notably decreases (increases) their velocity, and hence the zero-sound velocity, at positive (negative) doping. For low (high) density n ≲ 0.5 (n ≳ 1.5) we find that it enhances (reduces) the damping of the zero-sound mode. Furthermore it softens (hardens) the upper-Hubbard-band collective mode at positive (negative) doping. It is also shown that our results differ markedly from the random-phase approximation in the strong-coupling limit, even at high doping, while they compare favorably with existing quantum Monte Carlo numerical simulations.
Portable Language-Independent Adaptive Translation from OCR. Phase 1

DTIC Science & Technology

2009-04-01

including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations.

PubMed

Fabelo, Himar; Ortega, Samuel; Ravi, Daniele; Kiran, B Ravi; Sosa, Coralia; Bulters, Diederik; Callicó, Gustavo M; Bulstrode, Harry; Szolna, Adam; Piñeiro, Juan F; Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O'Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto

2018-01-01

Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising
Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations

PubMed Central

Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O’Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto

2018-01-01

Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising
Primary mass discrimination of high energy cosmic rays using PNN and k-NN methods

NASA Astrophysics Data System (ADS)

Rastegarzadeh, G.; Nemati, M.

2018-02-01

Probabilistic neural network (PNN) and k-Nearest Neighbors (k-NN) methods are widely used data classification techniques. In this paper, these two methods have been used to classify the Extensive Air Shower (EAS) data sets which were simulated using the CORSIKA code for three primary cosmic rays. The primaries are proton, oxygen and iron nuclei at energies of 100 TeV-10 PeV. This study is performed in the following of the investigations into the primary cosmic ray mass sensitive observables. We propose a new approach for measuring the mass sensitive observables of EAS in order to improve the primary mass separation. In this work, the EAS observables measurement has performed locally instead of total measurements. Also the relationships between the included number of observables in the classification methods and the prediction accuracy have been investigated. We have shown that the local measurements and inclusion of more mass sensitive observables in the classification processes can improve the classifying quality and also we have shown that muons and electrons energy density can be considered as primary mass sensitive observables in primary mass classification. Also it must be noted that this study is performed for Tehran observation level without considering the details of any certain EAS detection array.
A classification scheme for edge-localized modes based on their probability distributions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shabbir, A., E-mail: aqsa.shabbir@ugent.be; Max Planck Institute for Plasma Physics, D-85748 Garching; Hornung, G.

We present here an automated classification scheme which is particularly well suited to scenarios where the parameters have significant uncertainties or are stochastic quantities. To this end, the parameters are modeled with probability distributions in a metric space and classification is conducted using the notion of nearest neighbors. The presented framework is then applied to the classification of type I and type III edge-localized modes (ELMs) from a set of carbon-wall plasmas at JET. This provides a fast, standardized classification of ELM types which is expected to significantly reduce the effort of ELM experts in identifying ELM types. Further, themore » classification scheme is general and can be applied to various other plasma phenomena as well.« less
Consensus Classification Using Non-Optimized Classifiers.

PubMed

Brownfield, Brett; Lemos, Tony; Kalivas, John H

2018-04-03

Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available with some being complex, such as neural networks, and others are simple, such as k nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample are then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments by three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured at 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
DichroMatch at the protein circular dichroism data bank (DM@PCDDB): A web-based tool for identifying protein nearest neighbors using circular dichroism spectroscopy.

PubMed

Whitmore, Lee; Mavridis, Lazaros; Wallace, B A; Janes, Robert W

2018-01-01

Circular dichroism spectroscopy is a well-used, but simple method in structural biology for providing information on the secondary structure and folds of proteins. DichroMatch (DM@PCDDB) is an online tool that is newly available in the Protein Circular Dichroism Data Bank (PCDDB), which takes advantage of the wealth of spectral and metadata deposited therein, to enable identification of spectral nearest neighbors of a query protein based on four different methods of spectral matching. DM@PCDDB can potentially provide novel information about structural relationships between proteins and can be used in comparison studies of protein homologs and orthologs. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Characterization of 3D Voronoi Tessellation Nearest Neighbor Lipid Shells Provides Atomistic Lipid Disruption Profile of Protein Containing Lipid Membranes

PubMed Central

Cheng, Sara Y.; Duong, Hai V.; Compton, Campbell; Vaughn, Mark W.; Nguyen, Hoa; Cheng, Kwan H.

2015-01-01

Quantifying protein-induced lipid disruptions at the atomistic level is a challenging problem in membrane biophysics. Here we propose a novel 3D Voronoi tessellation nearest-atom-neighbor shell method to classify and characterize lipid domains into discrete concentric lipid shells surrounding membrane proteins in structurally heterogeneous lipid membranes. This method needs only the coordinates of the system and is independent of force fields and simulation conditions. As a proof-of-principle, we use this multiple lipid shell method to analyze the lipid disruption profiles of three simulated membrane systems: phosphatidylcholine, phosphatidylcholine/cholesterol, and beta-amyloid/phosphatidylcholine/cholesterol. We observed different atomic volume disruption mechanisms due to cholesterol and beta-amyloid Additionally, several lipid fractional groups and lipid-interfacial water did not converge to their control values with increasing distance or shell order from the protein. This volume divergent behavior was confirmed by bilayer thickness and chain orientational order calculations. Our method can also be used to analyze high-resolution structural experimental data. PMID:25637891

Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers

NASA Astrophysics Data System (ADS)

Sanal Kumar, K. P.; Bhavani, R., Dr.

2017-08-01

Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.
Automated identification of Monogeneans using digital image processing and K-nearest neighbour approaches.

PubMed

Yousef Kalafi, Elham; Tan, Wooi Boon; Town, Christopher; Dhillon, Sarinder Kaur

2016-12-22

Monogeneans are flatworms (Platyhelminthes) that are primarily found on gills and skin of fishes. Monogenean parasites have attachment appendages at their haptoral regions that help them to move about the body surface and feed on skin and gill debris. Haptoral attachment organs consist of sclerotized hard parts such as hooks, anchors and marginal hooks. Monogenean species are differentiated based on their haptoral bars, anchors, marginal hooks, reproductive parts' (male and female copulatory organs) morphological characters and soft anatomical parts. The complex structure of these diagnostic organs and also their overlapping in microscopic digital images are impediments for developing fully automated identification system for monogeneans (LNCS 7666:256-263, 2012), (ISDA; 457-462, 2011), (J Zoolog Syst Evol Res 52(2): 95-99. 2013;). In this study images of hard parts of the haptoral organs such as bars and anchors are used to develop a fully automated identification technique for monogenean species identification by implementing image processing techniques and machine learning methods. Images of four monogenean species namely Sinodiplectanotrema malayanus, Trianchoratus pahangensis, Metahaliotrema mizellei and Metahaliotrema sp. (undescribed) were used to develop an automated technique for identification. K-nearest neighbour (KNN) was applied to classify the monogenean specimens based on the extracted features. 50% of the dataset was used for training and the other 50% was used as testing for system evaluation. Our approach demonstrated overall classification accuracy of 90%. In this study Leave One Out (LOO) cross validation is used for validation of our system and the accuracy is 91.25%. The methods presented in this study facilitate fast and accurate fully automated classification of monogeneans at the species level. In future studies more classes will be included in the model, the time to capture the monogenean images will be reduced and improvements in
Design of a hybrid model for cardiac arrhythmia classification based on Daubechies wavelet transform.

PubMed

Rajagopal, Rekha; Ranganathan, Vidhyapriya

2018-06-05

Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.
Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm

PubMed Central

Wang, ShaoPeng; Zhang, Yu-Hang; Lu, Jing; Cui, Weiren; Hu, Jerry; Cai, Yu-Dong

2016-01-01

The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon the request. PMID:26955638
Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm.

PubMed

Wang, ShaoPeng; Zhang, Yu-Hang; Lu, Jing; Cui, Weiren; Hu, Jerry; Cai, Yu-Dong

2016-01-01

The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon the request.
Dissimilarity representations in lung parenchyma classification

NASA Astrophysics Data System (ADS)

Sørensen, Lauge; de Bruijne, Marleen

2009-02-01

A good problem representation is important for a pattern recognition system to be successful. The traditional approach to statistical pattern recognition is feature representation. More specifically, objects are represented by a number of features in a feature vector space, and classifiers are built in this representation. This is also the general trend in lung parenchyma classification in computed tomography (CT) images, where the features often are measures on feature histograms. Instead, we propose to build normal density based classifiers in dissimilarity representations for lung parenchyma classification. This allows for the classifiers to work on dissimilarities between objects, which might be a more natural way of representing lung parenchyma. In this context, dissimilarity is defined between CT regions of interest (ROI)s. ROIs are represented by their CT attenuation histogram and ROI dissimilarity is defined as a histogram dissimilarity measure between the attenuation histograms. In this setting, the full histograms are utilized according to the chosen histogram dissimilarity measure. We apply this idea to classification of different emphysema patterns as well as normal, healthy tissue. Two dissimilarity representation approaches as well as different histogram dissimilarity measures are considered. The approaches are evaluated on a set of 168 CT ROIs using normal density based classifiers all showing good performance. Compared to using histogram dissimilarity directly as distance in a emph{k} nearest neighbor classifier, which achieves a classification accuracy of 92.9%, the best dissimilarity representation based classifier is significantly better with a classification accuracy of 97.0% (text{emph{p" border="0" class="imgtopleft"> = 0.046).
Gender classification from face images by using local binary pattern and gray-level co-occurrence matrix

NASA Astrophysics Data System (ADS)

Uzbaş, Betül; Arslan, Ahmet

2018-04-01

Gender is an important step for human computer interactive processes and identification. Human face image is one of the important sources to determine gender. In the present study, gender classification is performed automatically from facial images. In order to classify gender, we propose a combination of features that have been extracted face, eye and lip regions by using a hybrid method of Local Binary Pattern and Gray-Level Co-Occurrence Matrix. The features have been extracted from automatically obtained face, eye and lip regions. All of the extracted features have been combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor methods) for gender classification. The Nottingham Scan face database that consists of the frontal face images of 100 people (50 male and 50 female) is used for this purpose. As the result of the experimental studies, the highest success rate has been achieved as 98% by using Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.
Nearest Neighbor Classification Using a Density Sensitive Distance Measurement

DTIC Science & Technology

2009-09-01

both the proposed density sensitive distance measurement and Euclidean distance are compared on the Wisconsin Diagnostic Breast Cancer dataset and...proposed density sensitive distance measurement and Euclidean distance are compared on the Wisconsin Diagnostic Breast Cancer dataset and the MNIST...35 1. The Wisconsin Diagnostic Breast Cancer (WDBC) Dataset..........35 2. The
3D texture analysis for classification of second harmonic generation images of human ovarian cancer

NASA Astrophysics Data System (ADS)

Wen, Bruce; Campbell, Kirby R.; Tilbury, Karissa; Nadiarnykh, Oleg; Brewer, Molly A.; Patankar, Manish; Singh, Vikas; Eliceiri, Kevin. W.; Campagnola, Paul J.

2016-10-01

Remodeling of the collagen architecture in the extracellular matrix (ECM) has been implicated in ovarian cancer. To quantify these alterations we implemented a form of 3D texture analysis to delineate the fibrillar morphology observed in 3D Second Harmonic Generation (SHG) microscopy image data of normal (1) and high risk (2) ovarian stroma, benign ovarian tumors (3), low grade (4) and high grade (5) serous tumors, and endometrioid tumors (6). We developed a tailored set of 3D filters which extract textural features in the 3D image sets to build (or learn) statistical models of each tissue class. By applying k-nearest neighbor classification using these learned models, we achieved 83-91% accuracies for the six classes. The 3D method outperformed the analogous 2D classification on the same tissues, where we suggest this is due the increased information content. This classification based on ECM structural changes will complement conventional classification based on genetic profiles and can serve as an additional biomarker. Moreover, the texture analysis algorithm is quite general, as it does not rely on single morphological metrics such as fiber alignment, length, and width but their combined convolution with a customizable basis set.
Optimizing classification performance in an object-based very-high-resolution land use-land cover urban application

NASA Astrophysics Data System (ADS)

Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore

2017-10-01

This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
A Novel Feature Level Fusion for Heart Rate Variability Classification Using Correntropy and Cauchy-Schwarz Divergence.

PubMed

Goshvarpour, Ateke; Goshvarpour, Atefeh

2018-04-30

Heart rate variability (HRV) analysis has become a widely used tool for monitoring pathological and psychological states in medical applications. In a typical classification problem, information fusion is a process whereby the effective combination of the data can achieve a more accurate system. The purpose of this article was to provide an accurate algorithm for classifying HRV signals in various psychological states. Therefore, a novel feature level fusion approach was proposed. First, using the theory of information, two similarity indicators of the signal were extracted, including correntropy and Cauchy-Schwarz divergence. Applying probabilistic neural network (PNN) and k-nearest neighbor (kNN), the performance of each index in the classification of meditators and non-meditators HRV signals was appraised. Then, three fusion rules, including division, product, and weighted sum rules were used to combine the information of both similarity measures. For the first time, we propose an algorithm to define the weights of each feature based on the statistical p-values. The performance of HRV classification using combined features was compared with the non-combined features. Totally, the accuracy of 100% was obtained for discriminating all states. The results showed the strong ability and proficiency of division and weighted sum rules in the improvement of the classifier accuracies.
Classification of acoustic emission signals using wavelets and Random Forests : Application to localized corrosion

NASA Astrophysics Data System (ADS)

Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.

2016-03-01

This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.
Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.

PubMed

Mei, Gang; Xu, Nengxiong; Xu, Liangliang

2016-01-01

This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The presented algorithm is an improvement of our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest neighbors (kNN) search. In AIDW, it needs to find several nearest neighboring data points for each interpolated point to adaptively determine the power parameter; and then the desired prediction value of the interpolated point is obtained by weighted interpolating using the power parameter. In this work, we develop a fast kNN search approach based on the space-partitioning data structure, even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of the stages of kNN search and weighted interpolating. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
Multiple Sparse Representations Classification

PubMed Central

Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik

2015-01-01

Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provides an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and
A new local-global approach for classification.

PubMed

Peres, R T; Pedreira, C E

2010-09-01

In this paper, we propose a new local-global pattern classification scheme that combines supervised and unsupervised approaches, taking advantage of both, local and global environments. We understand as global methods the ones concerned with the aim of constructing a model for the whole problem space using the totality of the available observations. Local methods focus into sub regions of the space, possibly using an appropriately selected subset of the sample. In the proposed method, the sample is first divided in local cells by using a Vector Quantization unsupervised algorithm, the LBG (Linde-Buzo-Gray). In a second stage, the generated assemblage of much easier problems is locally solved with a scheme inspired by Bayes' rule. Four classification methods were implemented for comparison purposes with the proposed scheme: Learning Vector Quantization (LVQ); Feedforward Neural Networks; Support Vector Machine (SVM) and k-Nearest Neighbors. These four methods and the proposed scheme were implemented in eleven datasets, two controlled experiments, plus nine public available datasets from the UCI repository. The proposed method has shown a quite competitive performance when compared to these classical and largely used classifiers. Our method is simple concerning understanding and implementation and is based on very intuitive concepts. Copyright 2010 Elsevier Ltd. All rights reserved.
Empirical Mode Decomposition and k-Nearest Embedding Vectors for Timely Analyses of Antibiotic Resistance Trends

PubMed Central

Teodoro, Douglas; Lovis, Christian

2013-01-01

Background Antibiotic resistance is a major worldwide public health concern. In clinical settings, timely antibiotic resistance information is key for care providers as it allows appropriate targeted treatment or improved empirical treatment when the specific results of the patient are not yet available. Objective To improve antibiotic resistance trend analysis algorithms by building a novel, fully data-driven forecasting method from the combination of trend extraction and machine learning models for enhanced biosurveillance systems. Methods We investigate a robust model for extraction and forecasting of antibiotic resistance trends using a decade of microbiology data. Our method consists of breaking down the resistance time series into independent oscillatory components via the empirical mode decomposition technique. The resulting waveforms describing intrinsic resistance trends serve as the input for the forecasting algorithm. The algorithm applies the delay coordinate embedding theorem together with the k-nearest neighbor framework to project mappings from past events into the future dimension and estimate the resistance levels. Results The algorithms that decompose the resistance time series and filter out high frequency components showed statistically significant performance improvements in comparison with a benchmark random walk model. We present further qualitative use-cases of antibiotic resistance trend extraction, where empirical mode decomposition was applied to highlight the specificities of the resistance trends. Conclusion The decomposition of the raw signal was found not only to yield valuable insight into the resistance evolution, but also to produce novel models of resistance forecasters with boosted prediction performance, which could be utilized as a complementary method in the analysis of antibiotic resistance trends. PMID:23637796
Smart BIT/TSMD Integration

DTIC Science & Technology

1991-12-01

user using the ’: knn ’ option in the do-scenario command line). An instance of the K-Nearest Neighbor object is first created and initialized before...Navigation Computer HF High Frequency ILS Instrument Landing System KNN K - Nearest Neighbor LRU Line Replaceable Unit MC Mission Computer MTCA...approaches have been investigated here, K-nearest Neighbors ( KNN ) and neural networks (NN). Both approaches require that previously classified examples of
K-Nearest Neighbors Relevance Annotation Model for Distance Education

ERIC Educational Resources Information Center

Ke, Xiao; Li, Shaozi; Cao, Donglin

2011-01-01

With the rapid development of Internet technologies, distance education has become a popular educational mode. In this paper, the authors propose an online image automatic annotation distance education system, which could effectively help children learn interrelations between image content and corresponding keywords. Image automatic annotation is…
Quantum soliton in 1D Heisenberg spin chains with Dzyaloshinsky-Moriya and next-nearest-neighbor interactions.

PubMed

Djoufack, Z I; Tala-Tebue, E; Nguenang, J P; Kenfack-Jiotsa, A

2016-10-01

We report in this work, an analytical study of quantum soliton in 1D Heisenberg spin chains with Dzyaloshinsky-Moriya Interaction (DMI) and Next-Nearest-Neighbor Interactions (NNNI). By means of the time-dependent Hartree approximation and the semi-discrete multiple-scale method, the equation of motion for the single-boson wave function is reduced to the nonlinear Schrödinger equation. It comes from this present study that the spectrum of the frequencies increases, its periodicity changes, in the presence of NNNI. The antisymmetric feature of the DMI was probed from the dispersion curve while changing the sign of the parameter controlling it. Five regions were identified in the dispersion spectrum, when the NNNI are taken into account instead of three as in the opposite case. In each of these regions, the quantum model can exhibit quantum stationary localized and stable bright or dark soliton solutions. In each region, we could set up quantum localized n-boson Hartree states as well as the analytical expression of their energy level, respectively. The accuracy of the analytical studies is confirmed by the excellent agreement with the numerical calculations, and it certifies the stability of the stationary quantum localized solitons solutions exhibited in each region. In addition, we found that the intensity of the localization of quantum localized n-boson Hartree states increases when the NNNI are considered. We also realized that the intensity of Hartree n-boson states corresponding to quantum discrete soliton states depend on the wave vector.
Classification of emotional states from electrocardiogram signals: a non-linear approach based on hurst

PubMed Central

2013-01-01

Background Identifying the emotional state is helpful in applications involving patients with autism and other intellectual disabilities; computer-based training, human computer interaction etc. Electrocardiogram (ECG) signals, being an activity of the autonomous nervous system (ANS), reflect the underlying true emotional state of a person. However, the performance of various methods developed so far lacks accuracy, and more robust methods need to be developed to identify the emotional pattern associated with ECG signals. Methods Emotional ECG data was obtained from sixty participants by inducing the six basic emotional states (happiness, sadness, fear, disgust, surprise and neutral) using audio-visual stimuli. The non-linear feature ‘Hurst’ was computed using Rescaled Range Statistics (RRS) and Finite Variance Scaling (FVS) methods. New Hurst features were proposed by combining the existing RRS and FVS methods with Higher Order Statistics (HOS). The features were then classified using four classifiers – Bayesian Classifier, Regression Tree, K- nearest neighbor and Fuzzy K-nearest neighbor. Seventy percent of the features were used for training and thirty percent for testing the algorithm. Results Analysis of Variance (ANOVA) conveyed that Hurst and the proposed features were statistically significant (p < 0.001). Hurst computed using RRS and FVS methods showed similar classification accuracy. The features obtained by combining FVS and HOS performed better with a maximum accuracy of 92.87% and 76.45% for classifying the six emotional states using random and subject independent validation respectively. Conclusions The results indicate that the combination of non-linear analysis and HOS tend to capture the finer emotional changes that can be seen in healthy ECG data. This work can be further fine tuned to develop a real time system. PMID:23680041

Application of texture analysis method for mammogram density classification

NASA Astrophysics Data System (ADS)

Nithya, R.; Santhi, B.

2017-07-01

Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
Classification of CT examinations for COPD visual severity analysis

NASA Astrophysics Data System (ADS)

Tan, Jun; Zheng, Bin; Wang, Xingwei; Pu, Jiantao; Gur, David; Sciurba, Frank C.; Leader, J. Ken

2012-03-01

In this study we present a computational method of CT examination classification into visual assessed emphysema severity. The visual severity categories ranged from 0 to 5 and were rated by an experienced radiologist. The six categories were none, trace, mild, moderate, severe and very severe. Lung segmentation was performed for every input image and all image features are extracted from the segmented lung only. We adopted a two-level feature representation method for the classification. Five gray level distribution statistics, six gray level co-occurrence matrix (GLCM), and eleven gray level run-length (GLRL) features were computed for each CT image depicted segment lung. Then we used wavelets decomposition to obtain the low- and high-frequency components of the input image, and again extract from the lung region six GLCM features and eleven GLRL features. Therefore our feature vector length is 56. The CT examinations were classified using the support vector machine (SVM) and k-nearest neighbors (KNN) and the traditional threshold (density mask) approach. The SVM classifier had the highest classification performance of all the methods with an overall sensitivity of 54.4% and a 69.6% sensitivity to discriminate "no" and "trace visually assessed emphysema. We believe this work may lead to an automated, objective method to categorically classify emphysema severity on CT exam.
Classification of interstitial lung disease patterns with topological texture features

NASA Astrophysics Data System (ADS)

Huber, Markus B.; Nagarajan, Mahesh; Leinsinger, Gerda; Ray, Lawrence A.; Wismüller, Axel

2010-03-01

Topological texture features were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative for the presence of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honey-combing, a stack of 70 axial, lung kernel reconstructed images were acquired from HRCT chest exams. A set of 241 regions of interest of both healthy and pathological (89) lung tissue were identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and three Minkowski Functionals (MFs, e.g. MF.euler). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions and the significance thresholds were adjusted for multiple comparisons by the Bonferroni correction. The best classification results were obtained by the MF features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers. The highest accuracy was found for MF.euler (97.5%, 96.6%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced topological texture features can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.
Quasiclassical description of the nearest-neighbor hopping dc conduction via hydrogen-like donors in intermediately compensated GaAs crystals

NASA Astrophysics Data System (ADS)

Poklonski, N. A.; Vyrko, S. A.; Zabrodskii, A. G.

2010-08-01

Expressions for the pre-exponential factor σ3 and the thermal activation energy ɛ3 of hopping electric conductivity of electrons via hydrogen-like donors in n-type gallium arsenide are obtained in the quasiclassical approximation. Crystals with the donor concentration N and the acceptor concentration KN at the intermediate compensation ratio K (approximately from 0.25 to 0.75) are considered. We assume that the donors in the charge states (0) and (+1) and the acceptors in the charge state (-1) form a joint nonstoichiometric simple cubic 'sublattice' within the crystalline matrix. In such sublattice the distance between nearest impurity atoms is Rh = [(1 + K)N]-1/3 which is also the length of an electron hop between donors. To take into account orientational disorder of hops we assume that the impurity sublattice randomly and smoothly changes orientation inside a macroscopic sample. Values of σ3(N) and ɛ3(N) calculated for the temperature of 2.5 K agree with known experimental data at the insulator side of the insulator-metal phase transition.
Local binary pattern texture-based classification of solid masses in ultrasound breast images

NASA Astrophysics Data System (ADS)

Matsumoto, Monica M. S.; Sehgal, Chandra M.; Udupa, Jayaram K.

2012-03-01

Breast cancer is one of the leading causes of cancer mortality among women. Ultrasound examination can be used to assess breast masses, complementarily to mammography. Ultrasound images reveal tissue information in its echoic patterns. Therefore, pattern recognition techniques can facilitate classification of lesions and thereby reduce the number of unnecessary biopsies. Our hypothesis was that image texture features on the boundary of a lesion and its vicinity can be used to classify masses. We have used intensity-independent and rotation-invariant texture features, known as Local Binary Patterns (LBP). The classifier selected was K-nearest neighbors. Our breast ultrasound image database consisted of 100 patient images (50 benign and 50 malignant cases). The determination of whether the mass was benign or malignant was done through biopsy and pathology assessment. The training set consisted of sixty images, randomly chosen from the database of 100 patients. The testing set consisted of forty images to be classified. The results with a multi-fold cross validation of 100 iterations produced a robust evaluation. The highest performance was observed for feature LBP with 24 symmetrically distributed neighbors over a circle of radius 3 (LBP24,3) with an accuracy rate of 81.0%. We also investigated an approach with a score of malignancy assigned to the images in the test set. This approach provided an ROC curve with Az of 0.803. The analysis of texture features over the boundary of solid masses showed promise for malignancy classification in ultrasound breast images.
Phase diagram and quantum order by disorder in the Kitaev K1-K2 honeycomb magnet

NASA Astrophysics Data System (ADS)

Rousochatzakis, Ioannis; Reuther, Johannes; Thomale, Ronny; Rachel, Stephan; Perkins, Natalia

We show that the topological Kitaev spin liquid on the honeycomb lattice is extremely fragile against the second neighbor Kitaev coupling K2, which has been recently identified as the dominant perturbation away from the nearest neighbor model in iridate Na2IrO3, and may also play a role in α-RuCl3. This coupling explains naturally the zig-zag ordering and the special entanglement between real and spin space observed recently in Na2IrO3. The minimal K1-K2 model that we present here holds in addition the unique property that the classical and quantum phase diagrams and their respective order-by-disorder mechanisms are qualitatively different due to their fundamentally different symmetry structure. Nsf DMR-1511768; Freie Univ. Berlin Excellence Initiative of German Research Foundation; European Research Council, ERC-StG-336012; DFG-SFB 1170; DFG-SFB 1143, DFG-SPP 1666, and Helmholtz association VI-521.
A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach

PubMed Central

Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

2017-01-01

A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202
A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

PubMed

Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

2017-06-19

A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
Meat and Fish Freshness Inspection System Based on Odor Sensing

PubMed Central

Hasan, Najam ul; Ejaz, Naveed; Ejaz, Waleed; Kim, Hyung Seok

2012-01-01

We propose a method for building a simple electronic nose based on commercially available sensors used to sniff in the market and identify spoiled/contaminated meat stocked for sale in butcher shops. Using a metal oxide semiconductor-based electronic nose, we measured the smell signature from two of the most common meat foods (beef and fish) stored at room temperature. Food samples were divided into two groups: fresh beef with decayed fish and fresh fish with decayed beef. The prime objective was to identify the decayed item using the developed electronic nose. Additionally, we tested the electronic nose using three pattern classification algorithms (artificial neural network, support vector machine and k-nearest neighbor), and compared them based on accuracy, sensitivity, and specificity. The results demonstrate that the k-nearest neighbor algorithm has the highest accuracy. PMID:23202222
Anterior Chamber Angle Shape Analysis and Classification of Glaucoma in SS-OCT Images.

PubMed

Ni Ni, Soe; Tian, J; Marziliano, Pina; Wong, Hong-Tym

2014-01-01

Optical coherence tomography is a high resolution, rapid, and noninvasive diagnostic tool for angle closure glaucoma. In this paper, we present a new strategy for the classification of the angle closure glaucoma using morphological shape analysis of the iridocorneal angle. The angle structure configuration is quantified by the following six features: (1) mean of the continuous measurement of the angle opening distance; (2) area of the trapezoidal profile of the iridocorneal angle centered at Schwalbe's line; (3) mean of the iris curvature from the extracted iris image; (4) complex shape descriptor, fractal dimension, to quantify the complexity, or changes of iridocorneal angle; (5) ellipticity moment shape descriptor; and (6) triangularity moment shape descriptor. Then, the fuzzy k nearest neighbor (fkNN) classifier is utilized for classification of angle closure glaucoma. Two hundred and sixty-four swept source optical coherence tomography (SS-OCT) images from 148 patients were analyzed in this study. From the experimental results, the fkNN reveals the best classification accuracy (99.11 ± 0.76%) and AUC (0.98 ± 0.012) with the combination of fractal dimension and biometric parameters. It showed that the proposed approach has promising potential to become a computer aided diagnostic tool for angle closure glaucoma (ACG) disease.
Differentiation of AmpC beta-lactamase binders vs. decoys using classification kNN QSAR modeling and application of the QSAR classifier to virtual screening

NASA Astrophysics Data System (ADS)

Hsieh, Jui-Hua; Wang, Xiang S.; Teotico, Denise; Golbraikh, Alexander; Tropsha, Alexander

2008-09-01

The use of inaccurate scoring functions in docking algorithms may result in the selection of compounds with high predicted binding affinity that nevertheless are known experimentally not to bind to the target receptor. Such falsely predicted binders have been termed `binding decoys'. We posed a question as to whether true binders and decoys could be distinguished based only on their structural chemical descriptors using approaches commonly used in ligand based drug design. We have applied the k-Nearest Neighbor ( kNN) classification QSAR approach to a dataset of compounds characterized as binders or binding decoys of AmpC beta-lactamase. Models were subjected to rigorous internal and external validation as part of our standard workflow and a special QSAR modeling scheme was employed that took into account the imbalanced ratio of inhibitors to non-binders (1:4) in this dataset. 342 predictive models were obtained with correct classification rate (CCR) for both training and test sets as high as 0.90 or higher. The prediction accuracy was as high as 100% (CCR = 1.00) for the external validation set composed of 10 compounds (5 true binders and 5 decoys) selected randomly from the original dataset. For an additional external set of 50 known non-binders, we have achieved the CCR of 0.87 using very conservative model applicability domain threshold. The validated binary kNN QSAR models were further employed for mining the NCGC AmpC screening dataset (69653 compounds). The consensus prediction of 64 compounds identified as screening hits in the AmpC PubChem assay disagreed with their annotation in PubChem but was in agreement with the results of secondary assays. At the same time, 15 compounds were identified as potential binders contrary to their annotation in PubChem. Five of them were tested experimentally and showed inhibitory activities in millimolar range with the highest binding constant Ki of 135 μM. Our studies suggest that validated QSAR models could complement
Classification of diabetic retinopathy using fractal dimension analysis of eye fundus image

NASA Astrophysics Data System (ADS)

Safitri, Diah Wahyu; Juniati, Dwi

2017-08-01

Diabetes Mellitus (DM) is a metabolic disorder when pancreas produce inadequate insulin or a condition when body resist insulin action, so the blood glucose level is high. One of the most common complications of diabetes mellitus is diabetic retinopathy which can lead to a vision problem. Diabetic retinopathy can be recognized by an abnormality in eye fundus. Those abnormalities are characterized by microaneurysms, hemorrhage, hard exudate, cotton wool spots, and venous's changes. The diabetic retinopathy is classified depends on the conditions of abnormality in eye fundus, that is grade 1 if there is a microaneurysm only in the eye fundus; grade 2, if there are a microaneurysm and a hemorrhage in eye fundus; and grade 3: if there are microaneurysm, hemorrhage, and neovascularization in the eye fundus. This study proposed a method and a process of eye fundus image to classify of diabetic retinopathy using fractal analysis and K-Nearest Neighbor (KNN). The first phase was image segmentation process using green channel, CLAHE, morphological opening, matched filter, masking, and morphological opening binary image. After segmentation process, its fractal dimension was calculated using box-counting method and the values of fractal dimension were analyzed to make a classification of diabetic retinopathy. Tests carried out by used k-fold cross validation method with k=5. In each test used 10 different grade K of KNN. The accuracy of the result of this method is 89,17% with K=3 or K=4, it was the best results than others K value. Based on this results, it can be concluded that the classification of diabetic retinopathy using fractal analysis and KNN had a good performance.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

PubMed

Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

2017-09-01

To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

DOEpatents

Archer, Charles J.; Faraj, Ahmad A.; Inglett, Todd A.; Ratterman, Joseph D.

2012-10-23

Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.
The Comparative Experimental Study of Multilabel Classification for Diagnosis Assistant Based on Chinese Obstetric EMRs

PubMed Central

Zhang, Kunli; Zhao, Yueshu; Zan, Hongying; Zhuang, Lei

2018-01-01

Obstetric electronic medical records (EMRs) contain massive amounts of medical data and health information. The information extraction and diagnosis assistants of obstetric EMRs are of great significance in improving the fertility level of the population. The admitting diagnosis in the first course record of the EMR is reasoned from various sources, such as chief complaints, auxiliary examinations, and physical examinations. This paper treats the diagnosis assistant as a multilabel classification task based on the analyses of obstetric EMRs. The latent Dirichlet allocation (LDA) topic and the word vector are used as features and the four multilabel classification methods, BP-MLL (backpropagation multilabel learning), RAkEL (RAndom k labELsets), MLkNN (multilabel k-nearest neighbor), and CC (chain classifier), are utilized to build the diagnosis assistant models. Experimental results conducted on real cases show that the BP-MLL achieves the best performance with an average precision up to 0.7413 ± 0.0100 when the number of label sets and the word dimensions are 71 and 100, respectively. The result of the diagnosis assistant can be introduced as a supplementary learning method for medical students. Additionally, the method can be used not only for obstetric EMRs but also for other medical records. PMID:29666671
Computer-aided diagnosis system: a Bayesian hybrid classification method.

PubMed

Calle-Alonso, F; Pérez, C J; Arias-Nicolás, J P; Martín, J

2013-10-01

A novel method to classify multi-class biomedical objects is presented. The method is based on a hybrid approach which combines pairwise comparison, Bayesian regression and the k-nearest neighbor technique. It can be applied in a fully automatic way or in a relevance feedback framework. In the latter case, the information obtained from both an expert and the automatic classification is iteratively used to improve the results until a certain accuracy level is achieved, then, the learning process is finished and new classifications can be automatically performed. The method has been applied in two biomedical contexts by following the same cross-validation schemes as in the original studies. The first one refers to cancer diagnosis, leading to an accuracy of 77.35% versus 66.37%, originally obtained. The second one considers the diagnosis of pathologies of the vertebral column. The original method achieves accuracies ranging from 76.5% to 96.7%, and from 82.3% to 97.1% in two different cross-validation schemes. Even with no supervision, the proposed method reaches 96.71% and 97.32% in these two cases. By using a supervised framework the achieved accuracy is 97.74%. Furthermore, all abnormal cases were correctly classified. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Feature selection gait-based gender classification under different circumstances

NASA Astrophysics Data System (ADS)

Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

2014-05-01

This paper proposes a gender classification based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying bag condition as addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying wavelet transform. Three different sets of feature are proposed in this method. First, Spatio-temporal distance that is dealing with the distance of different parts of the human body (like feet, knees, hand, Human Height and shoulder) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficient of human body respectively. To extract these two sets of feature we divided the human body into two parts, upper and lower body part, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize their discriminating significance. Finally k-Nearest Neighbor is applied as a classification method. Experimental results demonstrate that our approach is providing more realistic scenario and relatively better performance compared with the existing approaches.
A Quantum Hybrid PSO Combined with Fuzzy k-NN Approach to Feature Selection and Cell Classification in Cervical Cancer Detection.

PubMed

Iliyasu, Abdullah M; Fatichah, Chastine

2017-12-19

A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO) method with the intuitionistic rationality of traditional fuzzy k -nearest neighbours (Fuzzy k -NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smeared (CS) images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset features (i.e., global best particles) that represent a pruned down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features (i.e., classification without prior feature selection) and another hybrid technique combining the standard PSO algorithm with the Fuzzy k -NN technique (P-Fuzzy approach). In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regards to the feature selection in the experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k -NN in the proposed Q-Fuzzy approach improves classification accuracy as manifest in the reduction in number cell features, which is crucial for effective cervical cancer detection and diagnosis.
Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

NASA Astrophysics Data System (ADS)

Salman, S. S.; Abbas, W. A.

2018-05-01

The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.
Comparison of two target classification techniques

NASA Astrophysics Data System (ADS)

Chen, J. S.; Walton, E. K.

1986-01-01

Radar target classification techniques based on backscatter measurements in the resonance region (1.0-20.0 MHz) are discussed. Attention is given to two novel methods currently being tested at the radar range of Ohio State University. The methods include: (1) the nearest neighbor (NN) algorithm for determining the radar cross section (RCS) magnitude and range corrected phase at various operating frequencies; and (2) an inverse Fourier transformation of the complex multifrequency radar returns of the time domain, followed by cross correlation analysis. Comparisons are made of the performance of the two techniques as a function of signal-to-error noise ratio for different types of processing. The results of the comparison are discussed in detail.

Chromatographic profiles of Phyllanthus aqueous extracts samples: a proposition of classification using chemometric models.

PubMed

Martins, Lucia Regina Rocha; Pereira-Filho, Edenir Rodrigues; Cass, Quezia Bezerra

2011-04-01

Taking in consideration the global analysis of complex samples, proposed by the metabolomic approach, the chromatographic fingerprint encompasses an attractive chemical characterization of herbal medicines. Thus, it can be used as a tool in quality control analysis of phytomedicines. The generated multivariate data are better evaluated by chemometric analyses, and they can be modeled by classification methods. "Stone breaker" is a popular Brazilian plant of Phyllanthus genus, used worldwide to treat renal calculus, hepatitis, and many other diseases. In this study, gradient elution at reversed-phase conditions with detection at ultraviolet region were used to obtain chemical profiles (fingerprints) of botanically identified samples of six Phyllanthus species. The obtained chromatograms, at 275 nm, were organized in data matrices, and the time shifts of peaks were adjusted using the Correlation Optimized Warping algorithm. Principal Component Analyses were performed to evaluate similarities among cultivated and uncultivated samples and the discrimination among the species and, after that, the samples were used to compose three classification models using Soft Independent Modeling of Class analogy, K-Nearest Neighbor, and Partial Least Squares for Discriminant Analysis. The ability of classification models were discussed after their successful application for authenticity evaluation of 25 commercial samples of "stone breaker."
Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

PubMed

Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

2015-12-01

Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.
Ensemble Clustering Classification compete SVM and One-Class classifiers applied on plant microRNAs Data.

PubMed

Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai

2016-12-22

The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.

PubMed

Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho

2014-10-01

Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is the high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they just consider the problem of where feature to class is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, human miRNA is considered to be ranked low in traditional feature selection methods and are removed most of the time. In view of the limitation of the miRNA number, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Our results demonstrate that the m:n feature subset made a positive impression of low-ranking miRNAs in cancer classification.
Estimating local scaling properties for the classification of interstitial lung disease patterns

NASA Astrophysics Data System (ADS)

Huber, Markus B.; Nagarajan, Mahesh B.; Leinsinger, Gerda; Ray, Lawrence A.; Wismueller, Axel

2011-03-01

Local scaling properties of texture regions were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative for the presence of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honeycombing, a stack of 70 axial, lung kernel reconstructed images were acquired from HRCT chest exams. 241 regions of interest of both healthy and pathological (89) lung tissue were identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and the estimation of local scaling properties with Scaling Index Method (SIM). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions including the Bonferroni correction. The best classification results were obtained by the set of SIM features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers with the highest accuracy (94.1%, 93.7%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced texture features using local scaling properties can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.
Multi-class SVM model for fMRI-based classification and grading of liver fibrosis

NASA Astrophysics Data System (ADS)

Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.

2010-03-01

We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
Analysis of A Drug Target-based Classification System using Molecular Descriptors.

PubMed

Lu, Jing; Zhang, Pin; Bi, Yi; Luo, Xiaomin

2016-01-01

Drug-target interaction is an important topic in drug discovery and drug repositioning. KEGG database offers a drug annotation and classification using a target-based classification system. In this study, we gave an investigation on five target-based classes: (I) G protein-coupled receptors; (II) Nuclear receptors; (III) Ion channels; (IV) Enzymes; (V) Pathogens, using molecular descriptors to represent each drug compound. Two popular feature selection methods, maximum relevance minimum redundancy and incremental feature selection, were adopted to extract the important descriptors. Meanwhile, an optimal prediction model based on nearest neighbor algorithm was constructed, which got the best result in identifying drug target-based classes. Finally, some key descriptors were discussed to uncover their important roles in the identification of drug-target classes.
Bamboo Classification Using WorldView-2 Imagery of Giant Panda Habitat in a Large Shaded Area in Wolong, Sichuan Province, China

PubMed Central

Tang, Yunwei; Jing, Linhai; Li, Hui; Liu, Qingjie; Yan, Qi; Li, Xiuxia

2016-01-01

This study explores the ability of WorldView-2 (WV-2) imagery for bamboo mapping in a mountainous region in Sichuan Province, China. A large area of this place is covered by shadows in the image, and only a few sampled points derived were useful. In order to identify bamboos based on sparse training data, the sample size was expanded according to the reflectance of multispectral bands selected using the principal component analysis (PCA). Then, class separability based on the training data was calculated using a feature space optimization method to select the features for classification. Four regular object-based classification methods were applied based on both sets of training data. The results show that the k-nearest neighbor (k-NN) method produced the greatest accuracy. A geostatistically-weighted k-NN classifier, accounting for the spatial correlation between classes, was then applied to further increase the accuracy. It achieved 82.65% and 93.10% of the producer’s and user’s accuracies respectively for the bamboo class. The canopy densities were estimated to explain the result. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos given limited known samples, and the resulting bamboo distribution facilitates the assessments of the habitats of giant pandas. PMID:27879661
Bamboo Classification Using WorldView-2 Imagery of Giant Panda Habitat in a Large Shaded Area in Wolong, Sichuan Province, China.

PubMed

Tang, Yunwei; Jing, Linhai; Li, Hui; Liu, Qingjie; Yan, Qi; Li, Xiuxia

2016-11-22

This study explores the ability of WorldView-2 (WV-2) imagery for bamboo mapping in a mountainous region in Sichuan Province, China. A large area of this place is covered by shadows in the image, and only a few sampled points derived were useful. In order to identify bamboos based on sparse training data, the sample size was expanded according to the reflectance of multispectral bands selected using the principal component analysis (PCA). Then, class separability based on the training data was calculated using a feature space optimization method to select the features for classification. Four regular object-based classification methods were applied based on both sets of training data. The results show that the k -nearest neighbor ( k -NN) method produced the greatest accuracy. A geostatistically-weighted k -NN classifier, accounting for the spatial correlation between classes, was then applied to further increase the accuracy. It achieved 82.65% and 93.10% of the producer's and user's accuracies respectively for the bamboo class. The canopy densities were estimated to explain the result. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos given limited known samples, and the resulting bamboo distribution facilitates the assessments of the habitats of giant pandas.
Voxel classification based airway tree segmentation

NASA Astrophysics Data System (ADS)

Lo, Pechin; de Bruijne, Marleen

2008-03-01

This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
A Method of Neighbor Classes Based SVM Classification for Optical Printed Chinese Character Recognition

PubMed Central

Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

2013-01-01

In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777
A method of neighbor classes based SVM classification for optical printed Chinese character recognition.

PubMed

Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

2013-01-01

In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.
Benchmarking protein classification algorithms via supervised cross-validation.

PubMed

Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

2008-04-24

Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and
Nearest Neighbor Averaging and its Effect on the Critical Level and Minimum Detectable Concentration for Scanning Radiological Survey Instruments that Perform Facility Release Surveys.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fournier, Sean Donovan; Beall, Patrick S; Miller, Mark L

2014-08-01

Through the SNL New Mexico Small Business Assistance (NMSBA) program, several Sandia engineers worked with the Environmental Restoration Group (ERG) Inc. to verify and validate a novel algorithm used to determine the scanning Critical Level (L c ) and Minimum Detectable Concentration (MDC) (or Minimum Detectable Areal Activity) for the 102F scanning system. Through the use of Monte Carlo statistical simulations the algorithm mathematically demonstrates accuracy in determining the L c and MDC when a nearest-neighbor averaging (NNA) technique was used. To empirically validate this approach, SNL prepared several spiked sources and ran a test with the ERG 102F instrumentmore » on a bare concrete floor known to have no radiological contamination other than background naturally occurring radioactive material (NORM). The tests conclude that the NNA technique increases the sensitivity (decreases the L c and MDC) for high-density data maps that are obtained by scanning radiological survey instruments.« less
Ensemble Clustering Classification Applied to Competing SVM and One-Class Classifiers Exemplified by Plant MicroRNAs Data.

PubMed

Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai

2016-12-01

The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Jian; Hamidouche, Khaled; Zheng, Jie

2015-08-05

Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm through taking advantage of scalable programming models. To improve the performance of k-NN on large-scale environment with InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemicmore » evaluation and analysis on typical workloads. The hybrid designs leverage the one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for small workload with balanced communication and computation. Experiments of running with varied number of cores show that our design can maintain good scalability.« less
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.

PubMed

Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

2017-01-01

Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

PubMed Central

Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

2017-01-01

Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

USGS Publications Warehouse

Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

2017-01-01

Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Computational approaches for the classification of seed storage proteins.

PubMed

Radhika, V; Rao, V Sree Hari

2015-07-01

Seed storage proteins comprise a major part of the protein content of the seed and have an important role on the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on nearest neighbor approach for classification of seed storage proteins and compared its performance with decision tree J48, multilayer perceptron neural (MLP) network and support vector machine (SVM) libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.

Evaluation of different classification methods for the diagnosis of schizophrenia based on functional near-infrared spectroscopy.

PubMed

Li, Zhaohua; Wang, Yuduo; Quan, Wenxiang; Wu, Tongning; Lv, Bin

2015-02-15

Based on near-infrared spectroscopy (NIRS), recent converging evidence has been observed that patients with schizophrenia exhibit abnormal functional activities in the prefrontal cortex during a verbal fluency task (VFT). Therefore, some studies have attempted to employ NIRS measurements to differentiate schizophrenia patients from healthy controls with different classification methods. However, no systematic evaluation was conducted to compare their respective classification performances on the same study population. In this study, we evaluated the classification performance of four classification methods (including linear discriminant analysis, k-nearest neighbors, Gaussian process classifier, and support vector machines) on an NIRS-aided schizophrenia diagnosis. We recruited a large sample of 120 schizophrenia patients and 120 healthy controls and measured the hemoglobin response in the prefrontal cortex during the VFT using a multichannel NIRS system. Features for classification were extracted from three types of NIRS data in each channel. We subsequently performed a principal component analysis (PCA) for feature selection prior to comparison of the different classification methods. We achieved a maximum accuracy of 85.83% and an overall mean accuracy of 83.37% using a PCA-based feature selection on oxygenated hemoglobin signals and support vector machine classifier. This is the first comprehensive evaluation of different classification methods for the diagnosis of schizophrenia based on different types of NIRS signals. Our results suggested that, using the appropriate classification method, NIRS has the potential capacity to be an effective objective biomarker for the diagnosis of schizophrenia. Copyright © 2014 Elsevier B.V. All rights reserved.
Computer-aided Prognosis of Neuroblastoma on Whole-slide Images: Classification of Stromal Development

PubMed Central

Sertel, O.; Kong, J.; Shimada, H.; Catalyurek, U.V.; Saltz, J.H.; Gurcan, M.N.

2009-01-01

We are developing a computer-aided prognosis system for neuroblastoma (NB), a cancer of the nervous system and one of the most malignant tumors affecting children. Histopathological examination is an important stage for further treatment planning in routine clinical diagnosis of NB. According to the International Neuroblastoma Pathology Classification (the Shimada system), NB patients are classified into favorable and unfavorable histology based on the tissue morphology. In this study, we propose an image analysis system that operates on digitized H&E stained whole-slide NB tissue samples and classifies each slide as either stroma-rich or stroma-poor based on the degree of Schwannian stromal development. Our statistical framework performs the classification based on texture features extracted using co-occurrence statistics and local binary patterns. Due to the high resolution of digitized whole-slide images, we propose a multi-resolution approach that mimics the evaluation of a pathologist such that the image analysis starts from the lowest resolution and switches to higher resolutions when necessary. We employ an offine feature selection step, which determines the most discriminative features at each resolution level during the training step. A modified k-nearest neighbor classifier is used to determine the confidence level of the classification to make the decision at a particular resolution level. The proposed approach was independently tested on 43 whole-slide samples and provided an overall classification accuracy of 88.4%. PMID:20161324
Phase diagrams and free-energy landscapes for model spin-crossover materials with antiferromagnetic-like nearest-neighbor and ferromagnetic-like long-range interactions

NASA Astrophysics Data System (ADS)

Chan, C. H.; Brown, G.; Rikvold, P. A.

2017-11-01

We present phase diagrams, free-energy landscapes, and order-parameter distributions for a model spin-crossover material with a two-step transition between the high-spin and low-spin states (a square-lattice Ising model with antiferromagnetic-like nearest-neighbor and ferromagnetic-like long-range interactions) [P. A. Rikvold et al., Phys. Rev. B 93, 064109 (2016), 10.1103/PhysRevB.93.064109]. The results are obtained by a recently introduced, macroscopically constrained Wang-Landau Monte Carlo simulation method [Phys. Rev. E 95, 053302 (2017), 10.1103/PhysRevE.95.053302]. The method's computational efficiency enables calculation of thermodynamic quantities for a wide range of temperatures, applied fields, and long-range interaction strengths. For long-range interactions of intermediate strength, tricritical points in the phase diagrams are replaced by pairs of critical end points and mean-field critical points that give rise to horn-shaped regions of metastability. The corresponding free-energy landscapes offer insights into the nature of asymmetric, multiple hysteresis loops that have been experimentally observed in spin-crossover materials characterized by competing short-range interactions and long-range elastic interactions.
Effective Feature Selection for Classification of Promoter Sequences.

PubMed

K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

2016-01-01

Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Progress in adapting k-NN methods for forest mapping and estimation using the new annual Forest Inventory and Analysis data

Treesearch

Reija Haapanen; Kimmo Lehtinen; Jukka Miettinen; Marvin E. Bauer; Alan R. Ek

2002-01-01

The k-nearest neighbor (k-NN) method has been undergoing development and testing for applications with USDA Forest Service Forest Inventory and Analysis (FIA) data in Minnesota since 1997. Research began using the 1987-1990 FIA inventory of the state, the then standard 10-point cluster plots, and Landsat TM imagery. In the past year, research has moved to examine...
Classification of postural profiles among mouth-breathing children by learning vector quantization.

PubMed

Mancini, F; Sousa, F S; Hummel, A D; Falcão, A E J; Yi, L C; Ortolani, C F; Sigulem, D; Pisa, I T

2011-01-01

Mouth breathing is a chronic syndrome that may bring about postural changes. Finding characteristic patterns of changes occurring in the complex musculoskeletal system of mouth-breathing children has been a challenge. Learning vector quantization (LVQ) is an artificial neural network model that can be applied for this purpose. The aim of the present study was to apply LVQ to determine the characteristic postural profiles shown by mouth-breathing children, in order to further understand abnormal posture among mouth breathers. Postural training data on 52 children (30 mouth breathers and 22 nose breathers) and postural validation data on 32 children (22 mouth breathers and 10 nose breathers) were used. The performance of LVQ and other classification models was compared in relation to self-organizing maps, back-propagation applied to multilayer perceptrons, Bayesian networks, naive Bayes, J48 decision trees, k, and k-nearest-neighbor classifiers. Classifier accuracy was assessed by means of leave-one-out cross-validation, area under ROC curve (AUC), and inter-rater agreement (Kappa statistics). By using the LVQ model, five postural profiles for mouth-breathing children could be determined. LVQ showed satisfactory results for mouth-breathing and nose-breathing classification: sensitivity and specificity rates of 0.90 and 0.95, respectively, when using the training dataset, and 0.95 and 0.90, respectively, when using the validation dataset. The five postural profiles for mouth-breathing children suggested by LVQ were incorporated into application software for classifying the severity of mouth breathers' abnormal posture.
Model-based mean square error estimators for k-nearest neighbour predictions and applications using remotely sensed data for forest inventories

Treesearch

Steen Magnussen; Ronald E. McRoberts; Erkki O. Tomppo

2009-01-01

New model-based estimators of the uncertainty of pixel-level and areal k-nearest neighbour (knn) predictions of attribute Y from remotely-sensed ancillary data X are presented. Non-parametric functions predict Y from scalar 'Single Index Model' transformations of X. Variance functions generated...
Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen

This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, kNearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for themore » cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.« less
Exact ground states for the nearest neighbor quantum XXZ model on the kagome and other lattices with triangular motifs at Jz /Jxy = - 1 / 2

NASA Astrophysics Data System (ADS)

Changlani, Hitesh; Kumar, Krishna; Kochkov, Dmitrii; Fradkin, Eduardo; Clark, Bryan

We report the existence of a quantum macroscopically degenerate ground state manifold on the nearest neighbor XXZ model on the kagome lattice at the point Jz /Jxy = - 1 / 2 . On many lattices with triangular motifs (including the kagome, sawtooth, icosidodecahedron and Shastry-Sutherland lattice for a certain choice of couplings) this Hamiltonian is found to be frustration-free with exact ground states which correspond to three-colorings of these lattices. Several results also generalize to the case of variable couplings and to other motifs (albeit with possibly more complex Hamiltonians). The degenerate manifold on the kagome lattice corresponds to a ''many-body flat band'' of interacting hard-core bosons; and for the one boson case our results also explain the well-known non-interacting flat band. On adding realistic perturbations, state selection in this manifold of quantum many-body states is discussed along with the implications for the phase diagram of the kagome lattice antiferromagnet. supported by DE-FG02-12ER46875, DMR 1408713, DE-FG02-08ER46544.
Velocity correlations and spatial dependencies between neighbors in a unidirectional flow of pedestrians

NASA Astrophysics Data System (ADS)

Porzycki, Jakub; WÄ s, Jarosław; Hedayatifar, Leila; Hassanibesheli, Forough; Kułakowski, Krzysztof

2017-08-01

The aim of the paper is an analysis of self-organization patterns observed in the unidirectional flow of pedestrians. On the basis of experimental data from Zhang et al. [J. Zhang et al., J. Stat. Mech. (2011) P06004, 10.1088/1742-5468/2011/06/P06004], we analyze the mutual positions and velocity correlations between pedestrians when walking along a corridor. The angular and spatial dependencies of the mutual positions reveal a spatial structure that remains stable during the crowd motion. This structure differs depending on the value of n , for the consecutive n th -nearest-neighbor position set. The preferred position for the first-nearest neighbor is on the side of the pedestrian, while for further neighbors, this preference shifts to the axis of movement. The velocity correlations vary with the angle formed by the pair of neighboring pedestrians and the direction of motion and with the time delay between pedestrians' movements. The delay dependence of the correlations shows characteristic oscillations, produced by the velocity oscillations when striding; however, a filtering of the main frequency of individual striding out reduces the oscillations only partially. We conclude that pedestrians select their path directions so as to evade the necessity of continuously adjusting their speed to their neighbors'. They try to keep a given distance, but follow the person in front of them, as well as accepting and observing pedestrians on their sides. Additionally, we show an empirical example that illustrates the shape of a pedestrian's personal space during movement.
Classification of EEG Signals Based on Pattern Recognition Approach.

PubMed

Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed

2017-01-01

Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
Classification of EEG Signals Based on Pattern Recognition Approach

PubMed Central

Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed

2017-01-01

Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID
Jastrow-like ground states for quantum many-body potentials with near-neighbors interactions

NASA Astrophysics Data System (ADS)

Baradaran, Marzieh; Carrasco, José A.; Finkel, Federico; González-López, Artemio

2018-01-01

We completely solve the problem of classifying all one-dimensional quantum potentials with nearest- and next-to-nearest-neighbors interactions whose ground state is Jastrow-like, i.e., of Jastrow type but depending only on differences of consecutive particles. In particular, we show that these models must necessarily contain a three-body interaction term, as was the case with all previously known examples. We discuss several particular instances of the general solution, including a new hyperbolic potential and a model with elliptic interactions which reduces to the known rational and trigonometric ones in appropriate limits.
Landscape scale mapping of forest inventory data by nearest neighbor classification

Treesearch

Andrew Lister

2009-01-01

One of the goals of the Forest Service, U.S. Department of Agriculture's Forest Inventory and Analysis (FIA) program is large-area mapping. FIA scientists have tried many methods in the past, including geostatistical methods, linear modeling, nonlinear modeling, and simple choropleth and dot maps. Mapping methods that require individual model-based maps to be...
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar

DOE PAGES

Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri; ...

2017-11-16

While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination ( R 2),more » and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distribute across the 800 km 2 Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.« less
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar

DOE Office of Scientific and Technical Information (OSTI.GOV)

Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri

While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination ( R 2),more » and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distribute across the 800 km 2 Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.« less
Progressive sparse representation-based classification using local discrete cosine transform evaluation for image recognition

NASA Astrophysics Data System (ADS)

Song, Xiaoning; Feng, Zhen-Hua; Hu, Guosheng; Yang, Xibei; Yang, Jingyu; Qi, Yunsong

2015-09-01

This paper proposes a progressive sparse representation-based classification algorithm using local discrete cosine transform (DCT) evaluation to perform face recognition. Specifically, the sum of the contributions of all training samples of each subject is first taken as the contribution of this subject, then the redundant subject with the smallest contribution to the test sample is iteratively eliminated. Second, the progressive method aims at representing the test sample as a linear combination of all the remaining training samples, by which the representation capability of each training sample is exploited to determine the optimal "nearest neighbors" for the test sample. Third, the transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample using cosine distance metrics in the DCT domain. The final goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlative degree evaluation, which is approximately equal to the test sample, and we can use this weighted linear combination to perform robust classification. Experimental results conducted on the ORL database of faces (created by the Olivetti Research Laboratory in Cambridge), the FERET face database (managed by the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology), AR face database (created by Aleix Martinez and Robert Benavente in the Computer Vision Center at U.A.B), and USPS handwritten digit database (gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo) demonstrate the effectiveness of the proposed method.
A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization.

PubMed

Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein

2016-06-01

This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter approach using Fisher criterion which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach which is based on cellular learning automata (CLA) optimized with ant colony method (ACO) is used to find the set of features which improve the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The selected features from the last phase are evaluated using ROC curve and the most effective while smallest feature subset is determined. The classifiers which are evaluated in the proposed framework are K-nearest neighbor; support vector machine and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
An incremental knowledge assimilation system (IKAS) for mine detection

NASA Astrophysics Data System (ADS)

Porway, Jake; Raju, Chaitanya; Varadarajan, Karthik Mahesh; Nguyen, Hieu; Yadegar, Joseph

2010-04-01

In this paper we present an adaptive incremental learning system for underwater mine detection and classification that utilizes statistical models of seabed texture and an adaptive nearest-neighbor classifier to identify varied underwater targets in many different environments. The first stage of processing uses our Background Adaptive ANomaly detector (BAAN), which identifies statistically likely target regions using Gabor filter responses over the image. Using this information, BAAN classifies the background type and updates its detection using background-specific parameters. To perform classification, a Fully Adaptive Nearest Neighbor (FAAN) determines the best label for each detection. FAAN uses an extremely fast version of Nearest Neighbor to find the most likely label for the target. The classifier perpetually assimilates new and relevant information into its existing knowledge database in an incremental fashion, allowing improved classification accuracy and capturing concept drift in the target classes. Experiments show that the system achieves >90% classification accuracy on underwater mine detection tasks performed on synthesized datasets provided by the Office of Naval Research. We have also demonstrated that the system can incrementally improve its detection accuracy by constantly learning from new samples.
A systematic comparison of different object-based classification techniques using high spatial resolution imagery in agricultural environments

NASA Astrophysics Data System (ADS)

Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk

2016-07-01

Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have been conducted to evaluate the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to obtain a lower accuracy due to the broken objects at fine scales. In contrast to some previous studies, the RF classifiers yielded the best results and the k-nearest neighbor classifier were the worst results, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The results of training sample analyses indicated that the RF and adaboost. M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it was suggested that RF should be considered in most cases for agricultural mapping.

Supervised pixel classification for segmenting geographic atrophy in fundus autofluorescene images

NASA Astrophysics Data System (ADS)

Hu, Zhihong; Medioni, Gerard G.; Hernandez, Matthias; Sadda, SriniVas R.

2014-03-01

Age-related macular degeneration (AMD) is the leading cause of blindness in people over the age of 65. Geographic atrophy (GA) is a manifestation of the advanced or late-stage of the AMD, which may result in severe vision loss and blindness. Techniques to rapidly and precisely detect and quantify GA lesions would appear to be of important value in advancing the understanding of the pathogenesis of GA and the management of GA progression. The purpose of this study is to develop an automated supervised pixel classification approach for segmenting GA including uni-focal and multi-focal patches in fundus autofluorescene (FAF) images. The image features include region wise intensity (mean and variance) measures, gray level co-occurrence matrix measures (angular second moment, entropy, and inverse difference moment), and Gaussian filter banks. A k-nearest-neighbor (k-NN) pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. A voting binary iterative hole filling filter is then applied to fill in the small holes. Sixteen randomly chosen FAF images were obtained from sixteen subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by certified graders. Two-fold cross-validation is applied for the evaluation of the classification performance. The mean Dice similarity coefficients (DSC) between the algorithm- and manually-defined GA regions are 0.84 +/- 0.06 for one test and 0.83 +/- 0.07 for the other test and the area correlations between them are 0.99 (p < 0.05) and 0.94 (p < 0.05) respectively.
Comparative evaluation of support vector machine classification for computer aided detection of breast masses in mammography

NASA Astrophysics Data System (ADS)

Lesniak, J. M.; Hupse, R.; Blanc, R.; Karssemeijer, N.; Székely, G.

2012-08-01

False positive (FP) marks represent an obstacle for effective use of computer-aided detection (CADe) of breast masses in mammography. Typically, the problem can be approached either by developing more discriminative features or by employing different classifier designs. In this paper, the usage of support vector machine (SVM) classification for FP reduction in CADe is investigated, presenting a systematic quantitative evaluation against neural networks, k-nearest neighbor classification, linear discriminant analysis and random forests. A large database of 2516 film mammography examinations and 73 input features was used to train the classifiers and evaluate for their performance on correctly diagnosed exams as well as false negatives. Further, classifier robustness was investigated using varying training data and feature sets as input. The evaluation was based on the mean exam sensitivity in 0.05-1 FPs on normals on the free-response receiver operating characteristic curve (FROC), incorporated into a tenfold cross validation framework. It was found that SVM classification using a Gaussian kernel offered significantly increased detection performance (P = 0.0002) compared to the reference methods. Varying training data and input features, SVMs showed improved exploitation of large feature sets. It is concluded that with the SVM-based CADe a significant reduction of FPs is possible outperforming other state-of-the-art approaches for breast mass CADe.
Motion data classification on the basis of dynamic time warping with a cloud point distance measure

NASA Astrophysics Data System (ADS)

Switonski, Adam; Josinski, Henryk; Zghidi, Hafedh; Wojciechowski, Konrad

2016-06-01

The paper deals with the problem of classification of model free motion data. The nearest neighbors classifier which is based on comparison performed by Dynamic Time Warping transform with cloud point distance measure is proposed. The classification utilizes both specific gait features reflected by a movements of subsequent skeleton joints and anthropometric data. To validate proposed approach human gait identification challenge problem is taken into consideration. The motion capture database containing data of 30 different humans collected in Human Motion Laboratory of Polish-Japanese Academy of Information Technology is used. The achieved results are satisfactory, the obtained accuracy of human recognition exceeds 90%. What is more, the applied cloud point distance measure does not depend on calibration process of motion capture system which results in reliable validation.
Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

PubMed

Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

2018-07-01

Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction. For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall. From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Structure of the first- and second-neighbor shells of simulated water: Quantitative relation to translational and orientational order

NASA Astrophysics Data System (ADS)

Yan, Zhenyu; Buldyrev, Sergey V.; Kumar, Pradeep; Giovambattista, Nicolas; Debenedetti, Pablo G.; Stanley, H. Eugene

2007-11-01

We perform molecular dynamics simulations of water using the five-site transferable interaction potential (TIP5P) model to quantify structural order in both the first shell (defined by four nearest neighbors) and second shell (defined by twelve next-nearest neighbors) of a central water molecule. We find that the anomalous decrease of orientational order upon compression occurs in both shells, but the anomalous decrease of translational order upon compression occurs mainly in the second shell. The decreases of translational order and orientational order upon compression (called the “structural anomaly”) are thus correlated only in the second shell. Our findings quantitatively confirm the qualitative idea that the thermodynamic, structural, and hence dynamic anomalies of water are related to changes upon compression in the second shell.
Modeling ready biodegradability of fragrance materials.

PubMed

Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola

2015-06-01

In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.
Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

PubMed

Hussain, Lal

2018-06-01

Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.
Automatic classification and detection of clinically relevant images for diabetic retinopathy

NASA Astrophysics Data System (ADS)

Xu, Xinyu; Li, Baoxin

2008-03-01

We proposed a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of the three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using Expectation- Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database who bear the same label with the query image, and who are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
Automatic morphological classification of galaxy images

PubMed Central

Shamir, Lior

2009-01-01

We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using a manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically to spiral, elliptical and edge-on galaxies with accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
Automatic Recognition of Breathing Route During Sleep Using Snoring Sounds

NASA Astrophysics Data System (ADS)

Mikami, Tsuyoshi; Kojima, Yohichiro

This letter classifies snoring sounds into three breathing routes (oral, nasal, and oronasal) with discriminant analysis of the power spectra and k-nearest neighbor method. It is necessary to recognize breathing route during snoring, because oral snoring is a typical symptom of sleep apnea but we cannot know our own breathing and snoring condition during sleep. As a result, about 98.8% classification rate is obtained by using leave-one-out test for performance evaluation.
Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification

NASA Astrophysics Data System (ADS)

Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng

2013-10-01

Quick, accurate and reliable technique for discrimination of cocoa beans according to geographical origin is essential for quality control and traceability management. This current study presents the application of Near Infrared Spectroscopy technique and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data and this gave visible cluster trends. The performance of four multivariate classification methods: Linear discriminant analysis (LDA), K-nearest neighbors (KNN), Back propagation artificial neural network (BPANN) and Support vector machine (SVM) were compared. The performances of the models were optimized by cross validation. The results revealed that; SVM model was superior to all the mathematical methods with a discrimination rate of 100% in both the training and prediction set after preprocessing with Mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for prediction set. While LDA model had 96.15% and 90.63% for the training and prediction sets respectively. KNN model had 75.01% for the training set and 72.31% for prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR Spectroscopy coupled with SVM model could be used successfully to discriminate cocoa beans according to their geographical origins for effective quality assurance.
Multi-color space threshold segmentation and self-learning k-NN algorithm for surge test EUT status identification

NASA Astrophysics Data System (ADS)

Huang, Jian; Liu, Gui-xiong

2016-09-01

The identification of targets varies in different surge tests. A multi-color space threshold segmentation and self-learning k-nearest neighbor algorithm ( k-NN) for equipment under test status identification was proposed after using feature matching to identify equipment status had to train new patterns every time before testing. First, color space (L*a*b*, hue saturation lightness (HSL), hue saturation value (HSV)) to segment was selected according to the high luminance points ratio and white luminance points ratio of the image. Second, the unknown class sample S r was classified by the k-NN algorithm with training set T z according to the feature vector, which was formed from number of pixels, eccentricity ratio, compactness ratio, and Euler's numbers. Last, while the classification confidence coefficient equaled k, made S r as one sample of pre-training set T z '. The training set T z increased to T z+1 by T z ' if T z ' was saturated. In nine series of illuminant, indicator light, screen, and disturbances samples (a total of 21600 frames), the algorithm had a 98.65%identification accuracy, also selected five groups of samples to enlarge the training set from T 0 to T 5 by itself.
Aided diagnosis methods of breast cancer based on machine learning

NASA Astrophysics Data System (ADS)

Zhao, Yue; Wang, Nian; Cui, Xiaoyu

2017-08-01

In the field of medicine, quickly and accurately determining whether the patient is malignant or benign is the key to treatment. In this paper, K-Nearest Neighbor, Linear Discriminant Analysis, Logistic Regression were applied to predict the classification of thyroid,Her-2,PR,ER,Ki67,metastasis and lymph nodes in breast cancer, in order to recognize the benign and malignant breast tumors and achieve the purpose of aided diagnosis of breast cancer. The results showed that the highest classification accuracy of LDA was 88.56%, while the classification effect of KNN and Logistic Regression were better than that of LDA, the best accuracy reached 96.30%.
Adaptive phase k-means algorithm for waveform classification

NASA Astrophysics Data System (ADS)

Song, Chengyun; Liu, Zhining; Wang, Yaojun; Xu, Feng; Li, Xingming; Hu, Guangmin

2018-01-01

Waveform classification is a powerful technique for seismic facies analysis that describes the heterogeneity and compartments within a reservoir. Horizon interpretation is a critical step in waveform classification. However, the horizon often produces inconsistent waveform phase, and thus results in an unsatisfied classification. To alleviate this problem, an adaptive phase waveform classification method called the adaptive phase k-means is introduced in this paper. Our method improves the traditional k-means algorithm using an adaptive phase distance for waveform similarity measure. The proposed distance is a measure with variable phases as it moves from sample to sample along the traces. Model traces are also updated with the best phase interference in the iterative process. Therefore, our method is robust to phase variations caused by the interpretation horizon. We tested the effectiveness of our algorithm by applying it to synthetic and real data. The satisfactory results reveal that the proposed method tolerates certain waveform phase variation and is a good tool for seismic facies analysis.
A k-nearest neighbor approach for estimation of single-tree biomass

Treesearch

Lutz Fehrmann; Christoph Kleinn

2007-01-01

Allometric biomass models are typically site and species specific. They are mostly based on a low number of independent variables such as diameter at breast height and tree height. Because of relatively small datasets, their validity is limited to the set of conditions of the study, such as site conditions and diameter range. One challenge in the context of the current...
Impact of distance-based metric learning on classification and visualization model performance and structure-activity landscapes.

PubMed

Kireeva, Natalia V; Ovchinnikova, Svetlana I; Kuznetsov, Sergey L; Kazennov, Andrey M; Tsivadze, Aslan Yu

2014-02-01

This study concerns large margin nearest neighbors classifier and its multi-metric extension as the efficient approaches for metric learning which aimed to learn an appropriate distance/similarity function for considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks. The paper describes application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in drug discovery process, in silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, a distance-based metric learning procedures have been applied for in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and predictive performance of developed models has been analyzed, the learned metric was used in support vector machines. The metric learning results have been illustrated using linear and non-linear data visualization techniques in order to indicate how the change of metrics affected nearest neighbors relations and descriptor space.
Impact of distance-based metric learning on classification and visualization model performance and structure-activity landscapes

NASA Astrophysics Data System (ADS)

Kireeva, Natalia V.; Ovchinnikova, Svetlana I.; Kuznetsov, Sergey L.; Kazennov, Andrey M.; Tsivadze, Aslan Yu.

2014-02-01

This study concerns large margin nearest neighbors classifier and its multi-metric extension as the efficient approaches for metric learning which aimed to learn an appropriate distance/similarity function for considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks. The paper describes application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in drug discovery process, in silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, a distance-based metric learning procedures have been applied for in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and predictive performance of developed models has been analyzed, the learned metric was used in support vector machines. The metric learning results have been illustrated using linear and non-linear data visualization techniques in order to indicate how the change of metrics affected nearest neighbors relations and descriptor space.
Chaotic Particle Swarm Optimization with Mutation for Classification

PubMed Central

Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza

2015-01-01

In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allows us to overcome the problem of early convergence into a local minima associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and it tunes the best possible solution. Furthermore, to remove the irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is checked out with three sets of data classifications namely, Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937
Chaotic particle swarm optimization with mutation for classification.

PubMed

Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza

2015-01-01

In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allows us to overcome the problem of early convergence into a local minima associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and it tunes the best possible solution. Furthermore, to remove the irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is checked out with three sets of data classifications namely, Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms.
Local classification: Locally weighted-partial least squares-discriminant analysis (LW-PLS-DA).

PubMed

Bevilacqua, Marta; Marini, Federico

2014-08-01

The possibility of devising a simple, flexible and accurate non-linear classification method, by extending the locally weighted partial least squares (LW-PLS) approach to the cases where the algorithm is used in a discriminant way (partial least squares discriminant analysis, PLS-DA), is presented. In particular, to assess which category an unknown sample belongs to, the proposed algorithm operates by identifying which training objects are most similar to the one to be predicted and building a PLS-DA model using these calibration samples only. Moreover, the influence of the selected training samples on the local model can be further modulated by adopting a not uniform distance-based weighting scheme which allows the farthest calibration objects to have less impact than the closest ones. The performances of the proposed locally weighted-partial least squares-discriminant analysis (LW-PLS-DA) algorithm have been tested on three simulated data sets characterized by a varying degree of non-linearity: in all cases, a classification accuracy higher than 99% on external validation samples was achieved. Moreover, when also applied to a real data set (classification of rice varieties), characterized by a high extent of non-linearity, the proposed method provided an average correct classification rate of about 93% on the test set. By the preliminary results, showed in this paper, the performances of the proposed LW-PLS-DA approach have proved to be comparable and in some cases better than those obtained by other non-linear methods (k nearest neighbors, kernel-PLS-DA and, in the case of rice, counterpropagation neural networks). Copyright © 2014 Elsevier B.V. All rights reserved.

An optimized video system for augmented reality in endodontics: a feasibility study.

PubMed

Bruellmann, D D; Tjaden, H; Schwanecke, U; Barth, P

2013-03-01

We propose an augmented reality system for the reliable detection of root canals in video sequences based on a k-nearest neighbor color classification and introduce a simple geometric criterion for teeth. The new software was implemented using C++, Qt, and the image processing library OpenCV. Teeth are detected in video images to restrict the segmentation of the root canal orifices by using a k-nearest neighbor algorithm. The location of the root canal orifices were determined using Euclidean distance-based image segmentation. A set of 126 human teeth with known and verified locations of the root canal orifices was used for evaluation. The software detects root canals orifices for automatic classification of the teeth in video images and stores location and size of the found structures. Overall 287 of 305 root canals were correctly detected. The overall sensitivity was about 94 %. Classification accuracy for molars ranged from 65.0 to 81.2 % and from 85.7 to 96.7 % for premolars. The realized software shows that observations made in anatomical studies can be exploited to automate real-time detection of root canal orifices and tooth classification with a software system. Automatic storage of location, size, and orientation of the found structures with this software can be used for future anatomical studies. Thus, statistical tables with canal locations will be derived, which can improve anatomical knowledge of the teeth to alleviate root canal detection in the future. For this purpose the software is freely available at: http://www.dental-imaging.zahnmedizin.uni-mainz.de/.
Functional Basis of Microorganism Classification.

PubMed

Zhu, Chengsheng; Delmont, Tom O; Vogel, Timothy M; Bromberg, Yana

2015-08-01

Correctly identifying nearest "neighbors" of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned with
Method of Grassland Information Extraction Based on Multi-Level Segmentation and Cart Model

NASA Astrophysics Data System (ADS)

Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.

2018-04-01

It is difficult to extract grassland accurately by traditional classification methods, such as supervised method based on pixels or objects. This paper proposed a new method combing the multi-level segmentation with CART (classification and regression tree) model. The multi-level segmentation which combined the multi-resolution segmentation and the spectral difference segmentation could avoid the over and insufficient segmentation seen in the single segmentation mode. The CART model was established based on the spectral characteristics and texture feature which were excavated from training sample data. Xilinhaote City in Inner Mongolia Autonomous Region was chosen as the typical study area and the proposed method was verified by using visual interpretation results as approximate truth value. Meanwhile, the comparison with the nearest neighbor supervised classification method was obtained. The experimental results showed that the total precision of classification and the Kappa coefficient of the proposed method was 95 % and 0.9, respectively. However, the total precision of classification and the Kappa coefficient of the nearest neighbor supervised classification method was 80 % and 0.56, respectively. The result suggested that the accuracy of classification proposed in this paper was higher than the nearest neighbor supervised classification method. The experiment certificated that the proposed method was an effective extraction method of grassland information, which could enhance the boundary of grassland classification and avoid the restriction of grassland distribution scale. This method was also applicable to the extraction of grassland information in other regions with complicated spatial features, which could avoid the interference of woodland, arable land and water body effectively.
Similarity-balanced discriminant neighbor embedding and its application to cancer classification based on gene expression data.

PubMed

Zhang, Li; Qian, Liqiang; Ding, Chuntao; Zhou, Weida; Li, Fanzhang

2015-09-01

The family of discriminant neighborhood embedding (DNE) methods is typical graph-based methods for dimension reduction, and has been successfully applied to face recognition. This paper proposes a new variant of DNE, called similarity-balanced discriminant neighborhood embedding (SBDNE) and applies it to cancer classification using gene expression data. By introducing a novel similarity function, SBDNE deals with two data points in the same class and the different classes with different ways. The homogeneous and heterogeneous neighbors are selected according to the new similarity function instead of the Euclidean distance. SBDNE constructs two adjacent graphs, or between-class adjacent graph and within-class adjacent graph, using the new similarity function. According to these two adjacent graphs, we can generate the local between-class scatter and the local within-class scatter, respectively. Thus, SBDNE can maximize the between-class scatter and simultaneously minimize the within-class scatter to find the optimal projection matrix. Experimental results on six microarray datasets show that SBDNE is a promising method for cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
Effectiveness of Global Features for Automatic Medical Image Classification and Retrieval – the experiences of OHSU at ImageCLEFmed

PubMed Central

Kalpathy-Cramer, Jayashree; Hersh, William

2008-01-01

In 2006 and 2007, Oregon Health & Science University (OHSU) participated in the automatic image annotation task for medical images at ImageCLEF, an annual international benchmarking event that is part of the Cross Language Evaluation Forum (CLEF). The goal of the automatic annotation task was to classify 1000 test images based on the Image Retrieval in Medical Applications (IRMA) code, given a set of 10,000 training images. There were 116 distinct classes in 2006 and 2007. We evaluated the efficacy of a variety of primarily global features for this classification task. These included features based on histograms, gray level correlation matrices and the gist technique. A multitude of classifiers including k-nearest neighbors, two-level neural networks, support vector machines, and maximum likelihood classifiers were evaluated. Our official error rates for the 1000 test images were 26% in 2006 using the flat classification structure. The error count in 2007 was 67.8 using the hierarchical classification error computation based on the IRMA code in 2007. Confusion matrices as well as clustering experiments were used to identify visually similar classes. The use of the IRMA code did not help us in the classification task as the semantic hierarchy of the IRMA classes did not correspond well with the hierarchy based on clustering of image features that we used. Our most frequent misclassification errors were along the view axis. Subsequent experiments based on a two-stage classification system decreased our error rate to 19.8% for the 2006 dataset and our error count to 55.4 for the 2007 data. PMID:19884953
SVM classification of microaneurysms with imbalanced dataset based on borderline-SMOTE and data cleaning techniques

NASA Astrophysics Data System (ADS)

Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning

2017-03-01

Microaneurysms are the earliest clinic signs of diabetic retinopathy, and many algorithms were developed for the automatic classification of these specific pathology. However, the imbalanced class distribution of dataset usually causes the classification accuracy of true microaneurysms be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with the data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for the microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of the minority samples in the neighborhood of the borderline, and 2) the remove of redundant training samples for improving the efficiency of data utilization. Moreover, the modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. The experiments with a public microaneurysms database shows that the proposed algorithms have better classification performance including the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
Parallel exploitation of a spatial-spectral classification approach for hyperspectral images on RVC-CAL

NASA Astrophysics Data System (ADS)

Lazcano, R.; Madroñal, D.; Fabelo, H.; Ortega, S.; Salvador, R.; Callicó, G. M.; Juárez, E.; Sanz, C.

2017-10-01

Hyperspectral Imaging (HI) assembles high resolution spectral information from hundreds of narrow bands across the electromagnetic spectrum, thus generating 3D data cubes in which each pixel gathers the spectral information of the reflectance of every spatial pixel. As a result, each image is composed of large volumes of data, which turns its processing into a challenge, as performance requirements have been continuously tightened. For instance, new HI applications demand real-time responses. Hence, parallel processing becomes a necessity to achieve this requirement, so the intrinsic parallelism of the algorithms must be exploited. In this paper, a spatial-spectral classification approach has been implemented using a dataflow language known as RVCCAL. This language represents a system as a set of functional units, and its main advantage is that it simplifies the parallelization process by mapping the different blocks over different processing units. The spatial-spectral classification approach aims at refining the classification results previously obtained by using a K-Nearest Neighbors (KNN) filtering process, in which both the pixel spectral value and the spatial coordinates are considered. To do so, KNN needs two inputs: a one-band representation of the hyperspectral image and the classification results provided by a pixel-wise classifier. Thus, spatial-spectral classification algorithm is divided into three different stages: a Principal Component Analysis (PCA) algorithm for computing the one-band representation of the image, a Support Vector Machine (SVM) classifier, and the KNN-based filtering algorithm. The parallelization of these algorithms shows promising results in terms of computational time, as the mapping of them over different cores presents a speedup of 2.69x when using 3 cores. Consequently, experimental results demonstrate that real-time processing of hyperspectral images is achievable.
Comparison of EEG-Features and Classification Methods for Motor Imagery in Patients with Disorders of Consciousness

PubMed Central

Höller, Yvonne; Bergmann, Jürgen; Thomschewski, Aljoscha; Kronbichler, Martin; Höller, Peter; Crone, Julia S.; Schmid, Elisabeth V.; Butz, Kevin; Nardone, Raffaele; Trinka, Eugen

2013-01-01

Current research aims at identifying voluntary brain activation in patients who are behaviorally diagnosed as being unconscious, but are able to perform commands by modulating their brain activity patterns. This involves machine learning techniques and feature extraction methods such as applied in brain computer interfaces. In this study, we try to answer the question if features/classification methods which show advantages in healthy participants are also accurate when applied to data of patients with disorders of consciousness. A sample of healthy participants (N = 22), patients in a minimally conscious state (MCS; N = 5), and with unresponsive wakefulness syndrome (UWS; N = 9) was examined with a motor imagery task which involved imagery of moving both hands and an instruction to hold both hands firm. We extracted a set of 20 features from the electroencephalogram and used linear discriminant analysis, k-nearest neighbor classification, and support vector machines (SVM) as classification methods. In healthy participants, the best classification accuracies were seen with coherences (mean = .79; range = .53−.94) and power spectra (mean = .69; range = .40−.85). The coherence patterns in healthy participants did not match the expectation of central modulated -rhythm. Instead, coherence involved mainly frontal regions. In healthy participants, the best classification tool was SVM. Five patients had at least one feature-classifier outcome with p0.05 (none of which were coherence or power spectra), though none remained significant after false-discovery rate correction for multiple comparisons. The present work suggests the use of coherences in patients with disorders of consciousness because they show high reliability among healthy subjects and patient groups. However, feature extraction and classification is a challenging task in unresponsive patients because there is no ground truth to validate the results. PMID:24282545
Android Malware Classification Using K-Means Clustering Algorithm

NASA Astrophysics Data System (ADS)

Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

2017-08-01

Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.
Mapping forested wetlands in the Great Zhan River Basin through integrating optical, radar, and topographical data classification techniques.

PubMed

Na, X D; Zang, S Y; Wu, C S; Li, W L

2015-11-01

Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.
Classification of sodium MRI data of cartilage using machine learning.

PubMed

Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

2015-11-01

To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interests in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each regions of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interests for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
Machine Learning Methods for Production Cases Analysis

NASA Astrophysics Data System (ADS)

Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

2018-03-01

Approach to analysis of events occurring during the production process were proposed. Described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of applied models. k-Nearest Neighbors and Random forest methods were used to illustrate and analyze proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.
Classification of mathematics deficiency using shape and scale analysis of 3D brain structures

NASA Astrophysics Data System (ADS)

Kurtek, Sebastian; Klassen, Eric; Gore, John C.; Ding, Zhaohua; Srivastava, Anuj

2011-03-01

We investigate the use of a recent technique for shape analysis of brain substructures in identifying learning disabilities in third-grade children. This Riemannian technique provides a quantification of differences in shapes of parameterized surfaces, using a distance that is invariant to rigid motions and re-parameterizations. Additionally, it provides an optimal registration across surfaces for improved matching and comparisons. We utilize an efficient gradient based method to obtain the optimal re-parameterizations of surfaces. In this study we consider 20 different substructures in the human brain and correlate the differences in their shapes with abnormalities manifested in deficiency of mathematical skills in 106 subjects. The selection of these structures is motivated in part by the past links between their shapes and cognitive skills, albeit in broader contexts. We have studied the use of both individual substructures and multiple structures jointly for disease classification. Using a leave-one-out nearest neighbor classifier, we obtained a 62.3% classification rate based on the shape of the left hippocampus. The use of multiple structures resulted in an improved classification rate of 71.4%.
Predicting Flavonoid UGT Regioselectivity

PubMed Central

Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

2011-01-01

Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of avonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
Comparison of GOES Cloud Classification Algorithms Employing Explicit and Implicit Physics

NASA Technical Reports Server (NTRS)

Bankert, Richard L.; Mitrescu, Cristian; Miller, Steven D.; Wade, Robert H.

2009-01-01

Cloud-type classification based on multispectral satellite imagery data has been widely researched and demonstrated to be useful for distinguishing a variety of classes using a wide range of methods. The research described here is a comparison of the classifier output from two very different algorithms applied to Geostationary Operational Environmental Satellite (GOES) data over the course of one year. The first algorithm employs spectral channel thresholding and additional physically based tests. The second algorithm was developed through a supervised learning method with characteristic features of expertly labeled image samples used as training data for a 1-nearest-neighbor classification. The latter's ability to identify classes is also based in physics, but those relationships are embedded implicitly within the algorithm. A pixel-to-pixel comparison analysis was done for hourly daytime scenes within a region in the northeastern Pacific Ocean. Considerable agreement was found in this analysis, with many of the mismatches or disagreements providing insight to the strengths and limitations of each classifier. Depending upon user needs, a rule-based or other postprocessing system that combines the output from the two algorithms could provide the most reliable cloud-type classification.
Identifying the most influential spreaders in complex networks by an Extended Local K-Shell Sum

NASA Astrophysics Data System (ADS)

Yang, Fan; Zhang, Ruisheng; Yang, Zhao; Hu, Rongjing; Li, Mengtian; Yuan, Yongna; Li, Keqin

Identifying influential spreaders is crucial for developing strategies to control the spreading process on complex networks. Following the well-known K-Shell (KS) decomposition, several improved measures are proposed. However, these measures cannot identify the most influential spreaders accurately. In this paper, we define a Local K-Shell Sum (LKSS) by calculating the sum of the K-Shell indices of the neighbors within 2-hops of a given node. Based on the LKSS, we propose an Extended Local K-Shell Sum (ELKSS) centrality to rank spreaders. The ELKSS is defined as the sum of the LKSS of the nearest neighbors of a given node. By assuming that the spreading process on networks follows the Susceptible-Infectious-Recovered (SIR) model, we perform extensive simulations on a series of real networks to compare the performance between the ELKSS centrality and other six measures. The results show that the ELKSS centrality has a better performance than the six measures to distinguish the spreading ability of nodes and to identify the most influential spreaders accurately.
Object-based land cover classification based on fusion of multifrequency SAR data and THAICHOTE optical imagery

NASA Astrophysics Data System (ADS)

Sukawattanavijit, Chanika; Srestasathiern, Panu

2017-10-01

Land Use and Land Cover (LULC) information are significant to observe and evaluate environmental change. LULC classification applying remotely sensed data is a technique popularly employed on a global and local dimension particularly, in urban areas which have diverse land cover types. These are essential components of the urban terrain and ecosystem. In the present, object-based image analysis (OBIA) is becoming widely popular for land cover classification using the high-resolution image. COSMO-SkyMed SAR data was fused with THAICHOTE (namely, THEOS: Thailand Earth Observation Satellite) optical data for land cover classification using object-based. This paper indicates a comparison between object-based and pixel-based approaches in image fusion. The per-pixel method, support vector machines (SVM) was implemented to the fused image based on Principal Component Analysis (PCA). For the objectbased classification was applied to the fused images to separate land cover classes by using nearest neighbor (NN) classifier. Finally, the accuracy assessment was employed by comparing with the classification of land cover mapping generated from fused image dataset and THAICHOTE image. The object-based data fused COSMO-SkyMed with THAICHOTE images demonstrated the best classification accuracies, well over 85%. As the results, an object-based data fusion provides higher land cover classification accuracy than per-pixel data fusion.
Feature weight estimation for gene selection: a local hyperlinear learning approach

PubMed Central

2014-01-01

Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071
Analysis and implementation of cross lingual short message service spam filtering using graph-based k-nearest neighbor

NASA Astrophysics Data System (ADS)

Ayu Cyntya Dewi, Dyah; Shaufiah; Asror, Ibnu

2018-03-01

SMS (Short Message Service) is on e of the communication services that still be the main choice, although now the phone grow with various applications. Along with the development of various other communication media, some countries lowered SMS rates to keep the interest of mobile users. It resulted in increased spam SMS that used by several parties, one of them for advertisement. Given the kind of multi-lingual documents in a message SMS, the Web, and others, necessary for effective multilingual or cross-lingual processing techniques is becoming increasingly important. The steps that performed in this research is data / messages first preprocessing then represented into a graph model. Then calculated using GKNN method. From this research we get the maximum accuracy is 98.86 with training data in Indonesian language and testing data in indonesian language with K 10 and threshold 0.001.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

PubMed Central

Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

2009-01-01

Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124

Classification and global distribution of ocean precipitation types based on satellite passive microwave signatures

NASA Astrophysics Data System (ADS)

Gautam, Nitin

The main objectives of this thesis are to develop a robust statistical method for the classification of ocean precipitation based on physical properties to which the SSM/I is sensitive and to examine how these properties vary globally and seasonally. A two step approach is adopted for the classification of oceanic precipitation classes from multispectral SSM/I data: (1)we subjectively define precipitation classes using a priori information about the precipitating system and its possible distinct signature on SSM/I data such as scattering by ice particles aloft in the precipitating cloud, emission by liquid rain water below freezing level, the difference of polarization at 19 GHz-an indirect measure of optical depth, etc.; (2)we then develop an objective classification scheme which is found to reproduce the subjective classification with high accuracy. This hybrid strategy allows us to use the characteristics of the data to define and encode classes and helps retain the physical interpretation of classes. The classification methods based on k-nearest neighbor and neural network are developed to objectively classify six precipitation classes. It is found that the classification method based neural network yields high accuracy for all precipitation classes. An inversion method based on minimum variance approach was used to retrieve gross microphysical properties of these precipitation classes such as column integrated liquid water path, column integrated ice water path, and column integrated min water path. This classification method is then applied to 2 years (1991-92) of SSM/I data to examine and document the seasonal and global distribution of precipitation frequency corresponding to each of these objectively defined six classes. The characteristics of the distribution are found to be consistent with assumptions used in defining these six precipitation classes and also with well known climatological patterns of precipitation regions. The seasonal and global
Modeling occupancy distribution in large spaces with multi-feature classification algorithm

DOE PAGES

Wang, Wei; Chen, Jiayu; Hong, Tianzhen

2018-04-07

We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolutionmore » occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.« less
Modeling occupancy distribution in large spaces with multi-feature classification algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Wei; Chen, Jiayu; Hong, Tianzhen

We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolutionmore » occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.« less
Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

NASA Astrophysics Data System (ADS)

Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

2017-07-01

Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” is born to address the problem of automatically determining the polarity (Positive, negative, neutral,…) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions, thus, building a classifier, which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques: Support Vector Machines (SVM), Naive Bayes and K nearest neighbors(KNN) were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4,08.
Fast segmentation of industrial quality pavement images using Laws texture energy measures and k -means clustering

NASA Astrophysics Data System (ADS)

Mathavan, Senthan; Kumar, Akash; Kamal, Khurram; Nieminen, Michael; Shah, Hitesh; Rahman, Mujib

2016-09-01

Thousands of pavement images are collected by road authorities daily for condition monitoring surveys. These images typically have intensity variations and texture nonuniformities that make their segmentation challenging. The automated segmentation of such pavement images is crucial for accurate, thorough, and expedited health monitoring of roads. In the pavement monitoring area, well-known texture descriptors, such as gray-level co-occurrence matrices and local binary patterns, are often used for surface segmentation and identification. These, despite being the established methods for texture discrimination, are inherently slow. This work evaluates Laws texture energy measures as a viable alternative for pavement images for the first time. k-means clustering is used to partition the feature space, limiting the human subjectivity in the process. Data classification, hence image segmentation, is performed by the k-nearest neighbor method. Laws texture energy masks are shown to perform well with resulting accuracy and precision values of more than 80%. The implementations of the algorithm, in both MATLAB® and OpenCV/C++, are extensively compared against the state of the art for execution speed, clearly showing the advantages of the proposed method. Furthermore, the OpenCV-based segmentation shows a 100% increase in processing speed when compared to the fastest algorithm available in literature.
Geographical traceability of Marsdenia tenacissima by Fourier transform infrared spectroscopy and chemometrics

NASA Astrophysics Data System (ADS)

Li, Chao; Yang, Sheng-Chao; Guo, Qiao-Sheng; Zheng, Kai-Yan; Wang, Ping-Li; Meng, Zhen-Gui

2016-01-01

A combination of Fourier transform infrared spectroscopy with chemometrics tools provided an approach for studying Marsdenia tenacissima according to its geographical origin. A total of 128 M. tenacissima samples from four provinces in China were analyzed with FTIR spectroscopy. Six pattern recognition methods were used to construct the discrimination models: support vector machine-genetic algorithms, support vector machine-particle swarm optimization, K-nearest neighbors, radial basis function neural network, random forest and support vector machine-grid search. Experimental results showed that K-nearest neighbors was superior to other mathematical algorithms after data were preprocessed with wavelet de-noising, with a discrimination rate of 100% in both the training and prediction sets. This study demonstrated that FTIR spectroscopy coupled with K-nearest neighbors could be successfully applied to determine the geographical origins of M. tenacissima samples, thereby providing reliable authentication in a rapid, cheap and noninvasive way.
Voice based gender classification using machine learning

NASA Astrophysics Data System (ADS)

Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

2017-11-01

Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.
Integrated Sensing and Processing (ISP) Phase II: Demonstration and Evaluation for Distributed Sensor Netowrks and Missile Seeker Systems

DTIC Science & Technology

2007-02-28

Shah, D. Waagen, H. Schmitt, S. Bellofiore, A. Spanias, and D. Cochran, 32nd International Conference on Acoustics, Speech , and Signal Processing...Information Exploitation Office kNN k-Nearest Neighbor LEAN Laplacian Eigenmap Adaptive Neighbor LIP Linear Integer Programming ISP
Feature extraction and classification of clouds in high resolution panchromatic satellite imagery

NASA Astrophysics Data System (ADS)

Sharghi, Elan

The development of sophisticated remote sensing sensors is rapidly increasing, and the vast amount of satellite imagery collected is too much to be analyzed manually by a human image analyst. It has become necessary for a tool to be developed to automate the job of an image analyst. This tool would need to intelligently detect and classify objects of interest through computer vision algorithms. Existing software called the Rapid Image Exploitation Resource (RAPIER®) was designed by engineers at Space and Naval Warfare Systems Center Pacific (SSC PAC) to perform exactly this function. This software automatically searches for anomalies in the ocean and reports the detections as a possible ship object. However, if the image contains a high percentage of cloud coverage, a high number of false positives are triggered by the clouds. The focus of this thesis is to explore various feature extraction and classification methods to accurately distinguish clouds from ship objects. An examination of a texture analysis method, line detection using the Hough transform, and edge detection using wavelets are explored as possible feature extraction methods. The features are then supplied to a K-Nearest Neighbors (KNN) or Support Vector Machine (SVM) classifier. Parameter options for these classifiers are explored and the optimal parameters are determined.
Automatic segmentation and classification of gestational sac based on mean sac diameter using medical ultrasound image

NASA Astrophysics Data System (ADS)

Khazendar, Shan; Farren, Jessica; Al-Assam, Hisham; Sayasneh, Ahmed; Du, Hongbo; Bourne, Tom; Jassim, Sabah A.

2014-05-01

Ultrasound is an effective multipurpose imaging modality that has been widely used for monitoring and diagnosing early pregnancy events. Technology developments coupled with wide public acceptance has made ultrasound an ideal tool for better understanding and diagnosing of early pregnancy. The first measurable signs of an early pregnancy are the geometric characteristics of the Gestational Sac (GS). Currently, the size of the GS is manually estimated from ultrasound images. The manual measurement involves multiple subjective decisions, in which dimensions are taken in three planes to establish what is known as Mean Sac Diameter (MSD). The manual measurement results in inter- and intra-observer variations, which may lead to difficulties in diagnosis. This paper proposes a fully automated diagnosis solution to accurately identify miscarriage cases in the first trimester of pregnancy based on automatic quantification of the MSD. Our study shows a strong positive correlation between the manual and the automatic MSD estimations. Our experimental results based on a dataset of 68 ultrasound images illustrate the effectiveness of the proposed scheme in identifying early miscarriage cases with classification accuracies comparable with those of domain experts using K nearest neighbor classifier on automatically estimated MSDs.
Non-parametric analysis of LANDSAT maps using neural nets and parallel computers

NASA Technical Reports Server (NTRS)

Salu, Yehuda; Tilton, James

1991-01-01

Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by LANDSAT satellite. The performances are evaluated by comparing classifications of a scene in the vicinity of Washington DC. The problem of optimal selection of categories is addressed as a step in the classification process.
Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

PubMed

Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

2014-07-01

Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Estimating persistence of brominated and chlorinated organic pollutants in air, water, soil, and sediments with the QSPR-based classification scheme.

PubMed

Puzyn, T; Haranczyk, M; Suzuki, N; Sakurai, T

2011-02-01

We have estimated degradation half-lives of both brominated and chlorinated dibenzo-p-dioxins (PBDDs and PCDDs), furans (PBDFs and PCDFs), biphenyls (PBBs and PCBs), naphthalenes (PBNs and PCNs), diphenyl ethers (PBDEs and PCDEs) as well as selected unsubstituted polycyclic aromatic hydrocarbons (PAHs) in air, surface water, surface soil, and sediments (in total of 1,431 compounds in four compartments). Next, we compared the persistence between chloro- (relatively well-studied) and bromo- (less studied) analogs. The predictions have been performed based on the quantitative structure-property relationship (QSPR) scheme with use of k-nearest neighbors (kNN) classifier and the semi-quantitative system of persistence classes. The classification models utilized principal components derived from the principal component analysis of a set of 24 constitutional and quantum mechanical descriptors as input variables. Accuracies of classification (based on an external validation) were 86, 85, 87, and 75% for air, surface water, surface soil, and sediments, respectively. The persistence of all chlorinated species increased with increasing halogenation degree. In the case of brominated organic pollutants (Br-OPs), the trend was the same for air and sediments. However, we noticed that the opposite trend for persistence in surface water and soil. The results suggest that, due to high photoreactivity of C-Br chemical bonds, photolytic processes occurring in surface water and soil are able to play significant role in transforming and removing Br-OPs from these compartments. This contribution is the first attempt of classifying together Br-OPs and Cl-OPs according to their persistence, in particular, environmental compartments.
Use of feature extraction techniques for the texture and context information in ERTS imagery: Spectral and textural processing of ERTS imagery. [classification of Kansas land use

NASA Technical Reports Server (NTRS)

Haralick, R. H. (Principal Investigator); Bosley, R. J.

1974-01-01

The author has identified the following significant results. A procedure was developed to extract cross-band textural features from ERTS MSS imagery. Evolving from a single image texture extraction procedure which uses spatial dependence matrices to measure relative co-occurrence of nearest neighbor grey tones, the cross-band texture procedure uses the distribution of neighboring grey tone N-tuple differences to measure the spatial interrelationships, or co-occurrences, of the grey tone N-tuples present in a texture pattern. In both procedures, texture is characterized in such a way as to be invariant under linear grey tone transformations. However, the cross-band procedure complements the single image procedure by extracting texture information and spectral information contained in ERTS multi-images. Classification experiments show that when used alone, without spectral processing, the cross-band texture procedure extracts more information than the single image texture analysis. Results show an improvement in average correct classification from 86.2% to 88.8% for ERTS image no. 1021-16333 with the cross-band texture procedure. However, when used together with spectral features, the single image texture plus spectral features perform better than the cross-band texture plus spectral features, with an average correct classification of 93.8% and 91.6%, respectively.
Classification of damage in structural systems using time series analysis and supervised and unsupervised pattern recognition techniques

NASA Astrophysics Data System (ADS)

Omenzetter, Piotr; de Lautour, Oliver R.

2010-04-01

Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3- storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.
Fall Detection System for the Elderly Based on the Classification of Shimmer Sensor Prototype Data

PubMed Central

Ahmed, Moiz; Mehmood, Nadeem; Mehmood, Amir; Rizwan, Kashif

2017-01-01

Objectives Falling in the elderly is considered a major cause of death. In recent years, ambient and wireless sensor platforms have been extensively used in developed countries for the detection of falls in the elderly. However, we believe extra efforts are required to address this issue in developing countries, such as Pakistan, where most deaths due to falls are not even reported. Considering this, in this paper, we propose a fall detection system prototype that s based on the classification on real time shimmer sensor data. Methods We first developed a data set, ‘SMotion’ of certain postures that could lead to falls in the elderly by using a body area network of Shimmer sensors and categorized the items in this data set into age and weight groups. We developed a feature selection and classification system using three classifiers, namely, support vector machine (SVM), K-nearest neighbor (KNN), and neural network (NN). Finally, a prototype was fabricated to generate alerts to caregivers, health experts, or emergency services in case of fall. Results To evaluate the proposed system, SVM, KNN, and NN were used. The results of this study identified KNN as the most accurate classifier with maximum accuracy of 96% for age groups and 93% for weight groups. Conclusions In this paper, a classification-based fall detection system is proposed. For this purpose, the SMotion data set was developed and categorized into two groups (age and weight groups). The proposed fall detection system for the elderly is implemented through a body area sensor network using third-generation sensors. The evaluation results demonstrate the reasonable performance of the proposed fall detection prototype system in the tested scenarios. PMID:28875049
Corrigendum to "Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data"

Treesearch

Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; David E. hall; Michael J. Falkowski

2009-01-01

The authors regret that an error was discovered in the code within the R software package, yaImpute (Crookston & Finley, 2008), which led to incorrect results reported in the above article. The Most Similar Neighbor (MSN) method computes the distance between reference observations and target observations in a projected space defined using canonical correlation...
Imputed forest structure uncertainty varies across elevational and longitudinal gradients in the western Cascade mountains, Oregon, USA

Treesearch

David M. Bell; Matthew J. Gregory; Janet L. Ohmann

2015-01-01

Imputation provides a useful method for mapping forest attributes across broad geographic areas based on field plot measurements and Landsat multi-spectral data, but the resulting map products may be of limited use without corresponding analyses of uncertainties in predictions. In the case of k-nearest neighbor (kNN) imputation with k = 1, such as the Gradient Nearest...
A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

PubMed

Gunavathi, Chellamuthu; Premalatha, Kandasamy

2014-01-01

Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.
Thermography based diagnosis of ruptured anterior cruciate ligament (ACL) in canines

NASA Astrophysics Data System (ADS)

Lama, Norsang; Umbaugh, Scott E.; Mishra, Deependra; Dahal, Rohini; Marino, Dominic J.; Sackman, Joseph

2016-09-01

Anterior cruciate ligament (ACL) rupture in canines is a common orthopedic injury in veterinary medicine. Veterinarians use both imaging and non-imaging methods to diagnose the disease. Common imaging methods such as radiography, computed tomography (CT scan) and magnetic resonance imaging (MRI) have some disadvantages: expensive setup, high dose of radiation, and time-consuming. In this paper, we present an alternative diagnostic method based on feature extraction and pattern classification (FEPC) to diagnose abnormal patterns in ACL thermograms. The proposed method was experimented with a total of 30 thermograms for each camera view (anterior, lateral and posterior) including 14 disease and 16 non-disease cases provided from Long Island Veterinary Specialists. The normal and abnormal patterns in thermograms are analyzed in two steps: feature extraction and pattern classification. Texture features based on gray level co-occurrence matrices (GLCM), histogram features and spectral features are extracted from the color normalized thermograms and the computed feature vectors are applied to Nearest Neighbor (NN) classifier, K-Nearest Neighbor (KNN) classifier and Support Vector Machine (SVM) classifier with leave-one-out validation method. The algorithm gives the best classification success rate of 86.67% with a sensitivity of 85.71% and a specificity of 87.5% in ACL rupture detection using NN classifier for the lateral view and Norm-RGB-Lum color normalization method. Our results show that the proposed method has the potential to detect ACL rupture in canines.

The most detailed high-energy picture of Proxima Centauri, our nearest extrasolar neighbor

NASA Astrophysics Data System (ADS)

Schneider, Christian

2016-10-01

Proxima Centauri b is the nearest exoplanet to the Sun. It orbits an M5.5 dwarf and is potentially habitable. The latter statement, however, depends sensitively on the high-energy irradiation on the planet. Ribas et al. (2016) estimated the high-energy flux of the host star by collecting archival data from the X-ray to the FUV regime, but explicitly state that one unavoidable complication of estimating XUV fluxes is [...] intrinsic [stellar] variability. Here, we propose to greatly improve upon this unavoidable complication by obtaining simultaneous X-ray and UV observations to measure a high-resolution irradiation spectrum and, thus, to assess the habitability of Proxima b.Our upcoming, very deep Chandra grating observation of Proxima Cen (175 ks, LETGS, PI: P. Predehl) provides a great opportunity to obtain simultaneous coverage at X-ray and UV wavelengths, i.e., to measure most of the stellar high-energy flux in a coherent way. The reason for proposing a HST DDT is that the Chandra observation is a GTO and, thus, could not be augmented by simultaneous HST observations directly as we would have proposedfor in a regular GO.Combining Chandra X-ray and HST UV data allows us to reconstruct a high-resolution spectral energy distribution (SED) including the EUV regime and, thus, a reference irradiation spectrum using the methods developed by us for the MUSCLES project.
Nonlinear features for product inspection

NASA Astrophysics Data System (ADS)

Talukder, Ashit; Casasent, David P.

1999-03-01

Classification of real-time X-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non-invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items (pistachio nuts). This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work, the MRDF is applied to standard features (rather than iconic data). The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC (receiver operating characteristic) data.
In silico prediction of ROCK II inhibitors by different classification approaches.

PubMed

Cai, Chuipu; Wu, Qihui; Luo, Yunxia; Ma, Huili; Shen, Jiangang; Zhang, Yongbin; Yang, Lei; Chen, Yunbo; Wen, Zehuai; Wang, Qi

2017-11-01

ROCK II is an important pharmacological target linked to central nervous system disorders such as Alzheimer's disease. The purpose of this research is to generate ROCK II inhibitor prediction models by machine learning approaches. Firstly, four sets of descriptors were calculated with MOE 2010 and PaDEL-Descriptor, and optimized by F-score and linear forward selection methods. In addition, four classification algorithms were used to initially build 16 classifiers with k-nearest neighbors [Formula: see text], naïve Bayes, Random forest, and support vector machine. Furthermore, three sets of structural fingerprint descriptors were introduced to enhance the predictive capacity of classifiers, which were assessed with fivefold cross-validation, test set validation and external test set validation. The best two models, MFK + MACCS and MLR + SubFP, have both MCC values of 0.925 for external test set. After that, a privileged substructure analysis was performed to reveal common chemical features of ROCK II inhibitors. Finally, binding modes were analyzed to identify relationships between molecular descriptors and activity, while main interactions were revealed by comparing the docking interaction of the most potent and the weakest ROCK II inhibitors. To the best of our knowledge, this is the first report on ROCK II inhibitors utilizing machine learning approaches that provides a new method for discovering novel ROCK II inhibitors.
Comprehensive thermodynamic analysis of 3′ double-nucleotide overhangs neighboring Watson–Crick terminal base pairs

PubMed Central

O'Toole, Amanda S.; Miller, Stacy; Haines, Nathan; Zink, M. Coleen; Serra, Martin J.

2006-01-01

Thermodynamic parameters are reported for duplex formation of 48 self-complementary RNA duplexes containing Watson–Crick terminal base pairs (GC, AU and UA) with all 16 possible 3′ double-nucleotide overhangs; mimicking the structures of short interfering RNAs (siRNA) and microRNAs (miRNA). Based on nearest-neighbor analysis, the addition of a second dangling nucleotide to a single 3′ dangling nucleotide increases stability of duplex formation up to 0.8 kcal/mol in a sequence dependent manner. Results from this study in conjunction with data from a previous study [A. S. O'Toole, S. Miller and M. J. Serra (2005) RNA, 11, 512.] allows for the development of a refined nearest-neighbor model to predict the influence of 3′ double-nucleotide overhangs on the stability of duplex formation. The model improves the prediction of free energy and melting temperature when tested against five oligomers with various core duplex sequences. Phylogenetic analysis of naturally occurring miRNAs was performed to support our results. Selection of the effector miR strand of the mature miRNA duplex appears to be dependent upon the identity of the 3′ double-nucleotide overhang. Thermodynamic parameters for 3′ single terminal overhangs adjacent to a UA pair are also presented. PMID:16820533
Phase transition in the spin- 3 / 2 Blume-Emery-Griffiths model with antiferromagnetic second neighbor interactions

NASA Astrophysics Data System (ADS)

Yezli, M.; Bekhechi, S.; Hontinfinde, F.; EZ-Zahraouy, H.

2016-04-01

Two nonperturbative methods such as Monte-Carlo simulation (MC) and Transfer-Matrix Finite-Size-Scaling calculations (TMFSS) have been used to study the phase transition of the spin- 3 / 2 Blume-Emery-Griffiths model (BEG) with quadrupolar and antiferromagnetic next-nearest-neighbor exchange interactions. Ground state and finite temperature phase diagrams are obtained by means of these two methods. New degenerate phases are found and only second order phase transitions occur for all values of the parameter interactions. No sign of the intermediate phase is found from both methods. Critical exponents are also obtained from TMFSS calculations. Ising criticality and nonuniversal behaviors are observed depending on the strength of the second neighbor interaction.
Novel Hyperspectral Anomaly Detection Methods Based on Unsupervised Nearest Regularized Subspace

NASA Astrophysics Data System (ADS)

Hou, Z.; Chen, Y.; Tan, K.; Du, P.

2018-04-01

Anomaly detection has been of great interest in hyperspectral imagery analysis. Most conventional anomaly detectors merely take advantage of spectral and spatial information within neighboring pixels. In this paper, two methods of Unsupervised Nearest Regularized Subspace-based with Outlier Removal Anomaly Detector (UNRSORAD) and Local Summation UNRSORAD (LSUNRSORAD) are proposed, which are based on the concept that each pixel in background can be approximately represented by its spatial neighborhoods, while anomalies cannot. Using a dual window, an approximation of each testing pixel is a representation of surrounding data via a linear combination. The existence of outliers in the dual window will affect detection accuracy. Proposed detectors remove outlier pixels that are significantly different from majority of pixels. In order to make full use of various local spatial distributions information with the neighboring pixels of the pixels under test, we take the local summation dual-window sliding strategy. The residual image is constituted by subtracting the predicted background from the original hyperspectral imagery, and anomalies can be detected in the residual image. Experimental results show that the proposed methods have greatly improved the detection accuracy compared with other traditional detection method.
Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors

PubMed Central

Guo, Maozu; Guo, Yahong; Li, Jinbao; Ding, Jian; Liu, Yong; Dai, Qiguo; Li, Jin; Teng, Zhixia; Huang, Yufei

2013-01-01

Background The identification of human disease-related microRNAs (disease miRNAs) is important for further investigating their involvement in the pathogenesis of diseases. More experimentally validated miRNA-disease associations have been accumulated recently. On the basis of these associations, it is essential to predict disease miRNAs for various human diseases. It is useful in providing reliable disease miRNA candidates for subsequent experimental studies. Methodology/Principal Findings It is known that miRNAs with similar functions are often associated with similar diseases and vice versa. Therefore, the functional similarity of two miRNAs has been successfully estimated by measuring the semantic similarity of their associated diseases. To effectively predict disease miRNAs, we calculated the functional similarity by incorporating the information content of disease terms and phenotype similarity between diseases. Furthermore, the members of miRNA family or cluster are assigned higher weight since they are more probably associated with similar diseases. A new prediction method, HDMP, based on weighted k most similar neighbors is presented for predicting disease miRNAs. Experiments validated that HDMP achieved significantly higher prediction performance than existing methods. In addition, the case studies examining prostatic neoplasms, breast neoplasms, and lung neoplasms, showed that HDMP can uncover potential disease miRNA candidates. Conclusions The superior performance of HDMP can be attributed to the accurate measurement of miRNA functional similarity, the weight assignment based on miRNA family or cluster, and the effective prediction based on weighted k most similar neighbors. The online prediction and analysis tool is freely available at http://nclab.hit.edu.cn/hdmpred. PMID:23950912
IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids.

PubMed

Ali, Safdar; Majid, Abdul; Khan, Asifullah

2014-04-01

Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our
Applying Ancestry and Sex Computation as a Quality Control Tool in Targeted Next-Generation Sequencing.

PubMed

Mathias, Patrick C; Turner, Emily H; Scroggins, Sheena M; Salipante, Stephen J; Hoffman, Noah G; Pritchard, Colin C; Shirts, Brian H

2016-03-01

To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors. We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors. We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique. Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report. © American Society for Clinical Pathology, 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of classification models to detect Salmonella Enteritidis and Salmonella Typhimurium found in poultry carcass rinses by visible-near infrared hyperspectral imaging

NASA Astrophysics Data System (ADS)

Seo, Young Wook; Yoon, Seung Chul; Park, Bosoon; Hinton, Arthur; Windham, William R.; Lawrence, Kurt C.

2013-05-01

Salmonella is a major cause of foodborne disease outbreaks resulting from the consumption of contaminated food products in the United States. This paper reports the development of a hyperspectral imaging technique for detecting and differentiating two of the most common Salmonella serotypes, Salmonella Enteritidis (SE) and Salmonella Typhimurium (ST), from background microflora that are often found in poultry carcass rinse. Presumptive positive screening of colonies with a traditional direct plating method is a labor intensive and time consuming task. Thus, this paper is concerned with the detection of differences in spectral characteristics among the pure SE, ST, and background microflora grown on brilliant green sulfa (BGS) and xylose lysine tergitol 4 (XLT4) agar media with a spread plating technique. Visible near-infrared hyperspectral imaging, providing the spectral and spatial information unique to each microorganism, was utilized to differentiate SE and ST from the background microflora. A total of 10 classification models, including five machine learning algorithms, each without and with principal component analysis (PCA), were validated and compared to find the best model in classification accuracy. The five machine learning (classification) algorithms used in this study were Mahalanobis distance (MD), k-nearest neighbor (kNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM). The average classification accuracy of all 10 models on a calibration (or training) set of the pure cultures on BGS agar plates was 98% (Kappa coefficient = 0.95) in determining the presence of SE and/or ST although it was difficult to differentiate between SE and ST. The average classification accuracy of all 10 models on a training set for ST detection on XLT4 agar was over 99% (Kappa coefficient = 0.99) although SE colonies on XLT4 agar were difficult to differentiate from background microflora. The average classification
General formulation of long-range degree correlations in complex networks

NASA Astrophysics Data System (ADS)

Fujiki, Yuka; Takaguchi, Taro; Yakubo, Kousuke

2018-06-01

We provide a general framework for analyzing degree correlations between nodes separated by more than one step (i.e., beyond nearest neighbors) in complex networks. One joint and four conditional probability distributions are introduced to fully describe long-range degree correlations with respect to degrees k and k' of two nodes and shortest path length l between them. We present general relations among these probability distributions and clarify the relevance to nearest-neighbor degree correlations. Unlike nearest-neighbor correlations, some of these probability distributions are meaningful only in finite-size networks. Furthermore, as a baseline to determine the existence of intrinsic long-range degree correlations in a network other than inevitable correlations caused by the finite-size effect, the functional forms of these probability distributions for random networks are analytically evaluated within a mean-field approximation. The utility of our argument is demonstrated by applying it to real-world networks.
Classification of glioblastoma and metastasis for neuropathology intraoperative diagnosis: a multi-resolution textural approach to model the background

NASA Astrophysics Data System (ADS)

Ahmad Fauzi, Mohammad Faizal; Gokozan, Hamza Numan; Elder, Brad; Puduvalli, Vinay K.; Otero, Jose J.; Gurcan, Metin N.

2014-03-01

Brain cancer surgery requires intraoperative consultation by neuropathology to guide surgical decisions regarding the extent to which the tumor undergoes gross total resection. In this context, the differential diagnosis between glioblastoma and metastatic cancer is challenging as the decision must be made during surgery in a short time-frame (typically 30 minutes). We propose a method to classify glioblastoma versus metastatic cancer based on extracting textural features from the non-nuclei region of cytologic preparations. For glioblastoma, these regions of interest are filled with glial processes between the nuclei, which appear as anisotropic thin linear structures. For metastasis, these regions correspond to a more homogeneous appearance, thus suitable texture features can be extracted from these regions to distinguish between the two tissue types. In our work, we use the Discrete Wavelet Frames to characterize the underlying texture due to its multi-resolution capability in modeling underlying texture. The textural characterization is carried out in primarily the non-nuclei regions after nuclei regions are segmented by adapting our visually meaningful decomposition segmentation algorithm to this problem. k-nearest neighbor method was then used to classify the features into glioblastoma or metastasis cancer class. Experiment on 53 images (29 glioblastomas and 24 metastases) resulted in average accuracy as high as 89.7% for glioblastoma, 87.5% for metastasis and 88.7% overall. Further studies are underway to incorporate nuclei region features into classification on an expanded dataset, as well as expanding the classification to more types of cancers.
Band nesting, massive Dirac fermions, and valley Landé and Zeeman effects in transition metal dichalcogenides: A tight-binding model

NASA Astrophysics Data System (ADS)

Bieniek, Maciej; Korkusiński, Marek; Szulakowska, Ludmiła; Potasz, Paweł; Ozfidan, Isil; Hawrylak, Paweł

2018-02-01

We present here the minimal tight-binding model for a single layer of transition metal dichalcogenides (TMDCs) MX 2(M , metal; X , chalcogen) which illuminates the physics and captures band nesting, massive Dirac fermions, and valley Landé and Zeeman magnetic field effects. TMDCs share the hexagonal lattice with graphene but their electronic bands require much more complex atomic orbitals. Using symmetry arguments, a minimal basis consisting of three metal d orbitals and three chalcogen dimer p orbitals is constructed. The tunneling matrix elements between nearest-neighbor metal and chalcogen orbitals are explicitly derived at K ,-K , and Γ points of the Brillouin zone. The nearest-neighbor tunneling matrix elements connect specific metal and sulfur orbitals yielding an effective 6 ×6 Hamiltonian giving correct composition of metal and chalcogen orbitals but not the direct gap at K points. The direct gap at K , correct masses, and conduction band minima at Q points responsible for band nesting are obtained by inclusion of next-neighbor Mo-Mo tunneling. The parameters of the next-nearest-neighbor model are successfully fitted to MX 2(M =Mo ; X =S ) density functional ab initio calculations of the highest valence and lowest conduction band dispersion along K -Γ line in the Brillouin zone. The effective two-band massive Dirac Hamiltonian for MoS2, Landé g factors, and valley Zeeman splitting are obtained.
Missing value imputation in DNA microarrays based on conjugate gradient method.

PubMed

Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

2012-02-01

Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm that is based on conjugate gradient (CG) method is proposed to estimate missing values. k-nearest neighbors of the missed entry are first selected based on absolute values of their Pearson correlation coefficient. Then a subset of genes among the k-nearest neighbors is labeled as the best similar ones. CG algorithm with this subset as its input is then used to estimate the missing values. Our proposed CG based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principle component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average of normalized root mean squares error (NRMSE) and relative NRMSE in different data sets with various missing rates shows CGimpute outperforms other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
The application of k-Nearest Neighbour in the identification of high potential archers based on relative psychological coping skills variables

NASA Astrophysics Data System (ADS)

Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Muaz Alim, Muhammad; Nasir, Ahmad Fakhri Ab

2018-04-01

The present study aims at classifying and predicting high and low potential archers from a collection of psychological coping skills variables trained on different k-Nearest Neighbour (k-NN) kernels. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. Psychological coping skills inventory which evaluates the archers level of related coping skills were filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed k-NN models, i.e. fine, medium, coarse, cosine, cubic and weighted kernel functions, were trained on the psychological variables. The k-means clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the cosine k-NN model exhibited good accuracy and precision throughout the exercise with an accuracy of 94% and considerably fewer error rate for the prediction of the HPPA and the LPPA as compared to the rest of the models. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected psychological coping skills variables examined which would consequently save time and energy during talent identification and development programme.
Image classification independent of orientation and scale

NASA Astrophysics Data System (ADS)

Arsenault, Henri H.; Parent, Sebastien; Moisan, Sylvain

1998-04-01

The recognition of targets independently of orientation has become fairly well developed in recent years for in-plane rotation. The out-of-plane rotation problem is much less advanced. When both out-of-plane rotations and changes of scale are present, the problem becomes very difficult. In this paper we describe our research on the combined out-of- plane rotation problem and the scale invariance problem. The rotations were limited to rotations about an axis perpendicular to the line of sight. The objects to be classified were three kinds of military vehicles. The inputs used were infrared imagery and photographs. We used a variation of a method proposed by Neiberg and Casasent, where a neural network is trained with a subset of the database and a minimum distances from lines in feature space are used for classification instead of nearest neighbors. Each line in the feature space corresponds to one class of objects, and points on one line correspond to different orientations of the same target. We found that the training samples needed to be closer for some orientations than for others, and that the most difficult orientations are where the target is head-on to the observer. By means of some additional training of the neural network, we were able to achieve 100% correct classification for 360 degree rotation and a range of scales over a factor of five.
Inspection of wear particles in oils by using a fuzzy classifier

NASA Astrophysics Data System (ADS)

Hamalainen, Jari J.; Enwald, Petri

1994-11-01

The reliability of stand-alone machines and larger production units can be improved by automated condition monitoring. Analysis of wear particles in lubricating or hydraulic oils helps diagnosing the wear states of machine parts. This paper presents a computer vision system for automated classification of wear particles. Digitized images from experiments with a bearing test bench, a hydraulic system with an industrial company, and oil samples from different industrial sources were used for algorithm development and testing. The wear particles were divided into four classes indicating different wear mechanisms: cutting wear, fatigue wear, adhesive wear, and abrasive wear. The results showed that the fuzzy K-nearest neighbor classifier utilized gave the same distribution of wear particles as the classification by a human expert.
Autonomous target recognition using remotely sensed surface vibration measurements

NASA Astrophysics Data System (ADS)

Geurts, James; Ruck, Dennis W.; Rogers, Steven K.; Oxley, Mark E.; Barr, Dallas N.

1993-09-01

The remotely measured surface vibration signatures of tactical military ground vehicles are investigated for use in target classification and identification friend or foe (IFF) systems. The use of remote surface vibration sensing by a laser radar reduces the effects of partial occlusion, concealment, and camouflage experienced by automatic target recognition systems using traditional imagery in a tactical battlefield environment. Linear Predictive Coding (LPC) efficiently represents the vibration signatures and nearest neighbor classifiers exploit the LPC feature set using a variety of distortion metrics. Nearest neighbor classifiers achieve an 88 percent classification rate in an eight class problem, representing a classification performance increase of thirty percent from previous efforts. A novel confidence figure of merit is implemented to attain a 100 percent classification rate with less than 60 percent rejection. The high classification rates are achieved on a target set which would pose significant problems to traditional image-based recognition systems. The targets are presented to the sensor in a variety of aspects and engine speeds at a range of 1 kilometer. The classification rates achieved demonstrate the benefits of using remote vibration measurement in a ground IFF system. The signature modeling and classification system can also be used to identify rotary and fixed-wing targets.
Identification of Disease Critical Genes Using Collective Meta-heuristic Approaches: An Application to Preeclampsia.

PubMed

Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar

2017-12-01

Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.
Construction accident narrative classification: An evaluation of text mining techniques.

PubMed

Goh, Yang Miang; Ubeynarayana, C U

2017-11-01

Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.

Can a Smartphone Diagnose Parkinson Disease? A Deep Neural Network Method and Telediagnosis System Implementation.

PubMed

Zhang, Y N

2017-01-01

Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking test, handwriting test, and MRI diagnostic. In this paper, we propose a machine learning based PD telediagnosis method for smartphone. Classification of PD using speech records is a challenging task owing to the fact that the classification accuracy is still lower than doctor-level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and K nearest neighbor (KNN) classifier. KNN classifier can produce promising classification results from useful representations which were learned by SAE. Empirical results show that the proposed method achieves better performance with all tested cases across classification tasks, demonstrating machine learning capable of classifying PD with a level of competence comparable to doctor. It concludes that a smartphone can therefore potentially provide low-cost PD diagnostic care. This paper also gives an implementation on browser/server system and reports the running time cost. Both advantages and disadvantages of the proposed telediagnosis system are discussed.
Can a Smartphone Diagnose Parkinson Disease? A Deep Neural Network Method and Telediagnosis System Implementation

PubMed Central

2017-01-01

Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking test, handwriting test, and MRI diagnostic. In this paper, we propose a machine learning based PD telediagnosis method for smartphone. Classification of PD using speech records is a challenging task owing to the fact that the classification accuracy is still lower than doctor-level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and K nearest neighbor (KNN) classifier. KNN classifier can produce promising classification results from useful representations which were learned by SAE. Empirical results show that the proposed method achieves better performance with all tested cases across classification tasks, demonstrating machine learning capable of classifying PD with a level of competence comparable to doctor. It concludes that a smartphone can therefore potentially provide low-cost PD diagnostic care. This paper also gives an implementation on browser/server system and reports the running time cost. Both advantages and disadvantages of the proposed telediagnosis system are discussed. PMID:29075547
Real-Time Classification of Patients with Balance Disorders vs. Normal Subjects Using a Low-Cost Small Wireless Wearable Gait Sensor.

PubMed

Nukala, Bhargava Teja; Nakano, Taro; Rodriguez, Amanda; Tsay, Jerry; Lopez, Jerry; Nguyen, Tam Q; Zupancic, Steven; Lie, Donald Y C

2016-11-29

Gait analysis using wearable wireless sensors can be an economical, convenient and effective way to provide diagnostic and clinical information for various health-related issues. In this work, our custom designed low-cost wireless gait analysis sensor that contains a basic inertial measurement unit (IMU) was used to collect the gait data for four patients diagnosed with balance disorders and additionally three normal subjects, each performing the Dynamic Gait Index (DGI) tests while wearing the custom wireless gait analysis sensor (WGAS). The small WGAS includes a tri-axial accelerometer integrated circuit (IC), two gyroscopes ICs and a Texas Instruments (TI) MSP430 microcontroller and is worn by each subject at the T4 position during the DGI tests. The raw gait data are wirelessly transmitted from the WGAS to a near-by PC for real-time gait data collection and analysis. In order to perform successful classification of patients vs. normal subjects, we used several different classification algorithms, such as the back propagation artificial neural network (BP-ANN), support vector machine (SVM), k -nearest neighbors (KNN) and binary decision trees (BDT), based on features extracted from the raw gait data of the gyroscopes and accelerometers. When the range was used as the input feature, the overall classification accuracy obtained is 100% with BP-ANN, 98% with SVM, 96% with KNN and 94% using BDT. Similar high classification accuracy results were also achieved when the standard deviation or other values were used as input features to these classifiers. These results show that gait data collected from our very low-cost wearable wireless gait sensor can effectively differentiate patients with balance disorders from normal subjects in real time using various classifiers, the success of which may eventually lead to accurate and objective diagnosis of abnormal human gaits and their underlying etiologies in the future, as more patient data are being collected.
Machine learning in APOGEE. Unsupervised spectral classification with K-means

NASA Astrophysics Data System (ADS)

Garcia-Dias, Rafael; Allende Prieto, Carlos; Sánchez Almeida, Jorge; Ordovás-Pascual, Ignacio

2018-05-01

Context. The volume of data generated by astronomical surveys is growing rapidly. Traditional analysis techniques in spectroscopy either demand intensive human interaction or are computationally expensive. In this scenario, machine learning, and unsupervised clustering algorithms in particular, offer interesting alternatives. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) offers a vast data set of near-infrared stellar spectra, which is perfect for testing such alternatives. Aims: Our research applies an unsupervised classification scheme based on K-means to the massive APOGEE data set. We explore whether the data are amenable to classification into discrete classes. Methods: We apply the K-means algorithm to 153 847 high resolution spectra (R ≈ 22 500). We discuss the main virtues and weaknesses of the algorithm, as well as our choice of parameters. Results: We show that a classification based on normalised spectra captures the variations in stellar atmospheric parameters, chemical abundances, and rotational velocity, among other factors. The algorithm is able to separate the bulge and halo populations, and distinguish dwarfs, sub-giants, RC, and RGB stars. However, a discrete classification in flux space does not result in a neat organisation in the parameters' space. Furthermore, the lack of obvious groups in flux space causes the results to be fairly sensitive to the initialisation, and disrupts the efficiency of commonly-used methods to select the optimal number of clusters. Our classification is publicly available, including extensive online material associated with the APOGEE Data Release 12 (DR12). Conclusions: Our description of the APOGEE database can help greatly with the identification of specific types of targets for various applications. We find a lack of obvious groups in flux space, and identify limitations of the K-means algorithm in dealing with this kind of data. Full Tables B.1-B.4 are only available at the CDS via
Linear Discriminant Analysis Achieves High Classification Accuracy for the BOLD fMRI Response to Naturalistic Movie Stimuli

PubMed Central

Mandelkow, Hendrik; de Zwart, Jacco A.; Duyn, Jeff H.

2016-01-01

Naturalistic stimuli like movies evoke complex perceptual processes, which are of great interest in the study of human cognition by functional MRI (fMRI). However, conventional fMRI analysis based on statistical parametric mapping (SPM) and the general linear model (GLM) is hampered by a lack of accurate parametric models of the BOLD response to complex stimuli. In this situation, statistical machine-learning methods, a.k.a. multivariate pattern analysis (MVPA), have received growing attention for their ability to generate stimulus response models in a data-driven fashion. However, machine-learning methods typically require large amounts of training data as well as computational resources. In the past, this has largely limited their application to fMRI experiments involving small sets of stimulus categories and small regions of interest in the brain. By contrast, the present study compares several classification algorithms known as Nearest Neighbor (NN), Gaussian Naïve Bayes (GNB), and (regularized) Linear Discriminant Analysis (LDA) in terms of their classification accuracy in discriminating the global fMRI response patterns evoked by a large number of naturalistic visual stimuli presented as a movie. Results show that LDA regularized by principal component analysis (PCA) achieved high classification accuracies, above 90% on average for single fMRI volumes acquired 2 s apart during a 300 s movie (chance level 0.7% = 2 s/300 s). The largest source of classification errors were autocorrelations in the BOLD signal compounded by the similarity of consecutive stimuli. All classifiers performed best when given input features from a large region of interest comprising around 25% of the voxels that responded significantly to the visual stimulus. Consistent with this, the most informative principal components represented widespread distributions of co-activated brain regions that were similar between subjects and may represent functional networks. In light of these
Texture analysis applied to second harmonic generation image data for ovarian cancer classification

NASA Astrophysics Data System (ADS)

Wen, Bruce L.; Brewer, Molly A.; Nadiarnykh, Oleg; Hocker, James; Singh, Vikas; Mackie, Thomas R.; Campagnola, Paul J.

2014-09-01

Remodeling of the extracellular matrix has been implicated in ovarian cancer. To quantitate the remodeling, we implement a form of texture analysis to delineate the collagen fibrillar morphology observed in second harmonic generation microscopy images of human normal and high grade malignant ovarian tissues. In the learning stage, a dictionary of "textons"-frequently occurring texture features that are identified by measuring the image response to a filter bank of various shapes, sizes, and orientations-is created. By calculating a representative model based on the texton distribution for each tissue type using a training set of respective second harmonic generation images, we then perform classification between images of normal and high grade malignant ovarian tissues. By optimizing the number of textons and nearest neighbors, we achieved classification accuracy up to 97% based on the area under receiver operating characteristic curves (true positives versus false positives). The local analysis algorithm is a more general method to probe rapidly changing fibrillar morphologies than global analyses such as FFT. It is also more versatile than other texture approaches as the filter bank can be highly tailored to specific applications (e.g., different disease states) by creating customized libraries based on common image features.
Classification of speech dysfluencies using LPC based parameterization techniques.

PubMed

Hariharan, M; Chee, Lim Sin; Ai, Ooi Chia; Yaacob, Sazali

2012-06-01

The goal of this paper is to discuss and compare three feature extraction methods: Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Weighted Linear Prediction Cepstral Coefficients (WLPCC) for recognizing the stuttered events. Speech samples from the University College London Archive of Stuttered Speech (UCLASS) were used for our analysis. The stuttered events were identified through manual segmentation and were used for feature extraction. Two simple classifiers namely, k-nearest neighbour (kNN) and Linear Discriminant Analysis (LDA) were employed for speech dysfluencies classification. Conventional validation method was used for testing the reliability of the classifier results. The study on the effect of different frame length, percentage of overlapping, value of ã in a first order pre-emphasizer and different order p were discussed. The speech dysfluencies classification accuracy was found to be improved by applying statistical normalization before feature extraction. The experimental investigation elucidated LPC, LPCC and WLPCC features can be used for identifying the stuttered events and WLPCC features slightly outperforms LPCC features and LPC features.
Computer Simulation of Energy Parameters and Magnetic Effects in Fe-Si-C Ternary Alloys

NASA Astrophysics Data System (ADS)

Ridnyi, Ya. M.; Mirzoev, A. A.; Mirzaev, D. A.

2018-06-01

The paper presents ab initio simulation with the WIEN2k software package of the equilibrium structure and properties of silicon and carbon atoms dissolved in iron with the body-centered cubic crystal system of the lattice. Silicon and carbon atoms manifest a repulsive interaction in the first two nearest neighbors, in the second neighbor the repulsion being stronger than in the first. In the third and next-nearest neighbors a very weak repulsive interaction occurs and tends to zero with increasing distance between atoms. Silicon and carbon dissolution reduces the magnetic moment of iron atoms.
Biclustering Learning of Trading Rules.

PubMed

Huang, Qinghua; Wang, Ting; Tao, Dacheng; Li, Xuelong

2015-10-01

Technical analysis with numerous indicators and patterns has been regarded as important evidence for making trading decisions in financial markets. However, it is extremely difficult for investors to find useful trading rules based on numerous technical indicators. This paper innovatively proposes the use of biclustering mining to discover effective technical trading patterns that contain a combination of indicators from historical financial data series. This is the first attempt to use biclustering algorithm on trading data. The mined patterns are regarded as trading rules and can be classified as three trading actions (i.e., the buy, the sell, and no-action signals) with respect to the maximum support. A modified K nearest neighborhood ( K -NN) method is applied to classification of trading days in the testing period. The proposed method [called biclustering algorithm and the K nearest neighbor (BIC- K -NN)] was implemented on four historical datasets and the average performance was compared with the conventional buy-and-hold strategy and three previously reported intelligent trading systems. Experimental results demonstrate that the proposed trading system outperforms its counterparts and will be useful for investment in various financial markets.
Physical Human Activity Recognition Using Wearable Sensors.

PubMed

Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

2015-12-11

This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.
Physical Human Activity Recognition Using Wearable Sensors

PubMed Central

Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

2015-01-01

This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450
ISE-based sensor array system for classification of foodstuffs

NASA Astrophysics Data System (ADS)

Ciosek, Patrycja; Sobanski, Tomasz; Augustyniak, Ewa; Wróblewski, Wojciech

2006-01-01

A system composed of an array of polymeric membrane ion-selective electrodes and a pattern recognition block—a so-called 'electronic tongue'—was used for the classification of liquid samples: milk, fruit juice and tonic. The task of this system was to automatically recognize a brand of the product. To analyze the measurement set-up responses various non-parametric classifiers such as k-nearest neighbours, a feedforward neural network and a probabilistic neural network were used. In order to enhance the classification ability of the system, standard model solutions of salts were measured (in order to take into account any variation in time of the working parameters of the sensors). This system was capable of recognizing the brand of the products with accuracy ranging from 68% to 100% (in the case of the best classifier).
Discrimination of lymphoma using laser-induced breakdown spectroscopy conducted on whole blood samples

PubMed Central

Chen, Xue; Li, Xiaohui; Yang, Sibo; Yu, Xin; Liu, Aichun

2018-01-01

Lymphoma is a significant cancer that affects the human lymphatic and hematopoietic systems. In this work, discrimination of lymphoma using laser-induced breakdown spectroscopy (LIBS) conducted on whole blood samples is presented. The whole blood samples collected from lymphoma patients and healthy controls are deposited onto standard quantitative filter papers and ablated with a 1064 nm Q-switched Nd:YAG laser. 16 atomic and ionic emission lines of calcium (Ca), iron (Fe), magnesium (Mg), potassium (K) and sodium (Na) are selected to discriminate the cancer disease. Chemometric methods, including principal component analysis (PCA), linear discriminant analysis (LDA) classification, and k nearest neighbor (kNN) classification are used to build the discrimination models. Both LDA and kNN models have achieved very good discrimination performances for lymphoma, with an accuracy of over 99.7%, a sensitivity of over 0.996, and a specificity of over 0.997. These results demonstrate that the whole-blood-based LIBS technique in combination with chemometric methods can serve as a fast, less invasive, and accurate method for detection and discrimination of human malignancies. PMID:29541503
Nearest neighbor: The low-mass Milky Way satellite Tucana III

DOE PAGES

Simon, J. D.; Li, T. S.; Drlica-Wagner, A.; ...

2017-03-17

Here, we present Magellan/IMACS spectroscopy of the recently discovered Milky Way satellite Tucana III (Tuc III). We identify 26 member stars in Tuc III from which we measure a mean radial velocity of v hel = -102.3 ± 0.4 (stat.) ± 2.0 (sys.)more » $$\\mathrm{km}\\,{{\\rm{s}}}^{-1}$$, a velocity dispersion of $${0.1}_{-0.1}^{+0.7}$$ $$\\mathrm{km}\\,{{\\rm{s}}}^{-1}$$, and a mean metallicity of $${\\rm{[Fe/H]}}=-{2.42}_{-0.08}^{+0.07}$$. The upper limit on the velocity dispersion is σ < 1.5 $$\\mathrm{km}\\,{{\\rm{s}}}^{-1}$$ at 95.5% confidence, and the corresponding upper limit on the mass within the half-light radius of Tuc III is 9.0 × 10 4 M ⊙. We cannot rule out mass-to-light ratios as large as 240 M ⊙/L ⊙ for Tuc III, but much lower mass-to-light ratios that would leave the system baryon-dominated are also allowed. We measure an upper limit on the metallicity spread of the stars in Tuc III of 0.19 dex at 95.5% confidence. Tuc III has a smaller metallicity dispersion and likely a smaller velocity dispersion than any known dwarf galaxy, but a larger size and lower surface brightness than any known globular cluster. Its metallicity is also much lower than those of the clusters with similar luminosity. We therefore tentatively suggest that Tuc III is the tidally stripped remnant of a dark matter-dominated dwarf galaxy, but additional precise velocity and metallicity measurements will be necessary for a definitive classification. If Tuc III is indeed a dwarf galaxy, it is one of the closest external galaxies to the Sun. Because of its proximity, the most luminous stars in Tuc III are quite bright, including one star at V = 15.7 that is the brightest known member star of an ultra-faint satellite.« less
Classification with spatio-temporal interpixel class dependency contexts

NASA Technical Reports Server (NTRS)

Jeon, Byeungwoo; Landgrebe, David A.

1992-01-01

A contextual classifier which can utilize both spatial and temporal interpixel dependency contexts is investigated. After spatial and temporal neighbors are defined, a general form of maximum a posterior spatiotemporal contextual classifier is derived. This contextual classifier is simplified under several assumptions. Joint prior probabilities of the classes of each pixel and its spatial neighbors are modeled by the Gibbs random field. The classification is performed in a recursive manner to allow a computationally efficient contextual classification. Experimental results with bitemporal TM data show significant improvement of classification accuracy over noncontextual pixelwise classifiers. This spatiotemporal contextual classifier should find use in many applications of remote sensing, especially when the classification accuracy is important.
Texture Analysis and Machine Learning for Detecting Myocardial Infarction in Noncontrast Low-Dose Computed Tomography: Unveiling the Invisible.

PubMed

Mannil, Manoj; von Spiczak, Jochen; Manka, Robert; Alkadhi, Hatem

2018-06-01

The aim of this study was to test whether texture analysis and machine learning enable the detection of myocardial infarction (MI) on non-contrast-enhanced low radiation dose cardiac computed tomography (CCT) images. In this institutional review board-approved retrospective study, we included non-contrast-enhanced electrocardiography-gated low radiation dose CCT image data (effective dose, 0.5 mSv) acquired for the purpose of calcium scoring of 27 patients with acute MI (9 female patients; mean age, 60 ± 12 years), 30 patients with chronic MI (8 female patients; mean age, 68 ± 13 years), and in 30 subjects (9 female patients; mean age, 44 ± 6 years) without cardiac abnormality, hereafter termed controls. Texture analysis of the left ventricle was performed using free-hand regions of interest, and texture features were classified twice (Model I: controls versus acute MI versus chronic MI; Model II: controls versus acute and chronic MI). For both classifications, 6 commonly used machine learning classifiers were used: decision tree C4.5 (J48), k-nearest neighbors, locally weighted learning, RandomForest, sequential minimal optimization, and an artificial neural network employing deep learning. In addition, 2 blinded, independent readers visually assessed noncontrast CCT images for the presence or absence of MI. In Model I, best classification results were obtained using the k-nearest neighbors classifier (sensitivity, 69%; specificity, 85%; false-positive rate, 0.15). In Model II, the best classification results were found with the locally weighted learning classification (sensitivity, 86%; specificity, 81%; false-positive rate, 0.19) with an area under the curve from receiver operating characteristics analysis of 0.78. In comparison, both readers were not able to identify MI in any of the noncontrast, low radiation dose CCT images. This study indicates the ability of texture analysis and machine learning in detecting MI on noncontrast low radiation dose CCT
The natural neighbor series manuals and source codes

NASA Astrophysics Data System (ADS)

Watson, Dave

1999-05-01

This software series is concerned with reconstruction of spatial functions by interpolating a set of discrete observations having two or three independent variables. There are three components in this series: (1) nngridr: an implementation of natural neighbor interpolation, 1994, (2) modemap: an implementation of natural neighbor interpolation on the sphere, 1998 and (3) orebody: an implementation of natural neighbor isosurface generation (publication incomplete). Interpolation is important to geologists because it can offer graphical insights into significant geological structure and behavior, which, although inherent in the data, may not be otherwise apparent. It also is the first step in numerical integration, which provides a primary avenue to detailed quantification of the observed spatial function. Interpolation is implemented by selecting a surface-generating rule that controls the form of a `bridge' built across the interstices between adjacent observations. The cataloging and classification of the many such rules that have been reported is a subject in itself ( Watson, 1992), and the merits of various approaches have been debated at length. However, for practical purposes, interpolation methods are usually judged on how satisfactorily they handle problematic data sets. Sparse scattered data or traverse data, especially if the functional values are highly variable, generally tests interpolation methods most severely; but one method, natural neighbor interpolation, usually does produce preferable results for such data.
Modeling Liver-Related Adverse Effects of Drugs Using kNN QSAR Method

PubMed Central

Rodgers, Amie D.; Zhu, Hao; Fourches, Dennis; Rusyn, Ivan; Tropsha, Alexander

2010-01-01

Adverse effects of drugs (AEDs) continue to be a major cause of drug withdrawals both in development and post-marketing. While liver-related AEDs are a major concern for drug safety, there are few in silico models for predicting human liver toxicity for drug candidates. We have applied the Quantitative Structure Activity Relationship (QSAR) approach to model liver AEDs. In this study, we aimed to construct a QSAR model capable of binary classification (active vs. inactive) of drugs for liver AEDs based on chemical structure. To build QSAR models, we have employed an FDA spontaneous reporting database of human liver AEDs (elevations in activity of serum liver enzymes), which contains data on approximately 500 approved drugs. Approximately 200 compounds with wide clinical data coverage, structural similarity and balanced (40/60) active/inactive ratio were selected for modeling and divided into multiple training/test and external validation sets. QSAR models were developed using the k nearest neighbor method and validated using external datasets. Models with high sensitivity (>73%) and specificity (>94%) for prediction of liver AEDs in external validation sets were developed. To test applicability of the models, three chemical databases (World Drug Index, Prestwick Chemical Library, and Biowisdom Liver Intelligence Module) were screened in silico and the validity of predictions was determined, where possible, by comparing model-based classification with assertions in publicly available literature. Validated QSAR models of liver AEDs based on the data from the FDA spontaneous reporting system can be employed as sensitive and specific predictors of AEDs in pre-clinical screening of drug candidates for potential hepatotoxicity in humans. PMID:20192250
Prediction of Human Intestinal Absorption of Compounds Using Artificial Intelligence Techniques.

PubMed

Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar

2017-01-01

Information about Pharmacokinetics of compounds is an essential component of drug design and development. Modeling the pharmacokinetic properties require identification of the factors effecting absorption, distribution, metabolism and excretion of compounds. There have been continuous attempts in the prediction of intestinal absorption of compounds using various Artificial intelligence methods in the effort to reduce the attrition rate of drug candidates entering to preclinical and clinical trials. Currently, there are large numbers of individual predictive models available for absorption using machine learning approaches. Six Artificial intelligence methods namely, Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis were used for prediction of absorption of compounds. Prediction accuracy of Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis for prediction of intestinal absorption of compounds was found to be 91.54%, 88.33%, 84.30%, 86.51%, 79.07% and 80.08% respectively. Comparative analysis of all the six prediction models suggested that Support vector machine with Radial basis function based kernel is comparatively better for binary classification of compounds using human intestinal absorption and may be useful at preliminary stages of drug design and development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design

PubMed Central

Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco

2016-01-01

The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms. PMID:27886061

Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design.

PubMed

Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco

2016-11-23

The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms.
Probability machines: consistent probability estimation using nonparametric learning machines.

PubMed

Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

2012-01-01

Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
Chemometric and multivariate statistical analysis of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulfides.

PubMed

Kalegowda, Yogesh; Harmer, Sarah L

2012-03-20

Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
Absence of long-range order in the frustrated magnet SrDy2O4 due to trapped defects from a dimensionality crossover

NASA Astrophysics Data System (ADS)

Gauthier, N.; Fennell, A.; Prévost, B.; Uldry, A.-C.; Delley, B.; Sibille, R.; Désilets-Benoit, A.; Dabkowska, H. A.; Nilsen, G. J.; Regnault, L.-P.; White, J. S.; Niedermayer, C.; Pomjakushin, V.; Bianchi, A. D.; Kenzelmann, M.

2017-04-01

Magnetic frustration and low dimensionality can prevent long-range magnetic order and lead to exotic correlated ground states. SrDy2O4 consists of magnetic Dy3 + ions forming magnetically frustrated zigzag chains along the c axis and shows no long-range order to temperatures as low as T =60 mK. We carried out neutron scattering and ac magnetic susceptibility measurements using powder and single crystals of SrDy2O4 . Diffuse neutron scattering indicates strong one-dimensional (1D) magnetic correlations along the chain direction that can be qualitatively accounted for by the axial next-nearest-neighbor Ising model with nearest-neighbor and next-nearest-neighbor exchange J1=0.3 meV and J2=0.2 meV, respectively. Three-dimensional (3D) correlations become important below T*≈0.7 K. At T =60 mK, the short-range correlations are characterized by a putative propagation vector k1 /2=(0 ,1/2 ,1/2 ) . We argue that the absence of long-range order arises from the presence of slowly decaying 1D domain walls that are trapped due to 3D correlations. This stabilizes a low-temperature phase without long-range magnetic order, but with well-ordered chain segments separated by slowly moving domain walls.
Classification of Breast Cancer Resistant Protein (BCRP) Inhibitors and Non-Inhibitors Using Machine Learning Approaches.

PubMed

Belekar, Vilas; Lingineni, Karthik; Garg, Prabha

2015-01-01

The breast cancer resistant protein (BCRP) is an important transporter and its inhibitors play an important role in cancer treatment by improving the oral bioavailability as well as blood brain barrier (BBB) permeability of anticancer drugs. In this work, a computational model was developed to predict the compounds as BCRP inhibitors or non-inhibitors. Various machine learning approaches like, support vector machine (SVM), k-nearest neighbor (k-NN) and artificial neural network (ANN) were used to develop the models. The Matthews correlation coefficients (MCC) of developed models using ANN, k-NN and SVM are 0.67, 0.71 and 0.77, and prediction accuracies are 85.2%, 88.3% and 90.8% respectively. The developed models were tested with a test set of 99 compounds and further validated with external set of 98 compounds. Distribution plot analysis and various machine learning models were also developed based on druglikeness descriptors. Applicability domain is used to check the prediction reliability of the new molecules.
Carbon-hydrogen defects with a neighboring oxygen atom in n-type Si

NASA Astrophysics Data System (ADS)

Gwozdz, K.; Stübner, R.; Kolkovsky, Vl.; Weber, J.

2017-07-01

We report on the electrical activation of neutral carbon-oxygen complexes in Si by wet-chemical etching at room temperature. Two deep levels, E65 and E75, are observed by deep level transient spectroscopy in n-type Czochralski Si. The activation enthalpies of E65 and E75 are obtained as EC-0.11 eV (E65) and EC-0.13 eV (E75). The electric field dependence of their emission rates relates both levels to single acceptor states. From the analysis of the depth profiles, we conclude that the levels belong to two different defects, which contain only one hydrogen atom. A configuration is proposed, where the CH1BC defect, with hydrogen in the bond-centered position between neighboring C and Si atoms, is disturbed by interstitial oxygen in the second nearest neighbor position to substitutional carbon. The significant reduction of the CH1BC concentration in samples with high oxygen concentrations limits the use of this defect for the determination of low concentrations of substitutional carbon in Si samples.
Examining change detection approaches for tropical mangrove monitoring

USGS Publications Warehouse

Myint, Soe W.; Franklin, Janet; Buenemann, Michaela; Kim, Won; Giri, Chandra

2014-01-01

This study evaluated the effectiveness of different band combinations and classifiers (unsupervised, supervised, object-oriented nearest neighbor, and object-oriented decision rule) for quantifying mangrove forest change using multitemporal Landsat data. A discriminant analysis using spectra of different vegetation types determined that bands 2 (0.52 to 0.6 μm), 5 (1.55 to 1.75 μm), and 7 (2.08 to 2.35 μm) were the most effective bands for differentiating mangrove forests from surrounding land cover types. A ranking of thirty-six change maps, produced by comparing the classification accuracy of twelve change detection approaches, was used. The object-based Nearest Neighbor classifier produced the highest mean overall accuracy (84 percent) regardless of band combinations. The automated decision rule-based approach (mean overall accuracy of 88 percent) as well as a composite of bands 2, 5, and 7 used with the unsupervised classifier and the same composite or all band difference with the object-oriented Nearest Neighbor classifier were the most effective approaches.
New nonlinear features for inspection, robotics, and face recognition

NASA Astrophysics Data System (ADS)

Casasent, David P.; Talukder, Ashit

1999-10-01

Classification of real-time X-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non- invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items (pistachio nuts). This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work, the MRDF is applied to standard features (rather than iconic data). The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC (receiver operating characteristic) data. Other applications of these new feature spaces in robotics and face recognition are also noted.
Breathing Pattern Interpretation as an Alternative and Effective Voice Communication Solution.

PubMed

Elsahar, Yasmin; Bouazza-Marouf, Kaddour; Kerr, David; Gaur, Atul; Kaushik, Vipul; Hu, Sijung

2018-05-15

Augmentative and alternative communication (AAC) systems tend to rely on the interpretation of purposeful gestures for interaction. Existing AAC methods could be cumbersome and limit the solutions in terms of versatility. The study aims to interpret breathing patterns (BPs) to converse with the outside world by means of a unidirectional microphone and researches breathing-pattern interpretation (BPI) to encode messages in an interactive manner with minimal training. We present BP processing work with (1) output synthesized machine-spoken words (SMSW) along with single-channel Weiner filtering (WF) for signal de-noising, and (2) k -nearest neighbor ( k-NN ) classification of BPs associated with embedded dynamic time warping (DTW). An approved protocol to collect analogue modulated BP sets belonging to 4 distinct classes with 10 training BPs per class and 5 live BPs per class was implemented with 23 healthy subjects. An 86% accuracy of k-NN classification was obtained with decreasing error rates of 17%, 14%, and 11% for the live classifications of classes 2, 3, and 4, respectively. The results express a systematic reliability of 89% with increased familiarity. The outcomes from the current AAC setup recommend a durable engineering solution directly beneficial to the sufferers.
Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

PubMed Central

Mitra, Rajib; Jordan, Michael I.; Dunbrack, Roland L.

2010-01-01

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp. PMID:20442867
CAVIAR: CLASSIFICATION VIA AGGREGATED REGRESSION AND ITS APPLICATION IN CLASSIFYING OASIS BRAIN DATABASE

PubMed Central

Chen, Ting; Rangarajan, Anand; Vemuri, Baba C.

2010-01-01

This paper presents a novel classification via aggregated regression algorithm – dubbed CAVIAR – and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on both the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the testing stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature – the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain – and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS. PMID:21151847
CAVIAR: CLASSIFICATION VIA AGGREGATED REGRESSION AND ITS APPLICATION IN CLASSIFYING OASIS BRAIN DATABASE.

PubMed

Chen, Ting; Rangarajan, Anand; Vemuri, Baba C

2010-04-14

This paper presents a novel classification via aggregated regression algorithm - dubbed CAVIAR - and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on both the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the testing stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature - the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain - and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS.
A Comparison of Rule-Based, K-Nearest Neighbor, and Neural Net Classifiers for Automated

Treesearch

Tai-Hoon Cho; Richard W. Conners; Philip A. Araman

1991-01-01

Over the last few years the authors have been involved in research aimed at developing a machine vision system for locating and identifying surface defects on materials. The particular problem being studied involves locating surface defects on hardwood lumber in a species independent manner. Obviously, the accurate location and identification of defects is of paramount...
Stratified estimates of forest area using the k-nearest neighbors technique and satellite imagery

Treesearch

Ronald E. McRoberts; Mark D. Nelson; Daniel Wendt

2002-01-01

For two study areas in Minnesota, stratified estimation using Landsat Thematic Mapper satellite imagery as the basis for stratification was used to estimate forest area. Measurements of forest inventory plots obtained for a 12-month period in 1998 and 1999 were used as the source of data for within-strata estimates. These measurements further served as calibration data...
Multisource multibeam backscatter data: developing a strategy for the production of benthic habitat maps using semi-automated seafloor classification methods

NASA Astrophysics Data System (ADS)

Lacharité, Myriam; Brown, Craig J.; Gazzola, Vicki

2018-06-01

The establishment of multibeam echosounders (MBES) as a mainstream tool in ocean mapping has facilitated integrative approaches towards nautical charting, benthic habitat mapping, and seafloor geotechnical surveys. The bathymetric and backscatter information generated by MBES enables marine scientists to present highly accurate bathymetric data with a spatial resolution closely matching that of terrestrial mapping, and can generate customized thematic seafloor maps to meet multiple ocean management needs. However, when a variety of MBES systems are used, the creation of objective habitat maps can be hindered by the lack of backscatter calibration, due for example, to system-specific settings, yielding relative rather than absolute values. Here, we describe an approach using object-based image analysis to combine 4 non-overlapping and uncalibrated (backscatter) MBES coverages to form a seamless habitat map on St. Anns Bank (Atlantic Canada), a marine protected area hosting a diversity of benthic habitats. The benthoscape map was produced by analysing each coverage independently with supervised classification (k-nearest neighbor) of image-objects based on a common suite of 7 benthoscapes (determined with 4214 ground-truthing photographs at 61 stations, and characterized with backscatter, bathymetry, and bathymetric position index). Manual re-classification based on uncertainty in membership values to individual classes—especially at the boundaries between coverages—was used to build the final benthoscape map. Given the costs and scarcity of MBES surveys in offshore marine ecosystems—particularly in large ecosystems in need of adequate conservation strategies, such as in Canadian waters—developing approaches to synthesize multiple datasets to meet management needs is warranted.
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose

PubMed Central

Rahman, Mohammad Mizanur; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse

2017-01-01

Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms. PMID:28895910
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose.

PubMed

Rahman, Mohammad Mizanur; Charoenlarpnopparut, Chalie; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse

2017-09-12

Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k -nearest neighbor ( k -NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms.
Optimal Detection Range of RFID Tag for RFID-based Positioning System Using the k-NN Algorithm.

PubMed

Han, Soohee; Kim, Junghwan; Park, Choung-Hwan; Yoon, Hee-Cheon; Heo, Joon

2009-01-01

Positioning technology to track a moving object is an important and essential component of ubiquitous computing environments and applications. An RFID-based positioning system using the k-nearest neighbor (k-NN) algorithm can determine the position of a moving reader from observed reference data. In this study, the optimal detection range of an RFID-based positioning system was determined on the principle that tag spacing can be derived from the detection range. It was assumed that reference tags without signal strength information are regularly distributed in 1-, 2- and 3-dimensional spaces. The optimal detection range was determined, through analytical and numerical approaches, to be 125% of the tag-spacing distance in 1-dimensional space. Through numerical approaches, the range was 134% in 2-dimensional space, 143% in 3-dimensional space.
Masked Priming with Orthographic Neighbors: A Test of the Lexical Competition Assumption

ERIC Educational Resources Information Center

Nakayama, Mariko; Sears, Christopher R.; Lupker, Stephen J.

2008-01-01

In models of visual word identification that incorporate inhibitory competition among activated lexical units, a word's higher frequency neighbors will be the word's strongest competitors. Preactivation of these neighbors by a prime is predicted to delay the word's identification. Using the masked priming paradigm (K. I. Forster & C. Davis, 1984,…
A Fast Implementation of the ISOCLUS Algorithm

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2003-01-01

Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge lor split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing .interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O

Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

PubMed

Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

2015-08-01

Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
7 CFR Exhibit K to Subpart A of... - Classifications for Multi-Family Residential Rehabilitation Work

Code of Federal Regulations, 2013 CFR

2013-01-01

... 7 Agriculture 12 2013-01-01 2013-01-01 false Classifications for Multi-Family Residential Rehabilitation Work K Exhibit K to Subpart A of Part 1924 Agriculture Regulations of the Department of... Planning and Performing Construction and Other Development Pt. 1924, Subpt. A, Exh. K Exhibit K to Subpart...
7 CFR Exhibit K to Subpart A of... - Classifications for Multi-Family Residential Rehabilitation Work

Code of Federal Regulations, 2012 CFR

2012-01-01

... 7 Agriculture 12 2012-01-01 2012-01-01 false Classifications for Multi-Family Residential Rehabilitation Work K Exhibit K to Subpart A of Part 1924 Agriculture Regulations of the Department of... Planning and Performing Construction and Other Development Pt. 1924, Subpt. A, Exh. K Exhibit K to Subpart...
7 CFR Exhibit K to Subpart A of... - Classifications for Multi-Family Residential Rehabilitation Work

Code of Federal Regulations, 2011 CFR

2011-01-01

... 7 Agriculture 12 2011-01-01 2011-01-01 false Classifications for Multi-Family Residential Rehabilitation Work K Exhibit K to Subpart A of Part 1924 Agriculture Regulations of the Department of... Planning and Performing Construction and Other Development Pt. 1924, Subpt. A, Exh. K Exhibit K to Subpart...
7 CFR Exhibit K to Subpart A of... - Classifications for Multi-Family Residential Rehabilitation Work

Code of Federal Regulations, 2010 CFR

2010-01-01

... 7 Agriculture 12 2010-01-01 2010-01-01 false Classifications for Multi-Family Residential Rehabilitation Work K Exhibit K to Subpart A of Part 1924 Agriculture Regulations of the Department of... Planning and Performing Construction and Other Development Pt. 1924, Subpt. A, Exh. K Exhibit K to Subpart...
7 CFR Exhibit K to Subpart A of... - Classifications for Multi-Family Residential Rehabilitation Work

Code of Federal Regulations, 2014 CFR

2014-01-01

... 7 Agriculture 12 2014-01-01 2013-01-01 true Classifications for Multi-Family Residential Rehabilitation Work K Exhibit K to Subpart A of Part 1924 Agriculture Regulations of the Department of... Planning and Performing Construction and Other Development Pt. 1924, Subpt. A, Exh. K Exhibit K to Subpart...
One input-class and two input-class classifications for differentiating olive oil from other edible vegetable oils by use of the normal-phase liquid chromatography fingerprint of the methyl-transesterified fraction.

PubMed

Jiménez-Carvelo, Ana M; Pérez-Castaño, Estefanía; González-Casado, Antonio; Cuadros-Rodríguez, Luis

2017-04-15

A new method for differentiation of olive oil (independently of the quality category) from other vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, palm, linseed, sesame and soybean) has been developed. The analytical procedure for chromatographic fingerprinting of the methyl-transesterified fraction of each vegetable oil, using normal-phase liquid chromatography, is described and the chemometric strategies applied and discussed. Some chemometric methods, such as k-nearest neighbours (kNN), partial least squared-discriminant analysis (PLS-DA), support vector machine classification analysis (SVM-C), and soft independent modelling of class analogies (SIMCA), were applied to build classification models. Performance of the classification was evaluated and ranked using several classification quality metrics. The discriminant analysis, based on the use of one input-class, (plus a dummy class) was applied for the first time in this study. Copyright © 2016 Elsevier Ltd. All rights reserved.
Critical Temperature of Randomly Diluted Two-Dimensional Heisenberg Ferromagnet, K2CuxZn(1-x)F4

NASA Astrophysics Data System (ADS)

Okuda, Yuichi; Tohi, Yasuto; Yamada, Isao; Haseda, Taiichiro

1980-09-01

The susceptibility of randomly diluted two-dimensional Heisenberg-like ferromagnet K2CuxZn(1-x)F4 was measured down to 50 mK, using the 3He-4He dilution refrigerator and a SQUID magnetometer. The ferromagnetic critical temperature Tc(x) was obtained for x{=}0.98, 0.94, 0.85, 0.82, 0.68, 0.60, 0.54, 0.50 and 0.42. The value of [1/Tc(1)][(d/dx)Tc(x)]x=1 was approximately 3.0. The critical temperature versus x curve exhibits a noticeable tail near the critical concentration, which may stem from the second nearest-neighbor interaction. The critical concentration xc, below which concentration there is no long range order down to T{=}0 K, was estimated to be 0.45˜0.50. The susceptibility of sample with x{=}0.42 behaves as if it obeys the Curie law down to 50 mK.
Quantum Correlation Properties in Composite Parity-Conserved Matrix Product States

NASA Astrophysics Data System (ADS)

Zhu, Jing-Min

2016-09-01

We give a new thought for constructing long-range quantum correlation in quantum many-body systems. Our proposed composite parity-conserved matrix product state has long-range quantum correlation only for two spin blocks where their spin-block length larger than 1 compared to any subsystem only having short-range quantum correlation, and we investigate quantum correlation properties of two spin blocks varying with environment parameter and spacing spin number. We also find that the geometry quantum discords of two nearest-neighbor spin blocks and two next-nearest-neighbor spin blocks become smaller and for other conditions the geometry quantum discord becomes larger than that in any subcomponent, i.e., the increase or the production of the long-range quantum correlation is at the cost of reducing the short-range quantum correlation compared to the corresponding classical correlation and total correlation having no any characteristic of regulation. For nearest-neighbor and next-nearest-neighbor all the correlations take their maximal values at the same points, while for other conditions no whether for spacing same spin number or for different spacing spin numbers all the correlations taking their maximal values are respectively at different points which are very close. We believe that our work is helpful to comprehensively and deeply understand the organization and structure of quantum correlation especially for long-range quantum correlation of quantum many-body systems; and further helpful for the classification, the depiction and the measure of quantum correlation of quantum many-body systems.
Chirality dependence of dipole matrix element of carbon nanotubes in axial magnetic field: A third neighbor tight binding approach

NASA Astrophysics Data System (ADS)

Chegel, Raad; Behzad, Somayeh

2014-02-01

We have studied the electronic structure and dipole matrix element, D, of carbon nanotubes (CNTs) under magnetic field, using the third nearest neighbor tight binding model. It is shown that the 1NN and 3NN-TB band structures show differences such as the spacing and mixing of neighbor subbands. Applying the magnetic field leads to breaking the degeneracy behavior in the D transitions and creates new allowed transitions corresponding to the band modifications. It is found that |D| is proportional to the inverse tube radius and chiral angle. Our numerical results show that amount of filed induced splitting for the first optical peak is proportional to the magnetic field by the splitting rate ν11. It is shown that ν11 changes linearly and parabolicly with the chiral angle and radius, respectively.
An automated algorithm for determining photometric redshifts of quasars

NASA Astrophysics Data System (ADS)

Wang, Dan; Zhang, Yanxia; Zhao, Yongheng

2010-07-01

We employ k-nearest neighbor algorithm (KNN) for photometric redshift measurement of quasars with the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance learning algorithm where the result of new instance query is predicted based on the closest training samples. The regressor do not use any model to fit and only based on memory. Given a query quasar, we find the known quasars or (training points) closest to the query point, whose redshift value is simply assigned to be the average of the values of its k nearest neighbors. Three kinds of different colors (PSF, Model or Fiber) and spectral redshifts are used as input parameters, separatively. The combination of the three kinds of colors is also taken as input. The experimental results indicate that the best input pattern is PSF + Model + Fiber colors in all experiments. With this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts are obtained within ▵z < 0.1, 0.2 and 0.3, respectively. If only using one kind of colors as input, the model colors achieve the best performance. However, when using two kinds of colors, the best result is achieved by PSF + Fiber colors. In addition, nearest neighbor method (k = 1) shows its superiority compared to KNN (k ≠ 1) for the given sample.
E-Nose Vapor Identification Based on Dempster-Shafer Fusion of Multiple Classifiers

NASA Technical Reports Server (NTRS)

Li, Winston; Leung, Henry; Kwan, Chiman; Linnell, Bruce R.

2005-01-01

Electronic nose (e-nose) vapor identification is an efficient approach to monitor air contaminants in space stations and shuttles in order to ensure the health and safety of astronauts. Data preprocessing (measurement denoising and feature extraction) and pattern classification are important components of an e-nose system. In this paper, a wavelet-based denoising method is applied to filter the noisy sensor measurements. Transient-state features are then extracted from the denoised sensor measurements, and are used to train multiple classifiers such as multi-layer perceptions (MLP), support vector machines (SVM), k nearest neighbor (KNN), and Parzen classifier. The Dempster-Shafer (DS) technique is used at the end to fuse the results of the multiple classifiers to get the final classification. Experimental analysis based on real vapor data shows that the wavelet denoising method can remove both random noise and outliers successfully, and the classification rate can be improved by using classifier fusion.
X-ray K-edge absorption spectra of Fe minerals and model compounds: II. EXAFS

NASA Astrophysics Data System (ADS)

Waychunas, Glenn A.; Brown, Gordon E.; Apted, Michael J.

1986-01-01

K-edge extended X-ray absorption fine structure (EXAFS) spectra of Fe in varying environments in a suite of well-characterized silicate and oxide minerals were collected using synchrotron radiation and analyzed using single scattering approximation theory to yield nearest neighbor Fe-O distances and coordination numbers. The partial inverse character of synthetic hercynite spinal was verified in this way. Comparison of the results from all samples with structural data from X-ray diffraction crystal structure refinements indicates that EXAFS-derived first neighbor distances are generally accurate to ±0.02 Å using only theoretically generated phase information, and may be improved over this if similar model compounds are used to determine EXAFS phase functions. Coordination numbers are accurate to ±20 percent and can be similarly improved using model compound EXAFS amplitude information. However, in particular cases the EXAFS-derived distances may be shortened, and the coordination number reduced, by the effects of static and thermal disorder or by partial overlap of the longer Fe-O first neighbor distances with second neighbor distances in the EXAFS structure function. In the former case the total information available in the EXAFS is limited by the disorder, while in the latter case more accurate results can in principle be obtained by multiple neighbor EXAFS analysis. The EXAFS and XANES spectra of Fe in Nain, Labrador osumulite and Lakeview, Oregon plagioclase are also analyzed as an example of the application of X-ray absorption spectroscopy to metal ion site occupation determination in minerals.
The nearest relative in mental health law.

PubMed

Andoh, Benjamin; Gogo, Emmanuel

2004-04-01

This article considers the concept of the 'nearest relative' in mental health law in England and Wales and argues, inter alia, for its retention in a way that avoids violation of the European Convention on Human Rights and the Human Rights Act 1998. It looks, first, at the meaning of nearest relative and then focuses on his/her role today, including its link with advance directives for mental health care, and on the tension between nearest relatives and approved social workers and the law. The problem exposed by JT v. United Kingdom in relation to the Human Rights Act 1998 and its implications for the future are considered. The impact of the Mental Health Bill (2002) on the nearest relative is discussed and recommendations to improve the present law are then suggested.
A Feature-based Approach to Big Data Analysis of Medical Images

PubMed Central

Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M.

2015-01-01

This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches in O(log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct. PMID:26221685
A Feature-Based Approach to Big Data Analysis of Medical Images.

PubMed

Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M

2015-01-01

This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches-in O (log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods.. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct.
Automatic classification of fluorescence and optical diffusion spectroscopy data in neuro-oncology

NASA Astrophysics Data System (ADS)

Savelieva, T. A.; Loshchenov, V. B.; Goryajnov, S. A.; Potapov, A. A.

2018-04-01

The complexity of the biological tissue spectroscopic analysis due to the overlap of biological molecules' absorption spectra, multiple scattering effect, as well as measurement geometry in vivo has caused the relevance of this work. In the neurooncology the problem of tumor boundaries delineation is especially acute and requires the development of new methods of intraoperative diagnosis. Methods of optical spectroscopy allow detecting various diagnostically significant parameters non-invasively. 5-ALA induced protoporphyrin IX is frequently used as fluorescent tumor marker in neurooncology. At the same time analysis of the concentration and the oxygenation level of haemoglobin and significant changes of light scattering in tumor tissues have a high diagnostic value. This paper presents an original method for the simultaneous registration of backward diffuse reflectance and fluorescence spectra, which allows defining all the parameters listed above simultaneously. The clinical studies involving 47 patients with intracranial glial tumors of II-IV Grades were carried out in N.N. Burdenko National Medical Research Center of Neurosurgery. To register the spectral dependences the spectroscopic system LESA- 01-BIOSPEC was used with specially developed w-shaped diagnostic fiber optic probe. The original algorithm of combined spectroscopic signal processing was developed. We have created a software and hardware, which allowed (as compared with the methods currently used in neurosurgical practice) to increase the sensitivity of intraoperative demarcation of intracranial tumors from 78% to 96%, specificity of 60% to 82%. The result of analysis of different techniques of automatic classification shows that in our case the most appropriate is the k Nearest Neighbors algorithm with cubic metrics.
Gap Shape Classification using Landscape Indices and Multivariate Statistics.

PubMed

Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

2016-11-30

This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks' lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.
Gap Shape Classification using Landscape Indices and Multivariate Statistics

PubMed Central

Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

2016-01-01

This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap. PMID:27901127
Demonstrated Potential of Ion Mobility Spectrometry for Detection of Adulterated Perfumes and Plant Speciation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clark, Jared Matthew; Daum, Keith Alvin; Kalival, J. H.

2003-01-01

This initial study evaluates the use of ion mobility spectrometry (IMS) as a rapid test procedure for potential detection of adulterated perfumes and speciation of plant life. Sample types measured consist of five genuine perfumes, two species of sagebrush, and four species of flowers. Each sample type is treated as a separate classification problem. It is shown that discrimination using principal component analysis with K-nearest neighbors can distinguish one class from another. Discriminatory models generated using principal component regressions are not as effective. Results from this examination are encouraging and represent an initial phase demonstrating that perfumes and plants possessmore » characteristic chemical signatures that can be used for reliable identification.« less

Some links on this page may take you to non-federal websites. Their policies may differ from this site.