K-Nearest Neighbor Algorithm Optimization in Text Categorization
NASA Astrophysics Data System (ADS)
Chen, Shufeng
2018-01-01
K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings such as large amount of sample computation and strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on CURE algorithm is proposed. On the basis of this, presenting a quick algorithm QKNN (Quick k-nearest neighbor) to find the nearest k neighbor samples, which greatly reduces the similarity calculation. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples to improve the performance of the algorithm.
Wang, Xueyi
2012-02-08
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
Scalable Nearest Neighbor Algorithms for High Dimensional Data.
Muja, Marius; Lowe, David G
2014-11-01
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen
This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, kNearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for themore » cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.« less
Applying an efficient K-nearest neighbor search to forest attribute imputation
Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek
2006-01-01
This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby, decreasing the time needed to discover the NN subset. Results of five trials show gains...
Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance
NASA Astrophysics Data System (ADS)
Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi
2017-11-01
K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we presented a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing Hamming distance between testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog for classical KNN algorithm by setting a distance threshold value t to select k - n e a r e s t neighbors. As a result, QKNN achieves O( n 3) performance which is only relevant to the dimension of feature vectors and high classification accuracy, outperforms Llyod's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
Ronald E. McRoberts; Grant M. Domke; Qi Chen; Erik Næsset; Terje Gobakken
2016-01-01
The relatively small sampling intensities used by national forest inventories are often insufficient to produce the desired precision for estimates of population parameters unless the estimation process is augmented with auxiliary information, usually in the form of remotely sensed data. The k-Nearest Neighbors (k-NN) technique is a non-parametric,multivariate approach...
Privacy Preserving Nearest Neighbor Search
NASA Astrophysics Data System (ADS)
Shaneck, Mark; Kim, Yongdae; Kumar, Vipin
Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation
Missing value imputation in DNA microarrays based on conjugate gradient method.
Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh
2012-02-01
Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm that is based on conjugate gradient (CG) method is proposed to estimate missing values. k-nearest neighbors of the missed entry are first selected based on absolute values of their Pearson correlation coefficient. Then a subset of genes among the k-nearest neighbors is labeled as the best similar ones. CG algorithm with this subset as its input is then used to estimate the missing values. Our proposed CG based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principle component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average of normalized root mean squares error (NRMSE) and relative NRMSE in different data sets with various missing rates shows CGimpute outperforms other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
Study of parameters of the nearest neighbour shared algorithm on clustering documents
NASA Astrophysics Data System (ADS)
Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni
2018-03-01
Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well with different parameter values.
Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma
2012-10-01
The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2 where we replace the K- nearest neighbors algorithm with the fuzzy K-nearest neighbors to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work is retrieved from UCI machine learning repository. The performances of the AIRS2 and MAIRS2 are evaluated regarding classification accuracy, sensitivity and specificity values. The highest classification accuracy obtained when applying the AIRS2 and MAIRS2 using 10-fold cross-validation was, respectively 82.69% and 89.10%.
NASA Astrophysics Data System (ADS)
Li, Chao; Yang, Sheng-Chao; Guo, Qiao-Sheng; Zheng, Kai-Yan; Wang, Ping-Li; Meng, Zhen-Gui
2016-01-01
A combination of Fourier transform infrared spectroscopy with chemometrics tools provided an approach for studying Marsdenia tenacissima according to its geographical origin. A total of 128 M. tenacissima samples from four provinces in China were analyzed with FTIR spectroscopy. Six pattern recognition methods were used to construct the discrimination models: support vector machine-genetic algorithms, support vector machine-particle swarm optimization, K-nearest neighbors, radial basis function neural network, random forest and support vector machine-grid search. Experimental results showed that K-nearest neighbors was superior to other mathematical algorithms after data were preprocessed with wavelet de-noising, with a discrimination rate of 100% in both the training and prediction sets. This study demonstrated that FTIR spectroscopy coupled with K-nearest neighbors could be successfully applied to determine the geographical origins of M. tenacissima samples, thereby providing reliable authentication in a rapid, cheap and noninvasive way.
Pan, Yongke; Niu, Wenjia
2017-01-01
Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Relative works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Different from these relative works, the regularized graph construction is researched here, which is important in the graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then the kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge lor split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing .interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that can not possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.
Mei, Gang; Xu, Nengxiong; Xu, Liangliang
2016-01-01
This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The presented algorithm is an improvement of our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest neighbors (kNN) search. In AIDW, it needs to find several nearest neighboring data points for each interpolated point to adaptively determine the power parameter; and then the desired prediction value of the interpolated point is obtained by weighted interpolating using the power parameter. In this work, we develop a fast kNN search approach based on the space-partitioning data structure, even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of the stages of kNN search and weighted interpolating. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
Nearest Neighbor Algorithms for Pattern Classification
NASA Technical Reports Server (NTRS)
Barrios, J. O.
1972-01-01
A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x sub i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required categorize The Bayes risk serves merely as a reference-the limit of excellence beyond which it is not possible to go. The NN rule is bounded below by the Bayes risk and above by twice the Bayes risk.
Generative Models for Similarity-based Classification
2007-01-01
NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed
An automated algorithm for determining photometric redshifts of quasars
NASA Astrophysics Data System (ADS)
Wang, Dan; Zhang, Yanxia; Zhao, Yongheng
2010-07-01
We employ k-nearest neighbor algorithm (KNN) for photometric redshift measurement of quasars with the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance learning algorithm where the result of new instance query is predicted based on the closest training samples. The regressor do not use any model to fit and only based on memory. Given a query quasar, we find the known quasars or (training points) closest to the query point, whose redshift value is simply assigned to be the average of the values of its k nearest neighbors. Three kinds of different colors (PSF, Model or Fiber) and spectral redshifts are used as input parameters, separatively. The combination of the three kinds of colors is also taken as input. The experimental results indicate that the best input pattern is PSF + Model + Fiber colors in all experiments. With this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts are obtained within ▵z < 0.1, 0.2 and 0.3, respectively. If only using one kind of colors as input, the model colors achieve the best performance. However, when using two kinds of colors, the best result is achieved by PSF + Fiber colors. In addition, nearest neighbor method (k = 1) shows its superiority compared to KNN (k ≠ 1) for the given sample.
Fast Query-Optimized Kernel-Machine Classification
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; DeCoste, Dennis
2004-01-01
A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered statistics about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, there are derived upper and lower thresholds for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1. For more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal- component dimensions and neighbor orderings were computed in that subspace. This approach ensured that the overhead of the nearest-neighbor computations was insignificant, relative to that of the exact KM computation.
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Vidyarthi, Ankit; Mittal, Namita
2016-12-01
In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Frog sound identification using extended k-nearest neighbor classifier
NASA Astrophysics Data System (ADS)
Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati
2017-09-01
Frog sound identification based on the vocalization becomes important for biological research and environmental monitoring. As a result, different types of feature extractions and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents a frog sound identification with Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aims of improving the classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbors. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifier, k -Nearest Neighbor (KNN), Fuzzy k -Nearest Neighbor (FKNN) k - General Nearest Neighbor (KGNN)and Mutual k -Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysia forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR) while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results have shown that the EKNCN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, GKNN and MKNN for all cases.
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
NASA Astrophysics Data System (ADS)
Rusdiana, Lili; Marfuah
2017-12-01
K-Nearest Neighbors method is one of methods used for classification which calculate a value to find out the closest in distance. It is used to group a set of data such as students’ graduation status that are got from the amount of course credits taken by them, the grade point average (AVG), and the mini-thesis grade. The study is conducted to know the results of using K-Nearest Neighbors method on the application of determining students’ graduation status, so it can be analyzed from the method used, the data, and the application constructed. The aim of this study is to find out the application results by using K-Nearest Neighbors concept to determine students’ graduation status using the data of STMIK Palangkaraya students. The development of the software used Extreme Programming, since it was appropriate and precise for this study which was to quickly finish the project. The application was created using Microsoft Office Excel 2007 for the training data and Matlab 7 to implement the application. The result of K-Nearest Neighbors method on the application of determining students’ graduation status was 92.5%. It could determine the predicate graduation of 94 data used from the initial data before the processing as many as 136 data which the maximal training data was 50data. The K-Nearest Neighbors method is one of methods used to group a set of data based on the closest value, so that using K-Nearest Neighbors method agreed with this study. The results of K-Nearest Neighbors method on the application of determining students’ graduation status was 92.5% could determine the predicate graduation which is the maximal training data. The K-Nearest Neighbors method is one of methods used to group a set of data based on the closest value, so that using K-Nearest Neighbors method agreed with this study.
Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting
NASA Astrophysics Data System (ADS)
Zhang, Ningning; Lin, Aijing; Shang, Pengjian
2017-07-01
In this paper, we propose a new two-stage methodology that combines the ensemble empirical mode decomposition (EEMD) with multidimensional k-nearest neighbor model (MKNN) in order to forecast the closing price and high price of the stocks simultaneously. The modified algorithm of k-nearest neighbors (KNN) has an increasingly wide application in the prediction of all fields. Empirical mode decomposition (EMD) decomposes a nonlinear and non-stationary signal into a series of intrinsic mode functions (IMFs), however, it cannot reveal characteristic information of the signal with much accuracy as a result of mode mixing. So ensemble empirical mode decomposition (EEMD), an improved method of EMD, is presented to resolve the weaknesses of EMD by adding white noise to the original data. With EEMD, the components with true physical meaning can be extracted from the time series. Utilizing the advantage of EEMD and MKNN, the new proposed ensemble empirical mode decomposition combined with multidimensional k-nearest neighbor model (EEMD-MKNN) has high predictive precision for short-term forecasting. Moreover, we extend this methodology to the case of two-dimensions to forecast the closing price and high price of the four stocks (NAS, S&P500, DJI and STI stock indices) at the same time. The results indicate that the proposed EEMD-MKNN model has a higher forecast precision than EMD-KNN, KNN method and ARIMA.
Meat and Fish Freshness Inspection System Based on Odor Sensing
Hasan, Najam ul; Ejaz, Naveed; Ejaz, Waleed; Kim, Hyung Seok
2012-01-01
We propose a method for building a simple electronic nose based on commercially available sensors used to sniff in the market and identify spoiled/contaminated meat stocked for sale in butcher shops. Using a metal oxide semiconductor-based electronic nose, we measured the smell signature from two of the most common meat foods (beef and fish) stored at room temperature. Food samples were divided into two groups: fresh beef with decayed fish and fresh fish with decayed beef. The prime objective was to identify the decayed item using the developed electronic nose. Additionally, we tested the electronic nose using three pattern classification algorithms (artificial neural network, support vector machine and k-nearest neighbor), and compared them based on accuracy, sensitivity, and specificity. The results demonstrate that the k-nearest neighbor algorithm has the highest accuracy. PMID:23202222
Finger vein identification using fuzzy-based k-nearest centroid neighbor classifier
NASA Astrophysics Data System (ADS)
Rosdi, Bakhtiar Affendi; Jaafar, Haryati; Ramli, Dzati Athiar
2015-02-01
In this paper, a new approach for personal identification using finger vein image is presented. Finger vein is an emerging type of biometrics that attracts attention of researchers in biometrics area. As compared to other biometric traits such as face, fingerprint and iris, finger vein is more secured and hard to counterfeit since the features are inside the human body. So far, most of the researchers focus on how to extract robust features from the captured vein images. Not much research was conducted on the classification of the extracted features. In this paper, a new classifier called fuzzy-based k-nearest centroid neighbor (FkNCN) is applied to classify the finger vein image. The proposed FkNCN employs a surrounding rule to obtain the k-nearest centroid neighbors based on the spatial distributions of the training images and their distance to the test image. Then, the fuzzy membership function is utilized to assign the test image to the class which is frequently represented by the k-nearest centroid neighbors. Experimental evaluation using our own database which was collected from 492 fingers shows that the proposed FkNCN has better performance than the k-nearest neighbor, k-nearest-centroid neighbor and fuzzy-based-k-nearest neighbor classifiers. This shows that the proposed classifier is able to identify the finger vein image effectively.
NASA Astrophysics Data System (ADS)
Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong
2016-03-01
Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifier [1], has also a distinguished role in the classification for CADx. Therefore, in this paper, based on the property that the LI will achieve a very high accuracy, we design an algorithm to integrate the LI into RF for improved or higher value of AUC (area under the curve of receiver operating characteristics -- ROC). Experiments were performed by the use of a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers of RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.
NASA Astrophysics Data System (ADS)
He, Runnan; Wang, Kuanquan; Li, Qince; Yuan, Yongfeng; Zhao, Na; Liu, Yang; Zhang, Henggui
2017-12-01
Cardiovascular diseases are associated with high morbidity and mortality. However, it is still a challenge to diagnose them accurately and efficiently. Electrocardiogram (ECG), a bioelectrical signal of the heart, provides crucial information about the dynamical functions of the heart, playing an important role in cardiac diagnosis. As the QRS complex in ECG is associated with ventricular depolarization, therefore, accurate QRS detection is vital for interpreting ECG features. In this paper, we proposed a real-time, accurate, and effective algorithm for QRS detection. In the algorithm, a proposed preprocessor with a band-pass filter was first applied to remove baseline wander and power-line interference from the signal. After denoising, a method combining K-Nearest Neighbor (KNN) and Particle Swarm Optimization (PSO) was used for accurate QRS detection in ECGs with different morphologies. The proposed algorithm was tested and validated using 48 ECG records from MIT-BIH arrhythmia database (MITDB), achieved a high averaged detection accuracy, sensitivity and positive predictivity of 99.43, 99.69, and 99.72%, respectively, indicating a notable improvement to extant algorithms as reported in literatures.
GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm.
Tae Joon Jun; Hyun Ji Park; Hyuk Yoo; Young-Hak Kim; Daeyoung Kim
2016-08-01
In this paper, we propose an GPU based Cloud system for high-performance arrhythmia detection. Pan-Tompkins algorithm is used for QRS detection and we optimized beat classification algorithm with K-Nearest Neighbor (K-NN). To support high performance beat classification on the system, we parallelized beat classification algorithm with CUDA to execute the algorithm on virtualized GPU devices on the Cloud system. MIT-BIH Arrhythmia database is used for validation of the algorithm. The system achieved about 93.5% of detection rate which is comparable to previous researches while our algorithm shows 2.5 times faster execution time compared to CPU only detection algorithm.
Portable Language-Independent Adaptive Translation from OCR. Phase 1
2009-04-01
including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy
2014-01-01
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered as the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is benefitting from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model was benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive to the existing state-of-the-art classification models. PMID:25419659
1991-12-01
user using the ’: knn ’ option in the do-scenario command line). An instance of the K-Nearest Neighbor object is first created and initialized before...Navigation Computer HF High Frequency ILS Instrument Landing System KNN K - Nearest Neighbor LRU Line Replaceable Unit MC Mission Computer MTCA...approaches have been investigated here, K-nearest Neighbors ( KNN ) and neural networks (NN). Both approaches require that previously classified examples of
DOE Office of Scientific and Technical Information (OSTI.GOV)
Majidpour, Mostafa; Qiu, Charlie; Chu, Peter
Three algorithms for the forecasting of energy consumption at individual EV charging outlets have been applied to real world data from the UCLA campus. Out of these three algorithms, namely k-Nearest Neighbor (kNN), ARIMA, and Pattern Sequence Forecasting (PSF), kNN with k=1, was the best and PSF was the worst performing algorithm with respect to the SMAPE measure. The advantage of PSF is its increased robustness to noise by substituting the real valued time series with an integer valued one, and the advantage of NN is having the least SMAPE for our data. We propose a Modified PSF algorithm (MPSF)more » which is a combination of PSF and NN; it could be interpreted as NN on integer valued data or as PSF with considering only the most recent neighbor to produce the output. Some other shortcomings of PSF are also addressed in the MPSF. Results show that MPSF has improved the forecast performance.« less
Three Dimensional Object Recognition Using a Complex Autoregressive Model
1993-12-01
3.4.2 Template Matching Algorithm ...................... 3-16 3.4.3 K-Nearest-Neighbor ( KNN ) Techniques ................. 3-25 3.4.4 Hidden Markov Model...Neighbor ( KNN ) Test Results ...................... 4-13 4.2.1 Single-Look 1-NN Testing .......................... 4-14 4.2.2 Multiple-Look 1-NN Testing...4-15 4.2.3 Discussion of KNN Test Results ...................... 4-15 4.3 Hidden Markov Model (HMM) Test Results
Ronald E. McRoberts
2009-01-01
Nearest neighbors techniques have been shown to be useful for predicting multiple forest attributes from forest inventory and Landsat satellite image data. However, in regions lacking good digital land cover information, nearest neighbors selected to predict continuous variables such as tree volume must be selected without regard to relevant categorical variables such...
An RFID Indoor Positioning Algorithm Based on Bayesian Probability and K-Nearest Neighbor.
Xu, He; Ding, Ye; Li, Peng; Wang, Ruchuan; Li, Yizhu
2017-08-05
The Global Positioning System (GPS) is widely used in outdoor environmental positioning. However, GPS cannot support indoor positioning because there is no signal for positioning in an indoor environment. Nowadays, there are many situations which require indoor positioning, such as searching for a book in a library, looking for luggage in an airport, emergence navigation for fire alarms, robot location, etc. Many technologies, such as ultrasonic, sensors, Bluetooth, WiFi, magnetic field, Radio Frequency Identification (RFID), etc., are used to perform indoor positioning. Compared with other technologies, RFID used in indoor positioning is more cost and energy efficient. The Traditional RFID indoor positioning algorithm LANDMARC utilizes a Received Signal Strength (RSS) indicator to track objects. However, the RSS value is easily affected by environmental noise and other interference. In this paper, our purpose is to reduce the location fluctuation and error caused by multipath and environmental interference in LANDMARC. We propose a novel indoor positioning algorithm based on Bayesian probability and K -Nearest Neighbor (BKNN). The experimental results show that the Gaussian filter can filter some abnormal RSS values. The proposed BKNN algorithm has the smallest location error compared with the Gaussian-based algorithm, LANDMARC and an improved KNN algorithm. The average error in location estimation is about 15 cm using our method.
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
out of charity, but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project...identification, and machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ...Support Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F
Mei, Wenjuan; Zeng, Xianping; Yang, Chenglin; Zhou, Xiuyun
2017-01-01
The insulated gate bipolar transistor (IGBT) is a kind of excellent performance switching device used widely in power electronic systems. How to estimate the remaining useful life (RUL) of an IGBT to ensure the safety and reliability of the power electronics system is currently a challenging issue in the field of IGBT reliability. The aim of this paper is to develop a prognostic technique for estimating IGBTs’ RUL. There is a need for an efficient prognostic algorithm that is able to support in-situ decision-making. In this paper, a novel prediction model with a complete structure based on optimally pruned extreme learning machine (OPELM) and Volterra series is proposed to track the IGBT’s degradation trace and estimate its RUL; we refer to this model as Volterra k-nearest neighbor OPELM prediction (VKOPP) model. This model uses the minimum entropy rate method and Volterra series to reconstruct phase space for IGBTs’ ageing samples, and a new weight update algorithm, which can effectively reduce the influence of the outliers and noises, is utilized to establish the VKOPP network; then a combination of the k-nearest neighbor method (KNN) and least squares estimation (LSE) method is used to calculate the output weights of OPELM and predict the RUL of the IGBT. The prognostic results show that the proposed approach can predict the RUL of IGBT modules with small error and achieve higher prediction precision and lower time cost than some classic prediction approaches. PMID:29099811
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murtazaev, A. K.; Ramazanov, M. K., E-mail: sheikh77@mail.ru; Kassan-Ogly, F. A.
2015-01-15
Phase transitions in the antiferromagnetic Ising model on a body-centered cubic lattice are studied on the basis of the replica algorithm by the Monte Carlo method and histogram analysis taking into account the interaction of next-to-nearest neighbors. The phase diagram of the dependence of the critical temperature on the intensity of interaction of the next-to-nearest neighbors is constructed. It is found that a second-order phase transition is realized in this model in the investigated interval of the intensities of interaction of next-to-nearest neighbors.
A Coupled k-Nearest Neighbor Algorithm for Multi-Label Classification
2015-05-22
classification, an image may contain several concepts simultaneously, such as beach, sunset and kangaroo . Such tasks are usually denoted as multi-label...informatics, a gene can belong to both metabolism and transcription classes; and in music categorization, a song may labeled as Mozart and sad. In the
Electromagnetic Induction Spectroscopy for the Detection of Subsurface Targets
2012-12-01
curves of the proposed method and that of Fails et al.. For the kNN ROC curve, k = 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81...et al. [6] and Ramachandran et al. [7] both demonstrated success in detecting mines using the k-nearest-neighbor ( kNN ) algorithm based on the EMI...error is also included in the feature vector. The kNN labels an unknown target based on the closest targets in a training set. Collins et al. [2] and
An Improvement To The k-Nearest Neighbor Classifier For ECG Database
NASA Astrophysics Data System (ADS)
Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul
2018-03-01
The k nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed among them. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN is regarding the weighting issues in assigning the class label before classification. Thus, to solve these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of samples distribition. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distributions of training samples and its distance to the query point. Consequently, the fuzzy membership function is employed to assign the query point to the class label which is frequently represented by the nearest centroid neighbor Experimental studies from electrocardiogram (ECG) signal is applied in this study. The classification performances are evaluated in two experimental steps i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of kNN, kNCN, FkNN and MFkCNN classifier is conducted to evaluate the performances of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds the kNN, kNCN and FkNN with the best classification rates of 96.5%.
A comparison of 12 algorithms for matching on the propensity score.
Austin, Peter C
2014-03-15
Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
A comparison of 12 algorithms for matching on the propensity score
Austin, Peter C
2014-01-01
Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24123228
Nearest neighbors by neighborhood counting.
Wang, Hui
2006-06-01
Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.
Nation-Building Modeling and Resource Allocation Via Dynamic Programming
2014-09-01
Figure 2. RAND Study Models[59:98,115] (WMA) and used both the k-Nearest Neighbor ( KNN ) and Nearest Centroid (NC) algorithms to classify future features...The study found that KNN performed bet- ter than NC with 85% or greater accuracy in all test cases. The methodology was adopted for use under the...analysis feature of the model. 3.7.1 The No Surge Alternative. On the 10th of January 2007, President George W. Bush delivered a speech to the American
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1993-01-01
The classification of multispectral image data obtained from satellites has become an important tool for generating ground cover maps. This study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A new neural network, the Binary Diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The Binary Diamond is a multilayer, feed-forward neural network, which learns from examples in unsupervised, 'one-shot' mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done by using a realistic data base, consisting of approximately 90,000 Landsat 4 Thematic Mapper pixels. The Binary Diamond and the nearest neighbor performances were close, with some advantages to the Binary Diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways for improving the performances, such as merging categories, and analyzing nonboundary pixels, are addressed and evaluated.
Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
Geometry-based populated chessboard recognition
NASA Astrophysics Data System (ADS)
Xie, Youye; Tang, Gongguo; Hoff, William
2018-04-01
Chessboards are commonly used to calibrate cameras, and many robust methods have been developed to recognize the unoccupied boards. However, when the chessboard is populated with chess pieces, such as during an actual game, the problem of recognizing the board is much harder. Challenges include occlusion caused by the chess pieces, the presence of outlier lines and low viewing angles of the chessboard. In this paper, we present a novel approach to address the above challenges and recognize the chessboard. The Canny edge detector and Hough transform are used to capture all possible lines in the scene. The k-means clustering and a k-nearest-neighbors inspired algorithm are applied to cluster and reject the outlier lines based on their Euclidean distances to the nearest neighbors in a scaled Hough transform space. Finally, based on prior knowledge of the chessboard structure, a geometric constraint is used to find the correspondences between image lines and the lines on the chessboard through the homography transformation. The proposed algorithm works for a wide range of the operating angles and achieves high accuracy in experiments.
Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang
2017-01-01
An estimate on the reliability of prediction in the applications of electronic nose is essential, which has not been paid enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. Nonconformity measure based on k-nearest neighbors (KNN) is implemented separately as underlying algorithm of conformal prediction. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, which is better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by a user. The potential of this framework for detecting borderline examples and outliers in the application of E-nose is also investigated. The result shows that conformal prediction is a promising framework for the application of electronic nose to make predictions with reliability and validity. PMID:28805721
Tighe, Patrick J.; Harle, Christopher A.; Hurley, Robert W.; Aytug, Haldun; Boezaart, Andre P.; Fillingim, Roger B.
2015-01-01
Background Given their ability to process highly dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain. Methods Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison. Results In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy with an area under the receiver-operating curve (ROC) of 0.704. Next, the gradient-boosted decision tree had an ROC of 0.665 and the k-nearest neighbor algorithm had an ROC of 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an ROC of 0.727. Logistic regression had a lower ROC of 0.5 for predicting pain outcomes on POD 1 and 3. Conclusions Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction. PMID:26031220
Quantum realization of the nearest neighbor value interpolation method for INEQR
NASA Astrophysics Data System (ADS)
Zhou, RiGui; Hu, WenWen; Luo, GaoFeng; Liu, XingAo; Fan, Ping
2018-07-01
This paper presents the nearest neighbor value (NNV) interpolation algorithm for the improved novel enhanced quantum representation of digital images (INEQR). It is necessary to use interpolation in image scaling because there is an increase or a decrease in the number of pixels. The difference between the proposed scheme and nearest neighbor interpolation is that the concept applied, to estimate the missing pixel value, is guided by the nearest value rather than the distance. Firstly, a sequence of quantum operations is predefined, such as cyclic shift transformations and the basic arithmetic operations. Then, the feasibility of the nearest neighbor value interpolation method for quantum image of INEQR is proven using the previously designed quantum operations. Furthermore, quantum image scaling algorithm in the form of circuits of the NNV interpolation for INEQR is constructed for the first time. The merit of the proposed INEQR circuit lies in their low complexity, which is achieved by utilizing the unique properties of quantum superposition and entanglement. Finally, simulation-based experimental results involving different classical images and ratios (i.e., conventional or non-quantum) are simulated based on the classical computer's MATLAB 2014b software, which demonstrates that the proposed interpolation method has higher performances in terms of high resolution compared to the nearest neighbor and bilinear interpolation.
yaImpute: An R package for kNN imputation
Nicholas L. Crookston; Andrew O. Finley
2008-01-01
This article introduces yaImpute, an R package for nearest neighbor search and imputation. Although nearest neighbor imputation is used in a host of disciplines, the methods implemented in the yaImpute package are tailored to imputation-based forest attribute estimation and mapping. The impetus to writing the yaImpute is a growing interest in nearest neighbor...
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertize to explain gear fault signatures which is usually not easy to be achieved by ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
K-nearest neighbor imputation of forest inventory variables in New Hampshire
Andrew Lister; Michael Hoppus; Raymond L. Czaplewski
2005-01-01
The k-nearest neighbor (kNN) method was used to map stand volume for a mosaic of 4 Landsat scenes covering the state of New Hampshire. Data for gross cubic foot volume and trees per acre were summarized from USDA Forest Service Forest Inventory and Analysis (FIA) plots and used as training for kNN. Six bands of...
2012-03-01
WMA) and used both the k-Nearest Neighbor ( KNN ) and Nearest Centroid 27 (a) Coalition and Regional (b) Indigenous Figure 3. RAND Study Models[32:98,115...NC) algorithms to classify future features. The study found that KNN performed better than NC with 85% or greater accuracy in all test cases. The...the model. 4.2.1 No Surge. On the 10th of January 2007, President George W. Bush delivered a speech to the American Public outlining a new strategy in
Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design
Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco
2016-01-01
The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms. PMID:27886061
Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design.
Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco
2016-11-23
The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms.
Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha
2007-01-01
The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...
Jay M. Ver Hoef; Hailemariam Temesgen; Sergio Gómez
2013-01-01
Forest surveys provide critical information for many diverse interests. Data are often collected from samples, and from these samples, maps of resources and estimates of aerial totals or averages are required. In this paper, two approaches for mapping and estimating totals; the spatial linear model (SLM) and k-NN (k-Nearest Neighbor) are compared, theoretically,...
NASA Astrophysics Data System (ADS)
Purwanti, Endah; Calista, Evelyn
2017-05-01
Leukemia is a type of cancer which is caused by malignant neoplasms in leukocyte cells. Leukemia disease which can cause death quickly enough for the sufferer is a type of acute lymphocyte leukemia (ALL). In this study, we propose automatic detection of lymphocyte leukemia through classification of lymphocyte cell images obtained from peripheral blood smear single cell. There are two main objectives in this study. The first is to extract featuring cells. The second objective is to classify the lymphocyte cells into two classes, namely normal and abnormal lymphocytes. In conducting this study, we use combination of shape feature and histogram feature, and the classification algorithm is k-nearest Neighbour with k variation is 1, 3, 5, 7, 9, 11, 13, and 15. The best level of accuracy, sensitivity, and specificity in this study are 90%, 90%, and 90%, and they were obtained from combined features of area-perimeter-mean-standard deviation with k=7.
Li, Qingbo; Hao, Can; Kang, Xue; Zhang, Jialin; Sun, Xuejun; Wang, Wenbo; Zeng, Haishan
2017-11-27
Combining Fourier transform infrared spectroscopy (FTIR) with endoscopy, it is expected that noninvasive, rapid detection of colorectal cancer can be performed in vivo in the future. In this study, Fourier transform infrared spectra were collected from 88 endoscopic biopsy colorectal tissue samples (41 colitis and 47 cancers). A new method, viz., entropy weight local-hyperplane k-nearest-neighbor (EWHK), which is an improved version of K-local hyperplane distance nearest-neighbor (HKNN), is proposed for tissue classification. In order to avoid limiting high dimensions and small values of the nearest neighbor, the new EWHK method calculates feature weights based on information entropy. The average results of the random classification showed that the EWHK classifier for differentiating cancer from colitis samples produced a sensitivity of 81.38% and a specificity of 92.69%.
K-Nearest Neighbor Estimation of Forest Attributes: Improving Mapping Efficiency
Andrew O. Finley; Alan R. Ek; Yun Bai; Marvin E. Bauer
2005-01-01
This paper describes our efforts in refining k-nearest neighbor forest attributes classification using U.S. Department of Agriculture Forest Service Forest Inventory and Analysis plot data and Landsat 7 Enhanced Thematic Mapper Plus imagery. The analysis focuses on FIA-defined forest type classification across St. Louis County in northeastern Minnesota. We outline...
Probability machines: consistent probability estimation using nonparametric learning machines.
Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A
2012-01-01
Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.
ERIC Educational Resources Information Center
Stewart, Mark; Willett, Peter
1987-01-01
Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Improving the accuracy of k-nearest neighbor using local mean based and distance weight
NASA Astrophysics Data System (ADS)
Syaliman, K. U.; Nababan, E. B.; Sitompul, O. S.
2018-03-01
In k-nearest neighbor (kNN), the determination of classes for new data is normally performed by a simple majority vote system, which may ignore the similarities among data, as well as allowing the occurrence of a double majority class that can lead to misclassification. In this research, we propose an approach to resolve the majority vote issues by calculating the distance weight using a combination of local mean based k-nearest neighbor (LMKNN) and distance weight k-nearest neighbor (DWKNN). The accuracy of results is compared to the accuracy acquired from the original k-NN method using several datasets from the UCI Machine Learning repository, Kaggle and Keel, such as ionosphare, iris, voice genre, lower back pain, and thyroid. In addition, the proposed method is also tested using real data from a public senior high school in city of Tualang, Indonesia. Results shows that the combination of LMKNN and DWKNN was able to increase the classification accuracy of kNN, whereby the average accuracy on test data is 2.45% with the highest increase in accuracy of 3.71% occurring on the lower back pain symptoms dataset. For the real data, the increase in accuracy is obtained as high as 5.16%.
Weighted Parzen Windows for Pattern Classification
1994-05-01
Nearest-Neighbor Rule The k-Nearest-Neighbor ( kNN ) technique is nonparametric, assuming nothing about the distribution of the data. Stated succinctly...probabilities P(wj I x) from samples." Raudys and Jain [20:255] advance this interpretation by pointing out that the kNN technique can be viewed as the...34Parzen window classifier with a hyper- rectangular window function." As with the Parzen-window technique, the kNN classifier is more accurate as the
Allen, Victoria W; Shirasu-Hiza, Mimi
2018-01-01
Despite being pervasive, the control of programmed grooming is poorly understood. We addressed this gap by developing a high-throughput platform that allows long-term detection of grooming in Drosophila melanogaster. In our method, a k-nearest neighbors algorithm automatically classifies fly behavior and finds grooming events with over 90% accuracy in diverse genotypes. Our data show that flies spend ~13% of their waking time grooming, driven largely by two major internal programs. One of these programs regulates the timing of grooming and involves the core circadian clock components cycle, clock, and period. The second program regulates the duration of grooming and, while dependent on cycle and clock, appears to be independent of period. This emerging dual control model in which one program controls timing and another controls duration, resembles the two-process regulatory model of sleep. Together, our quantitative approach presents the opportunity for further dissection of mechanisms controlling long-term grooming in Drosophila. PMID:29485401
Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM
Zhao, Zhizhen; Singer, Amit
2014-01-01
We introduce a new rotationally invariant viewing angle classification method for identifying, among a large number of cryo-EM projection images, similar views without prior knowledge of the molecule. Our rotationally invariant features are based on the bispectrum. Each image is denoised and compressed using steerable principal component analysis (PCA) such that rotating an image is equivalent to phase shifting the expansion coefficients. Thus we are able to extend the theory of bispectrum of 1D periodic signals to 2D images. The randomized PCA algorithm is then used to efficiently reduce the dimensionality of the bispectrum coefficients, enabling fast computation of the similarity between any pair of images. The nearest neighbors provide an initial classification of similar viewing angles. In this way, rotational alignment is only performed for images with their nearest neighbors. The initial nearest neighbor classification and alignment are further improved by a new classification method called vector diffusion maps. Our pipeline for viewing angle classification and alignment is experimentally shown to be faster and more accurate than reference-free alignment with rotationally invariant K-means clustering, MSA/MRA 2D classification, and their modern approximations. PMID:24631969
Novel approach for image skeleton and distance transformation parallel algorithms
NASA Astrophysics Data System (ADS)
Qing, Kent P.; Means, Robert W.
1994-05-01
Image Understanding is more important in medical imaging than ever, particularly where real-time automatic inspection, screening and classification systems are installed. Skeleton and distance transformations are among the common operations that extract useful information from binary images and aid in Image Understanding. The distance transformation describes the objects in an image by labeling every pixel in each object with the distance to its nearest boundary. The skeleton algorithm starts from the distance transformation and finds the set of pixels that have a locally maximum label. The distance algorithm has to scan the entire image several times depending on the object width. For each pixel, the algorithm must access the neighboring pixels and find the maximum distance from the nearest boundary. It is a computational and memory access intensive procedure. In this paper, we propose a novel parallel approach to the distance transform and skeleton algorithms using the latest VLSI high- speed convolutional chips such as HNC's ViP. The algorithm speed is dependent on the object's width and takes (k + [(k-1)/3]) * 7 milliseconds for a 512 X 512 image with k being the maximum distance of the largest object. All objects in the image will be skeletonized at the same time in parallel.
Michael J. Falkowski; Andrew T. Hudak; Nicholas L. Crookston; Paul E. Gessler; Edward H. Uebler; Alistair M. S. Smith
2010-01-01
Sustainable forest management requires timely, detailed forest inventory data across large areas, which is difficult to obtain via traditional forest inventory techniques. This study evaluated k-nearest neighbor imputation models incorporating LiDAR data to predict tree-level inventory data (individual tree height, diameter at breast height, and...
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei
2014-10-01
Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative for current high-resolution pixelated PET detectors if the issues of high performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate array (FPGA). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only at providing high performance, but also at being realistic for FPGA implementation. Benefitting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest neighbor searching for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The spatial resolutions of full-width-at-half-maximum (FWHM) of both methods averaged over the center axis of the detector were obtained as 1.87 ±0.17 mm and 1.92 ±0.09 mm, respectively. The test results show that the SOM-NN scheme has an equivalent positioning performance with the smoothed k-NN method, but the amount of computation is only about one-tenth of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on FPGA. It has the potential to realize real-time position estimation on an FPGA with a high-event processing throughput.
Secure Nearest Neighbor Query on Crowd-Sensing Data
Cheng, Ke; Wang, Liangmin; Zhong, Hong
2016-01-01
Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operation due to computation and storage capability constraints. In light of they Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes. PMID:27669253
Secure Nearest Neighbor Query on Crowd-Sensing Data.
Cheng, Ke; Wang, Liangmin; Zhong, Hong
2016-09-22
Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operation due to computation and storage capability constraints. In light of they Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.
Biclustering Learning of Trading Rules.
Huang, Qinghua; Wang, Ting; Tao, Dacheng; Li, Xuelong
2015-10-01
Technical analysis with numerous indicators and patterns has been regarded as important evidence for making trading decisions in financial markets. However, it is extremely difficult for investors to find useful trading rules based on numerous technical indicators. This paper innovatively proposes the use of biclustering mining to discover effective technical trading patterns that contain a combination of indicators from historical financial data series. This is the first attempt to use biclustering algorithm on trading data. The mined patterns are regarded as trading rules and can be classified as three trading actions (i.e., the buy, the sell, and no-action signals) with respect to the maximum support. A modified K nearest neighborhood ( K -NN) method is applied to classification of trading days in the testing period. The proposed method [called biclustering algorithm and the K nearest neighbor (BIC- K -NN)] was implemented on four historical datasets and the average performance was compared with the conventional buy-and-hold strategy and three previously reported intelligent trading systems. Experimental results demonstrate that the proposed trading system outperforms its counterparts and will be useful for investment in various financial markets.
Comparison of crisp and fuzzy character networks in handwritten word recognition
NASA Technical Reports Server (NTRS)
Gader, Paul; Mohamed, Magdi; Chiang, Jung-Hsien
1992-01-01
Experiments involving handwritten word recognition on words taken from images of handwritten address blocks from the United States Postal Service mailstream are described. The word recognition algorithm relies on the use of neural networks at the character level. The neural networks are trained using crisp and fuzzy desired outputs. The fuzzy outputs were defined using a fuzzy k-nearest neighbor algorithm. The crisp networks slightly outperformed the fuzzy networks at the character level but the fuzzy networks outperformed the crisp networks at the word level.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schreiner, S.; Paschal, C.B.; Galloway, R.L.
Four methods of producing maximum intensity projection (MIP) images were studied and compared. Three of the projection methods differ in the interpolation kernel used for ray tracing. The interpolation kernels include nearest neighbor interpolation, linear interpolation, and cubic convolution interpolation. The fourth projection method is a voxel projection method that is not explicitly a ray-tracing technique. The four algorithms` performance was evaluated using a computer-generated model of a vessel and using real MR angiography data. The evaluation centered around how well an algorithm transferred an object`s width to the projection plane. The voxel projection algorithm does not suffer from artifactsmore » associated with the nearest neighbor algorithm. Also, a speed-up in the calculation of the projection is seen with the voxel projection method. Linear interpolation dramatically improves the transfer of width information from the 3D MRA data set over both nearest neighbor and voxel projection methods. Even though the cubic convolution interpolation kernel is theoretically superior to the linear kernel, it did not project widths more accurately than linear interpolation. A possible advantage to the nearest neighbor interpolation is that the size of small vessels tends to be exaggerated in the projection plane, thereby increasing their visibility. The results confirm that the way in which an MIP image is constructed has a dramatic effect on information contained in the projection. The construction method must be chosen with the knowledge that the clinical information in the 2D projections in general will be different from that contained in the original 3D data volume. 27 refs., 16 figs., 2 tabs.« less
Error minimizing algorithms for nearest eighbor classifiers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Porter, Reid B; Hush, Don; Zimmer, G. Beate
2011-01-03
Stack Filters define a large class of discrete nonlinear filter first introd uced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. Wemore » use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.« less
Minimum Expected Risk Estimation for Near-neighbor Classification
2006-04-01
We consider the problems of class probability estimation and classification when using near-neighbor classifiers, such as k-nearest neighbors ( kNN ...estimate for weighted kNN classifiers with different prior information, for a broad class of risk functions. Theory and simulations show how significant...the difference is compared to the standard maximum likelihood weighted kNN estimates. Comparisons are made with uniform weights, symmetric weights
Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve
2012-01-01
Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
A collaborative filtering recommendation algorithm based on weighted SimRank and social trust
NASA Astrophysics Data System (ADS)
Su, Chang; Zhang, Butao
2017-05-01
Collaborative filtering is one of the most widely used recommendation technologies, but the data sparsity and cold start problem of collaborative filtering algorithms are difficult to solve effectively. In order to alleviate the problem of data sparsity in collaborative filtering algorithm, firstly, a weighted improved SimRank algorithm is proposed to compute the rating similarity between users in rating data set. The improved SimRank can find more nearest neighbors for target users according to the transmissibility of rating similarity. Then, we build trust network and introduce the calculation of trust degree in the trust relationship data set. Finally, we combine rating similarity and trust to build a comprehensive similarity in order to find more appropriate nearest neighbors for target user. Experimental results show that the algorithm proposed in this paper improves the recommendation precision of the Collaborative algorithm effectively.
Real-time Interpolation for True 3-Dimensional Ultrasound Image Volumes
Ji, Songbai; Roberts, David W.; Hartov, Alex; Paulsen, Keith D.
2013-01-01
We compared trilinear interpolation to voxel nearest neighbor and distance-weighted algorithms for fast and accurate processing of true 3-dimensional ultrasound (3DUS) image volumes. In this study, the computational efficiency and interpolation accuracy of the 3 methods were compared on the basis of a simulated 3DUS image volume, 34 clinical 3DUS image volumes from 5 patients, and 2 experimental phantom image volumes. We show that trilinear interpolation improves interpolation accuracy over both the voxel nearest neighbor and distance-weighted algorithms yet achieves real-time computational performance that is comparable to the voxel nearest neighbor algrorithm (1–2 orders of magnitude faster than the distance-weighted algorithm) as well as the fastest pixel-based algorithms for processing tracked 2-dimensional ultrasound images (0.035 seconds per 2-dimesional cross-sectional image [76,800 pixels interpolated, or 0.46 ms/1000 pixels] and 1.05 seconds per full volume with a 1-mm3 voxel size [4.6 million voxels interpolated, or 0.23 ms/1000 voxels]). On the basis of these results, trilinear interpolation is recommended as a fast and accurate interpolation method for rectilinear sampling of 3DUS image acquisitions, which is required to facilitate subsequent processing and display during operating room procedures such as image-guided neurosurgery. PMID:21266563
Real-time interpolation for true 3-dimensional ultrasound image volumes.
Ji, Songbai; Roberts, David W; Hartov, Alex; Paulsen, Keith D
2011-02-01
We compared trilinear interpolation to voxel nearest neighbor and distance-weighted algorithms for fast and accurate processing of true 3-dimensional ultrasound (3DUS) image volumes. In this study, the computational efficiency and interpolation accuracy of the 3 methods were compared on the basis of a simulated 3DUS image volume, 34 clinical 3DUS image volumes from 5 patients, and 2 experimental phantom image volumes. We show that trilinear interpolation improves interpolation accuracy over both the voxel nearest neighbor and distance-weighted algorithms yet achieves real-time computational performance that is comparable to the voxel nearest neighbor algrorithm (1-2 orders of magnitude faster than the distance-weighted algorithm) as well as the fastest pixel-based algorithms for processing tracked 2-dimensional ultrasound images (0.035 seconds per 2-dimesional cross-sectional image [76,800 pixels interpolated, or 0.46 ms/1000 pixels] and 1.05 seconds per full volume with a 1-mm(3) voxel size [4.6 million voxels interpolated, or 0.23 ms/1000 voxels]). On the basis of these results, trilinear interpolation is recommended as a fast and accurate interpolation method for rectilinear sampling of 3DUS image acquisitions, which is required to facilitate subsequent processing and display during operating room procedures such as image-guided neurosurgery.
NASA Astrophysics Data System (ADS)
Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2016-03-01
The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine
NASA Astrophysics Data System (ADS)
Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry
2017-10-01
Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha-1 and for live biomass of about 2 t ha-1 over the study area.
Missing value imputation for gene expression data by tailored nearest neighbors.
Faisal, Shahla; Tutz, Gerhard
2017-04-25
High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
False-nearest-neighbors algorithm and noise-corrupted time series
NASA Astrophysics Data System (ADS)
Rhodes, Carl; Morari, Manfred
1997-05-01
The false-nearest-neighbors (FNN) algorithm was originally developed to determine the embedding dimension for autonomous time series. For noise-free computer-generated time series, the algorithm does a good job in predicting the embedding dimension. However, the problem of predicting the embedding dimension when the time-series data are corrupted by noise was not fully examined in the original studies of the FNN algorithm. Here it is shown that with large data sets, even small amounts of noise can lead to incorrect prediction of the embedding dimension. Surprisingly, as the length of the time series analyzed by FNN grows larger, the cause of incorrect prediction becomes more pronounced. An analysis of the effect of noise on the FNN algorithm and a solution for dealing with the effects of noise are given here. Some results on the theoretically correct choice of the FNN threshold are also presented.
Suratanee, Apichat; Plaimas, Kitiporn
2017-01-01
The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network integrating an effective network search, namely, the reverse k -nearest neighbor (R k NN) search. The R k NN search was used to identify an impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the R k NN search yielded a much higher precision than a random selection, standard nearest neighbor search, or when applying the method to a random protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
NASA Astrophysics Data System (ADS)
Wang, Yujie; Wang, Zhen; Wang, Yanli; Liu, Taigang; Zhang, Wenbing
2018-01-01
The thermodynamic and kinetic parameters of an RNA base pair with different nearest and next nearest neighbors were obtained through long-time molecular dynamics simulation of the opening-closing switch process of the base pair near its melting temperature. The results indicate that thermodynamic parameters of GC base pair are dependent on the nearest neighbor base pair, and the next nearest neighbor base pair has little effect, which validated the nearest-neighbor model. The closing and opening rates of the GC base pair also showed nearest neighbor dependences. At certain temperature, the closing and opening rates of the GC pair with nearest neighbor AU is larger than that with the nearest neighbor GC, and the next nearest neighbor plays little role. The free energy landscape of the GC base pair with the nearest neighbor GC is rougher than that with nearest neighbor AU.
Pivot methods for global optimization
NASA Astrophysics Data System (ADS)
Stanton, Aaron Fletcher
A new algorithm is presented for the location of the global minimum of a multiple minima problem. It begins with a series of randomly placed probes in phase space, and then uses an iterative redistribution of the worst probes into better regions of phase space until a chosen convergence criterion is fulfilled. The method quickly converges, does not require derivatives, and is resistant to becoming trapped in local minima. Comparison of this algorithm with others using a standard test suite demonstrates that the number of function calls has been decreased conservatively by a factor of about three with the same degrees of accuracy. Two major variations of the method are presented, differing primarily in the method of choosing the probes that act as the basis for the new probes. The first variation, termed the lowest energy pivot method, ranks all probes by their energy and keeps the best probes. The probes being discarded select from those being kept as the basis for the new cycle. In the second variation, the nearest neighbor pivot method, all probes are paired with their nearest neighbor. The member of each pair with the higher energy is relocated in the vicinity of its neighbor. Both methods are tested against a standard test suite of functions to determine their relative efficiency, and the nearest neighbor pivot method is found to be the more efficient. A series of Lennard-Jones clusters is optimized with the nearest neighbor method, and a scaling law is found for cpu time versus the number of particles in the system. The two methods are then compared more explicitly, and finally a study in the use of the pivot method for solving the Schroedinger equation is presented. The nearest neighbor method is found to be able to solve the ground state of the quantum harmonic oscillator from a pure random initialization of the wavefunction.
General formulation of long-range degree correlations in complex networks
NASA Astrophysics Data System (ADS)
Fujiki, Yuka; Takaguchi, Taro; Yakubo, Kousuke
2018-06-01
We provide a general framework for analyzing degree correlations between nodes separated by more than one step (i.e., beyond nearest neighbors) in complex networks. One joint and four conditional probability distributions are introduced to fully describe long-range degree correlations with respect to degrees k and k' of two nodes and shortest path length l between them. We present general relations among these probability distributions and clarify the relevance to nearest-neighbor degree correlations. Unlike nearest-neighbor correlations, some of these probability distributions are meaningful only in finite-size networks. Furthermore, as a baseline to determine the existence of intrinsic long-range degree correlations in a network other than inevitable correlations caused by the finite-size effect, the functional forms of these probability distributions for random networks are analytically evaluated within a mean-field approximation. The utility of our argument is demonstrated by applying it to real-world networks.
Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar
2017-12-01
Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.
A Fast Robot Identification and Mapping Algorithm Based on Kinect Sensor.
Zhang, Liang; Shen, Peiyi; Zhu, Guangming; Wei, Wei; Song, Houbing
2015-08-14
Internet of Things (IoT) is driving innovation in an ever-growing set of application domains such as intelligent processing for autonomous robots. For an autonomous robot, one grand challenge is how to sense its surrounding environment effectively. The Simultaneous Localization and Mapping with RGB-D Kinect camera sensor on robot, called RGB-D SLAM, has been developed for this purpose but some technical challenges must be addressed. Firstly, the efficiency of the algorithm cannot satisfy real-time requirements; secondly, the accuracy of the algorithm is unacceptable. In order to address these challenges, this paper proposes a set of novel improvement methods as follows. Firstly, the ORiented Brief (ORB) method is used in feature detection and descriptor extraction. Secondly, a bidirectional Fast Library for Approximate Nearest Neighbors (FLANN) k-Nearest Neighbor (KNN) algorithm is applied to feature match. Then, the improved RANdom SAmple Consensus (RANSAC) estimation method is adopted in the motion transformation. In the meantime, high precision General Iterative Closest Points (GICP) is utilized to register a point cloud in the motion transformation optimization. To improve the accuracy of SLAM, the reduced dynamic covariance scaling (DCS) algorithm is formulated as a global optimization problem under the G2O framework. The effectiveness of the improved algorithm has been verified by testing on standard data and comparing with the ground truth obtained on Freiburg University's datasets. The Dr Robot X80 equipped with a Kinect camera is also applied in a building corridor to verify the correctness of the improved RGB-D SLAM algorithm. With the above experiments, it can be seen that the proposed algorithm achieves higher processing speed and better accuracy.
Optimal Detection Range of RFID Tag for RFID-based Positioning System Using the k-NN Algorithm.
Han, Soohee; Kim, Junghwan; Park, Choung-Hwan; Yoon, Hee-Cheon; Heo, Joon
2009-01-01
Positioning technology to track a moving object is an important and essential component of ubiquitous computing environments and applications. An RFID-based positioning system using the k-nearest neighbor (k-NN) algorithm can determine the position of a moving reader from observed reference data. In this study, the optimal detection range of an RFID-based positioning system was determined on the principle that tag spacing can be derived from the detection range. It was assumed that reference tags without signal strength information are regularly distributed in 1-, 2- and 3-dimensional spaces. The optimal detection range was determined, through analytical and numerical approaches, to be 125% of the tag-spacing distance in 1-dimensional space. Through numerical approaches, the range was 134% in 2-dimensional space, 143% in 3-dimensional space.
The nearest neighbor and the bayes error rates.
Loizou, G; Maybank, S J
1987-02-01
The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities Ek,l + 1 ¿ E*(¿) ¿ Ek,l dE*(¿), where d is a function of k, l, and the number of pattern classes, and ¿ is the reject threshold for the Bayes method. An explicit expression for d is given which is optimal in the sense that for some probability distributions Ek,l and dE* (¿) are equal.
Jan, Shau-Shiun; Hsu, Li-Ta; Tsai, Wen-Ming
2010-01-01
In order to provide the seamless navigation and positioning services for indoor environments, an indoor location based service (LBS) test bed is developed to integrate the indoor positioning system and the indoor three-dimensional (3D) geographic information system (GIS). A wireless sensor network (WSN) is used in the developed indoor positioning system. Considering the power consumption, in this paper the ZigBee radio is used as the wireless protocol, and the received signal strength (RSS) fingerprinting positioning method is applied as the primary indoor positioning algorithm. The matching processes of the user location include the nearest neighbor (NN) algorithm, the K-weighted nearest neighbors (KWNN) algorithm, and the probabilistic approach. To enhance the positioning accuracy for the dynamic user, the particle filter is used to improve the positioning performance. As part of this research, a 3D indoor GIS is developed to be used with the indoor positioning system. This involved using the computer-aided design (CAD) software and the virtual reality markup language (VRML) to implement a prototype indoor LBS test bed. Thus, a rapid and practical procedure for constructing a 3D indoor GIS is proposed, and this GIS is easy to update and maintenance for users. The building of the Department of Aeronautics and Astronautics at National Cheng Kung University in Taiwan is used as an example to assess the performance of various algorithms for the indoor positioning system.
Jan, Shau-Shiun; Hsu, Li-Ta; Tsai, Wen-Ming
2010-01-01
In order to provide the seamless navigation and positioning services for indoor environments, an indoor location based service (LBS) test bed is developed to integrate the indoor positioning system and the indoor three-dimensional (3D) geographic information system (GIS). A wireless sensor network (WSN) is used in the developed indoor positioning system. Considering the power consumption, in this paper the ZigBee radio is used as the wireless protocol, and the received signal strength (RSS) fingerprinting positioning method is applied as the primary indoor positioning algorithm. The matching processes of the user location include the nearest neighbor (NN) algorithm, the K-weighted nearest neighbors (KWNN) algorithm, and the probabilistic approach. To enhance the positioning accuracy for the dynamic user, the particle filter is used to improve the positioning performance. As part of this research, a 3D indoor GIS is developed to be used with the indoor positioning system. This involved using the computer-aided design (CAD) software and the virtual reality markup language (VRML) to implement a prototype indoor LBS test bed. Thus, a rapid and practical procedure for constructing a 3D indoor GIS is proposed, and this GIS is easy to update and maintenance for users. The building of the Department of Aeronautics and Astronautics at National Cheng Kung University in Taiwan is used as an example to assess the performance of various algorithms for the indoor positioning system. PMID:22319282
[Galaxy/quasar classification based on nearest neighbor method].
Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun
2011-09-01
With the wide application of high-quality CCD in celestial spectrum imagery and the implementation of many large sky survey programs (e. g., Sloan Digital Sky Survey (SDSS), Two-degree-Field Galaxy Redshift Survey (2dF), Spectroscopic Survey Telescope (SST), Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and Large Synoptic Survey Telescope (LSST) program, etc.), celestial observational data are coming into the world like torrential rain. Therefore, to utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognizing galaxies and quasars from spectra based on nearest neighbor method. Galaxies and quasars are extragalactic objects, they are far away from earth, and their spectra are usually contaminated by various noise. Therefore, it is a typical problem to recognize these two types of spectra in automatic spectra classification. Furthermore, the utilized method, nearest neighbor, is one of the most typical, classic, mature algorithms in pattern recognition and data mining, and often is used as a benchmark in developing novel algorithm. For applicability in practice, it is shown that the recognition ratio of nearest neighbor method (NN) is comparable to the best results reported in the literature based on more complicated methods, and the superiority of NN is that this method does not need to be trained, which is useful in incremental learning and parallel computation in mass spectral data processing. In conclusion, the results in this work are helpful for studying galaxies and quasars spectra classification.
Multidimensional Optimization of Signal Space Distance Parameters in WLAN Positioning
Brković, Milenko; Simić, Mirjana
2014-01-01
Accurate indoor localization of mobile users is one of the challenging problems of the last decade. Besides delivering high speed Internet, Wireless Local Area Network (WLAN) can be used as an effective indoor positioning system, being competitive both in terms of accuracy and cost. Among the localization algorithms, nearest neighbor fingerprinting algorithms based on Received Signal Strength (RSS) parameter have been extensively studied as an inexpensive solution for delivering indoor Location Based Services (LBS). In this paper, we propose the optimization of the signal space distance parameters in order to improve precision of WLAN indoor positioning, based on nearest neighbor fingerprinting algorithms. Experiments in a real WLAN environment indicate that proposed optimization leads to substantial improvements of the localization accuracy. Our approach is conceptually simple, is easy to implement, and does not require any additional hardware. PMID:24757443
2007-02-28
Shah, D. Waagen, H. Schmitt, S. Bellofiore, A. Spanias, and D. Cochran, 32nd International Conference on Acoustics, Speech , and Signal Processing...Information Exploitation Office kNN k-Nearest Neighbor LEAN Laplacian Eigenmap Adaptive Neighbor LIP Linear Integer Programming ISP
Ronald E. McRoberts; Steen Magnussen; Erkki O. Tomppo; Gherardo Chirici
2011-01-01
Nearest neighbors techniques have been shown to be useful for estimating forest attributes, particularly when used with forest inventory and satellite image data. Published reports of positive results have been truly international in scope. However, for these techniques to be more useful, they must be able to contribute to scientific inference which, for sample-based...
a Gross Error Elimination Method for Point Cloud Data Based on Kd-Tree
NASA Astrophysics Data System (ADS)
Kang, Q.; Huang, G.; Yang, S.
2018-04-01
Point cloud data has been one type of widely used data sources in the field of remote sensing. Key steps of point cloud data's pro-processing focus on gross error elimination and quality control. Owing to the volume feature of point could data, existed gross error elimination methods need spend massive memory both in space and time. This paper employed a new method which based on Kd-tree algorithm to construct, k-nearest neighbor algorithm to search, settled appropriate threshold to determine with result turns out a judgement that whether target point is or not an outlier. Experimental results show that, our proposed algorithm will help to delete gross error in point cloud data and facilitate to decrease memory consumption, improve efficiency.
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Jian; Hamidouche, Khaled; Zheng, Jie
2015-08-05
Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm through taking advantage of scalable programming models. To improve the performance of k-NN on large-scale environment with InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemicmore » evaluation and analysis on typical workloads. The hybrid designs leverage the one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for small workload with balanced communication and computation. Experiments of running with varied number of cores show that our design can maintain good scalability.« less
Xia, Wenjun; Mita, Yoshio; Shibata, Tadashi
2016-05-01
Aiming at efficient data condensation and improving accuracy, this paper presents a hardware-friendly template reduction (TR) method for the nearest neighbor (NN) classifiers by introducing the concept of critical boundary vectors. A hardware system is also implemented to demonstrate the feasibility of using an field-programmable gate array (FPGA) to accelerate the proposed method. Initially, k -means centers are used as substitutes for the entire template set. Then, to enhance the classification performance, critical boundary vectors are selected by a novel learning algorithm, which is completed within a single iteration. Moreover, to remove noisy boundary vectors that can mislead the classification in a generalized manner, a global categorization scheme has been explored and applied to the algorithm. The global characterization automatically categorizes each classification problem and rapidly selects the boundary vectors according to the nature of the problem. Finally, only critical boundary vectors and k -means centers are used as the new template set for classification. Experimental results for 24 data sets show that the proposed algorithm can effectively reduce the number of template vectors for classification with a high learning speed. At the same time, it improves the accuracy by an average of 2.17% compared with the traditional NN classifiers and also shows greater accuracy than seven other TR methods. We have shown the feasibility of using a proof-of-concept FPGA system of 256 64-D vectors to accelerate the proposed method on hardware. At a 50-MHz clock frequency, the proposed system achieves a 3.86 times higher learning speed than on a 3.4-GHz PC, while consuming only 1% of the power of that used by the PC.
Learning Instance-Specific Predictive Models
Visweswaran, Shyam; Cooper, Gregory F.
2013-01-01
This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including nave Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
NASA Astrophysics Data System (ADS)
Wei, B. G.; Huo, K. X.; Yao, Z. F.; Lou, J.; Li, X. Y.
2018-03-01
It is one of the difficult problems encountered in the research of condition maintenance technology of transformers to recognize partial discharge (PD) pattern. According to the main physical characteristics of PD, three models of oil-paper insulation defects were set up in laboratory to study the PD of transformers, and phase resolved partial discharge (PRPD) was constructed. By using least square method, the grey-scale images of PRPD were constructed and features of each grey-scale image were 28 box dimensions and 28 information dimensions. Affinity propagation algorithm based on manifold distance (AP-MD) for transformers PD pattern recognition was established, and the data of box dimension and information dimension were clustered based on AP-MD. Study shows that clustering result of AP-MD is better than the results of affinity propagation (AP), k-means and fuzzy c-means algorithm (FCM). By choosing different k values of k-nearest neighbor, we find clustering accuracy of AP-MD falls when k value is larger or smaller, and the optimal k value depends on sample size.
David M. Bell; Matthew J. Gregory; Janet L. Ohmann
2015-01-01
Imputation provides a useful method for mapping forest attributes across broad geographic areas based on field plot measurements and Landsat multi-spectral data, but the resulting map products may be of limited use without corresponding analyses of uncertainties in predictions. In the case of k-nearest neighbor (kNN) imputation with k = 1, such as the Gradient Nearest...
NASA Astrophysics Data System (ADS)
Bieniek, Maciej; Korkusiński, Marek; Szulakowska, Ludmiła; Potasz, Paweł; Ozfidan, Isil; Hawrylak, Paweł
2018-02-01
We present here the minimal tight-binding model for a single layer of transition metal dichalcogenides (TMDCs) MX 2(M , metal; X , chalcogen) which illuminates the physics and captures band nesting, massive Dirac fermions, and valley Landé and Zeeman magnetic field effects. TMDCs share the hexagonal lattice with graphene but their electronic bands require much more complex atomic orbitals. Using symmetry arguments, a minimal basis consisting of three metal d orbitals and three chalcogen dimer p orbitals is constructed. The tunneling matrix elements between nearest-neighbor metal and chalcogen orbitals are explicitly derived at K ,-K , and Γ points of the Brillouin zone. The nearest-neighbor tunneling matrix elements connect specific metal and sulfur orbitals yielding an effective 6 ×6 Hamiltonian giving correct composition of metal and chalcogen orbitals but not the direct gap at K points. The direct gap at K , correct masses, and conduction band minima at Q points responsible for band nesting are obtained by inclusion of next-neighbor Mo-Mo tunneling. The parameters of the next-nearest-neighbor model are successfully fitted to MX 2(M =Mo ; X =S ) density functional ab initio calculations of the highest valence and lowest conduction band dispersion along K -Γ line in the Brillouin zone. The effective two-band massive Dirac Hamiltonian for MoS2, Landé g factors, and valley Zeeman splitting are obtained.
Quantitative diagnosis of bladder cancer by morphometric analysis of HE images
NASA Astrophysics Data System (ADS)
Wu, Binlin; Nebylitsa, Samantha V.; Mukherjee, Sushmita; Jain, Manu
2015-02-01
In clinical practice, histopathological analysis of biopsied tissue is the main method for bladder cancer diagnosis and prognosis. The diagnosis is performed by a pathologist based on the morphological features in the image of a hematoxylin and eosin (HE) stained tissue sample. This manuscript proposes algorithms to perform morphometric analysis on the HE images, quantify the features in the images, and discriminate bladder cancers with different grades, i.e. high grade and low grade. The nuclei are separated from the background and other types of cells such as red blood cells (RBCs) and immune cells using manual outlining, color deconvolution and image segmentation. A mask of nuclei is generated for each image for quantitative morphometric analysis. The features of the nuclei in the mask image including size, shape, orientation, and their spatial distributions are measured. To quantify local clustering and alignment of nuclei, we propose a 1-nearest-neighbor (1-NN) algorithm which measures nearest neighbor distance and nearest neighbor parallelism. The global distributions of the features are measured using statistics of the proposed parameters. A linear support vector machine (SVM) algorithm is used to classify the high grade and low grade bladder cancers. The results show using a particular group of nuclei such as large ones, and combining multiple parameters can achieve better discrimination. This study shows the proposed approach can potentially help expedite pathological diagnosis by triaging potentially suspicious biopsies.
NASA Astrophysics Data System (ADS)
Wood, W. T.; Runyan, T. E.; Palmsten, M.; Dale, J.; Crawford, C.
2016-12-01
Natural Gas (primarily methane) and gas hydrate accumulations require certain bio-geochemical, as well as physical conditions, some of which are poorly sampled and/or poorly understood. We exploit recent advances in the prediction of seafloor porosity and heat flux via machine learning techniques (e.g. Random forests and Bayesian networks) to predict the occurrence of gas and subsequently gas hydrate in marine sediments. The prediction (actually guided interpolation) of key parameters we use in this study is a K-nearest neighbor technique. KNN requires only minimal pre-processing of the data and predictors, and requires minimal run-time input so the results are almost entirely data-driven. Specifically we use new estimates of sedimentation rate and sediment type, along with recently derived compaction modeling to estimate profiles of porosity and age. We combined the compaction with seafloor heat flux to estimate temperature with depth and geologic age, which, with estimates of organic carbon, and models of methanogenesis yield limits on the production of methane. Results include geospatial predictions of gas (and gas hydrate) accumulations, with quantitative estimates of uncertainty. The Generic Earth Modeling System (GEMS) we have developed to derive the machine learning estimates is modular and easily updated with new algorithms or data.
Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar
2016-04-01
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generates an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer are imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
Computer Simulation of Energy Parameters and Magnetic Effects in Fe-Si-C Ternary Alloys
NASA Astrophysics Data System (ADS)
Ridnyi, Ya. M.; Mirzoev, A. A.; Mirzaev, D. A.
2018-06-01
The paper presents ab initio simulation with the WIEN2k software package of the equilibrium structure and properties of silicon and carbon atoms dissolved in iron with the body-centered cubic crystal system of the lattice. Silicon and carbon atoms manifest a repulsive interaction in the first two nearest neighbors, in the second neighbor the repulsion being stronger than in the first. In the third and next-nearest neighbors a very weak repulsive interaction occurs and tends to zero with increasing distance between atoms. Silicon and carbon dissolution reduces the magnetic moment of iron atoms.
NASA Astrophysics Data System (ADS)
Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen
2011-12-01
Co-occurrence matrix has been applied successfully for echographic images characterization because it contains information about spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of an US image of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extract with co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (or fatty) liver. These two sets of echographic images of the liver build a database that includes only histological confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects help to compute four textural indices and as well as control dataset. We chose to study these diseases because the steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and they can be used to characterize irregularity of tissues. The goal is to extract the information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool to classify features textures by means of grouping in a training set using healthy liver, on the one hand, and in a holdout set using the features textures of steatosis liver, on the other hand. The results could be used to quantify the texture information and will allow a clear detection between health and steatosis liver.
Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing
2012-12-01
5 FLD Fisher’s Linear Discriminant . . . . . . . . . . . . . . . . . . . 6 kNN K-Nearest Neighbor...Neighbor ( kNN ), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in...Expansion and the Dis- crete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech , and Signal Processing (ICASSP00
Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.
2017-01-01
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571
Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L
2017-02-14
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.
The advantages of the surface Laplacian in brain-computer interface research.
McFarland, Dennis J
2015-09-01
Brain-computer interface (BCI) systems frequently use signal processing methods, such as spatial filtering, to enhance performance. The surface Laplacian can reduce spatial noise and aid in identification of sources. In BCI research, these two functions of the surface Laplacian correspond to prediction accuracy and signal orthogonality. In the present study, an off-line analysis of data from a sensorimotor rhythm-based BCI task dissociated these functions of the surface Laplacian by comparing nearest-neighbor and next-nearest neighbor Laplacian algorithms. The nearest-neighbor Laplacian produced signals that were more orthogonal while the next-nearest Laplacian produced signals that resulted in better accuracy. Both prediction and signal identification are important for BCI research. Better prediction of user's intent produces increased speed and accuracy of communication and control. Signal identification is important for ruling out the possibility of control by artifacts. Identifying the nature of the control signal is relevant both to understanding exactly what is being studied and in terms of usability for individuals with limited motor control. Copyright © 2014 Elsevier B.V. All rights reserved.
Membership-degree preserving discriminant analysis with applications to face recognition.
Yang, Zhangjing; Liu, Chuancai; Huang, Pu; Qian, Jianjun
2013-01-01
In pattern recognition, feature extraction techniques have been widely employed to reduce the dimensionality of high-dimensional data. In this paper, we propose a novel feature extraction algorithm called membership-degree preserving discriminant analysis (MPDA) based on the fisher criterion and fuzzy set theory for face recognition. In the proposed algorithm, the membership degree of each sample to particular classes is firstly calculated by the fuzzy k-nearest neighbor (FKNN) algorithm to characterize the similarity between each sample and class centers, and then the membership degree is incorporated into the definition of the between-class scatter and the within-class scatter. The feature extraction criterion via maximizing the ratio of the between-class scatter to the within-class scatter is applied. Experimental results on the ORL, Yale, and FERET face databases demonstrate the effectiveness of the proposed algorithm.
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071
Fast Exact Search in Hamming Space With Multi-Index Hashing.
Norouzi, Mohammad; Punjani, Ali; Fleet, David J
2014-06-01
There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straight-forward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.
Lip reading using neural networks
NASA Astrophysics Data System (ADS)
Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant
2011-10-01
Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person to written text. It has several applications like teaching deaf and dumb to speak and communicate effectively with the other people, its crime fighting potential and invariance to acoustic environment. We convert the video of the subject speaking vowels into images and then images are further selected manually for processing. However, several factors like fast speech, bad pronunciation, and poor illumination, movement of face, moustaches and beards make lip reading difficult. Contour tracking methods and Template matching are used for the extraction of lips from the face. K Nearest Neighbor algorithm is then used to classify the 'speaking' images and the 'silent' images. The sequence of images is then transformed into segments of utterances. Feature vector is calculated on each frame for all the segments and is stored in the database with properly labeled class. Character recognition is performed using modified KNN algorithm which assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms
Neural Network and Nearest Neighbor Algorithms for Enhancing Sampling of Molecular Dynamics.
Galvelis, Raimondas; Sugita, Yuji
2017-06-13
The free energy calculations of complex chemical and biological systems with molecular dynamics (MD) are inefficient due to multiple local minima separated by high-energy barriers. The minima can be escaped using an enhanced sampling method such as metadynamics, which apply bias (i.e., importance sampling) along a set of collective variables (CV), but the maximum number of CVs (or dimensions) is severely limited. We propose a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation. The bias potential is constructed iteratively from short biased MD simulations accounting for correlation among CVs. Our method is capable of achieving ergodic sampling and calculating free energy of polypeptides with up to 8-dimensional bias potential.
Enhanced Approximate Nearest Neighbor via Local Area Focused Search.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzales, Antonio; Blazier, Nicholas Paul
Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space- partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Fo- cused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses onmore » a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Badiev, M. K., E-mail: m-zagir@mail.ru; Murtazaev, A. K.; Ramazanov, M. K.
2016-10-15
The phase transitions (PTs) and critical properties of the antiferromagnetic Ising model on a layered (stacked) triangular lattice have been studied by the Monte Carlo method using a replica algorithm with allowance for the next-nearest-neighbor interactions. The character of PTs is analyzed using the histogram technique and the method of Binder cumulants. It is established that the transition from the disordered to paramagnetic phase in the adopted model is a second-order PT. Static critical exponents of the heat capacity (α), susceptibility (γ), order parameter (β), and correlation radius (ν) and the Fischer exponent η are calculated using the finite-size scalingmore » theory. It is shown that (i) the antiferromagnetic Ising model on a layered triangular lattice belongs to the XY universality class of critical behavior and (ii) allowance for the intralayer interactions of next-nearest neighbors in the adopted model leads to a change in the universality class of critical behavior.« less
A Vectorized ’Nearest-Neighbors’ Algorithm of Order N Using a Monotonic Logical Grid
1985-05-29
Computational Phy’sics 0 4 May 29 , 1985 This work was supported by the Office of Naval Research. . ~ Q~JUN 1719851- * NAVAL RESEARCH LABORATORY * lit...YE ’.ARK:NGS UNCLASSIFIED_______________ _____ K -. A R, ~ CA7,ON 4, 71CO1, 3 :)-S7R,9U-ON AdA,.A3:L ’Y OF REPOR7Io - EC..ASi.’ CA27 ON., DOWNGAZING...Year. Month. Day) 5 PAGE COUNT Interim FROM -____ Toi__ 1985 May 29 50 SuPPILSMENTARY NOTATION This work was supported by the Office of Naval
Credit scoring analysis using weighted k nearest neighbor
NASA Astrophysics Data System (ADS)
Mukid, M. A.; Widiharih, T.; Rusgiyono, A.; Prahutama, A.
2018-05-01
Credit scoring is a quatitative method to evaluate the credit risk of loan applications. Both statistical methods and artificial intelligence are often used by credit analysts to help them decide whether the applicants are worthy of credit. These methods aim to predict future behavior in terms of credit risk based on past experience of customers with similar characteristics. This paper reviews the weighted k nearest neighbor (WKNN) method for credit assessment by considering the use of some kernels. We use credit data from a private bank in Indonesia. The result shows that the Gaussian kernel and rectangular kernel have a better performance based on the value of percentage corrected classified whose value is 82.4% respectively.
Scalable Machine Learning for Massive Astronomical Datasets
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Gray, A.
2014-04-01
We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
Scalable Machine Learning for Massive Astronomical Datasets
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Astronomy Data Centre, Canadian
2014-01-01
We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
NASA Astrophysics Data System (ADS)
Huang, Jian; Liu, Gui-xiong
2016-09-01
The identification of targets varies in different surge tests. A multi-color space threshold segmentation and self-learning k-nearest neighbor algorithm ( k-NN) for equipment under test status identification was proposed after using feature matching to identify equipment status had to train new patterns every time before testing. First, color space (L*a*b*, hue saturation lightness (HSL), hue saturation value (HSV)) to segment was selected according to the high luminance points ratio and white luminance points ratio of the image. Second, the unknown class sample S r was classified by the k-NN algorithm with training set T z according to the feature vector, which was formed from number of pixels, eccentricity ratio, compactness ratio, and Euler's numbers. Last, while the classification confidence coefficient equaled k, made S r as one sample of pre-training set T z '. The training set T z increased to T z+1 by T z ' if T z ' was saturated. In nine series of illuminant, indicator light, screen, and disturbances samples (a total of 21600 frames), the algorithm had a 98.65%identification accuracy, also selected five groups of samples to enlarge the training set from T 0 to T 5 by itself.
Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.
Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong
2016-02-01
Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, rare work studies the unified approach to constructing multiple informative hash tables using any type of hashing algorithms. Meanwhile, for multiple table search, it also lacks of a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both the naive construction methods and the state-of-the-art hashing algorithms.
McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.
2013-01-01
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
NASA Astrophysics Data System (ADS)
Gauthier, N.; Fennell, A.; Prévost, B.; Uldry, A.-C.; Delley, B.; Sibille, R.; Désilets-Benoit, A.; Dabkowska, H. A.; Nilsen, G. J.; Regnault, L.-P.; White, J. S.; Niedermayer, C.; Pomjakushin, V.; Bianchi, A. D.; Kenzelmann, M.
2017-04-01
Magnetic frustration and low dimensionality can prevent long-range magnetic order and lead to exotic correlated ground states. SrDy2O4 consists of magnetic Dy3 + ions forming magnetically frustrated zigzag chains along the c axis and shows no long-range order to temperatures as low as T =60 mK. We carried out neutron scattering and ac magnetic susceptibility measurements using powder and single crystals of SrDy2O4 . Diffuse neutron scattering indicates strong one-dimensional (1D) magnetic correlations along the chain direction that can be qualitatively accounted for by the axial next-nearest-neighbor Ising model with nearest-neighbor and next-nearest-neighbor exchange J1=0.3 meV and J2=0.2 meV, respectively. Three-dimensional (3D) correlations become important below T*≈0.7 K. At T =60 mK, the short-range correlations are characterized by a putative propagation vector k1 /2=(0 ,1/2 ,1/2 ) . We argue that the absence of long-range order arises from the presence of slowly decaying 1D domain walls that are trapped due to 3D correlations. This stabilizes a low-temperature phase without long-range magnetic order, but with well-ordered chain segments separated by slowly moving domain walls.
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.
Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott
2009-03-01
Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio
NASA Astrophysics Data System (ADS)
Nababan, A. A.; Sitompul, O. S.; Tulus
2018-04-01
K- Nearest Neighbor (KNN) is a good classifier, but from several studies, the result performance accuracy of KNN still lower than other methods. One of the causes of the low accuracy produced, because each attribute has the same effect on the classification process, while some less relevant characteristics lead to miss-classification of the class assignment for new data. In this research, we proposed Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio as a parameter to see the correlation between each attribute in the data and the Gain Ratio also will be used as the basis for weighting each attribute of the dataset. The accuracy of results is compared to the accuracy acquired from the original KNN method using 10-fold Cross-Validation with several datasets from the UCI Machine Learning repository and KEEL-Dataset Repository, such as abalone, glass identification, haberman, hayes-roth and water quality status. Based on the result of the test, the proposed method was able to increase the classification accuracy of KNN, where the highest difference of accuracy obtained hayes-roth dataset is worth 12.73%, and the lowest difference of accuracy obtained in the abalone dataset of 0.07%. The average result of the accuracy of all dataset increases the accuracy by 5.33%.
Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A.; Anaya, Maribel
2017-01-01
Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated a great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally; (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed. PMID:28230796
Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A; Anaya, Maribel
2017-02-21
Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated a great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally; (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed.
Diagnostic tools for nearest neighbors techniques when used with satellite imagery
Ronald E. McRoberts
2009-01-01
Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....
Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai
2016-12-01
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Finite element computation on nearest neighbor connected machines
NASA Technical Reports Server (NTRS)
Mcaulay, A. D.
1984-01-01
Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Derrac, Joaquín; Triguero, Isaac; Garcia, Salvador; Herrera, Francisco
2012-10-01
Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.
An Efficient Statistical Computation Technique for Health Care Big Data using R
NASA Astrophysics Data System (ADS)
Sushma Rani, N.; Srinivasa Rao, P., Dr; Parimala, P.
2017-08-01
Due to the changes in living conditions and other factors many critical health related problems are arising. The diagnosis of the problem at earlier stages will increase the chances of survival and fast recovery. This reduces the time of recovery and the cost associated for the treatment. One such medical related issue is cancer and breast cancer has been identified as the second leading cause of cancer death. If detected in the early stage it can be cured. Once a patient is detected with breast cancer tumor, it should be classified whether it is cancerous or non-cancerous. So the paper uses k-nearest neighbors(KNN) algorithm which is one of the simplest machine learning algorithms and is an instance-based learning algorithm to classify the data. Day-to -day new records are added which leds to increase in the data to be classified and this tends to be big data problem. The algorithm is implemented in R whichis the most popular platform applied to machine learning algorithms for statistical computing. Experimentation is conducted by using various classification evaluation metric onvarious values of k. The results show that the KNN algorithm out performes better than existing models.
NASA Astrophysics Data System (ADS)
Toal, Siobhan; Schweitzer-Stenner, Reinhard; Rybka, Karin; Schwalbe, Hardol
2013-03-01
In order to enable structural predictions of intrinsically disordered proteins (IDPs) the intrinsic conformational propensities of amino acids must be complimented by information on nearest-neighbor interactions. To explore the influence of nearest-neighbors on conformational distributions, we preformed a joint vibrational (Infrared, Vibrational Circular Dichroism (VCD), polarized Raman) and 2D-NMR study of selected GxyG host-guest peptides: GDyG, GSyG, GxLG, GxVG, where x/y ={A,K,LV}. D and S (L and V) were chosen at the x (y) position due to their observance to drastically change the distribution of alanine in xAy tripeptide sequences in truncated coil libraries. The conformationally sensitive amide' profiles of the respective spectra were analyzed in terms of a statistical ensemble described as a superposition of 2D-Gaussian functions in Ramachandran space representing sub-ensembles of pPII-, β-strand-, helical-, and turn-like conformations. Our analysis and simulation of the amide I' band profiles exploits excitonic coupling between the local amide I' vibrational modes in the tetra-peptides. The resulting distributions reveal that D and S, which themselves have high propensities for turn-structures, strongly affect the conformational distribution of their downstream neighbor. Taken together, our results indicate that Dx and Sx motifs might act as conformational randomizers in proteins, attenuating intrinsic propensities of neighboring residues. Overall, our results show that nearest neighbor interactions contribute significantly to the Gibbs energy landscape of disordered peptides and proteins.
Ising lattices with +/-J second-nearest-neighbor interactions
NASA Astrophysics Data System (ADS)
Ramírez-Pastor, A. J.; Nieto, F.; Vogel, E. E.
1997-06-01
Second-nearest-neighbor interactions are added to the usual nearest-neighbor Ising Hamiltonian for square lattices in different ways. The starting point is a square lattice where half the nearest-neighbor interactions are ferromagnetic and the other half of the bonds are antiferromagnetic. Then, second-nearest-neighbor interactions can also be assigned randomly or in a variety of causal manners determined by the nearest-neighbor interactions. In the present paper we consider three causal and three random ways of assigning second-nearest-neighbor exchange interactions. Several ground-state properties are then calculated for each of these lattices:energy per bond ɛg, site correlation parameter pg, maximal magnetization μg, and fraction of unfrustrated bonds hg. A set of 500 samples is considered for each size N (number of spins) and array (way of distributing the N spins). The properties of the original lattices with only nearest-neighbor interactions are already known, which allows realizing the effect of the additional interactions. We also include cubic lattices to discuss the distinction between coordination number and dimensionality. Comparison with results for triangular and honeycomb lattices is done at specific points.
Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis
NASA Astrophysics Data System (ADS)
Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan
2017-10-01
This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as feature selections for selecting and choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper will also discuss the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using the parametric statistical significance tests. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that the ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain quality, optimal feature subset that can represent the actual data in customer review data.
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Noise reduction and image enhancement using a hardware implementation of artificial neural networks
NASA Astrophysics Data System (ADS)
David, Robert; Williams, Erin; de Tremiolles, Ghislain; Tannhof, Pascal
1999-03-01
In this paper, we present a neural based solution developed for noise reduction and image enhancement using the ZISC, an IBM hardware processor which implements the Restricted Coulomb Energy algorithm and the K-Nearest Neighbor algorithm. Artificial neural networks present the advantages of processing time reduction in comparison with classical models, adaptability, and the weighted property of pattern learning. The goal of the developed application is image enhancement in order to restore old movies (noise reduction, focus correction, etc.), to improve digital television images, or to treat images which require adaptive processing (medical images, spatial images, special effects, etc.). Image results show a quantitative improvement over the noisy image as well as the efficiency of this system. Further enhancements are being examined to improve the output of the system.
Constructing a logical, regular axis topology from an irregular topology
Faraj, Daniel A.
2014-07-22
Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.
Constructing a logical, regular axis topology from an irregular topology
Faraj, Daniel A.
2014-07-01
Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.
Nearest neighbor-density-based clustering methods for large hyperspectral images
NASA Astrophysics Data System (ADS)
Cariou, Claude; Chehdi, Kacem
2017-10-01
We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and showed their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extent the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize of the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral imaging sensor.
GPU Accelerated Chemical Similarity Calculation for Compound Library Comparison
Ma, Chao; Wang, Lirong; Xie, Xiang-Qun
2012-01-01
Chemical similarity calculation plays an important role in compound library design, virtual screening, and “lead” optimization. In this manuscript, we present a novel GPU-accelerated algorithm for all-vs-all Tanimoto matrix calculation and nearest neighbor search. By taking advantage of multi-core GPU architecture and CUDA parallel programming technology, the algorithm is up to 39 times superior to the existing commercial software that runs on CPUs. Because of the utilization of intrinsic GPU instructions, this approach is nearly 10 times faster than existing GPU-accelerated sparse vector algorithm, when Unity fingerprints are used for Tanimoto calculation. The GPU program that implements this new method takes about 20 minutes to complete the calculation of Tanimoto coefficients between 32M PubChem compounds and 10K Active Probes compounds, i.e., 324G Tanimoto coefficients, on a 128-CUDA-core GPU. PMID:21692447
Analysis of the seismicity preceding large earthquakes
NASA Astrophysics Data System (ADS)
Stallone, Angela; Marzocchi, Warner
2017-04-01
The most common earthquake forecasting models assume that the magnitude of the next earthquake is independent from the past. This feature is probably one of the most severe limitations of the capability to forecast large earthquakes. In this work, we investigate empirically on this specific aspect, exploring whether variations in seismicity in the space-time-magnitude domain encode some information on the size of the future earthquakes. For this purpose, and to verify the stability of the findings, we consider seismic catalogs covering quite different space-time-magnitude windows, such as the Alto Tiberina Near Fault Observatory (TABOO) catalogue, the California and Japanese seismic catalog. Our method is inspired by the statistical methodology proposed by Baiesi & Paczuski (2004) and elaborated by Zaliapin et al. (2008) to distinguish between triggered and background earthquakes, based on a pairwise nearest-neighbor metric defined by properly rescaled temporal and spatial distances. We generalize the method to a metric based on the k-nearest-neighbors that allows us to consider the overall space-time-magnitude distribution of k-earthquakes, which are the strongly correlated ancestors of a target event. Finally, we analyze the statistical properties of the clusters composed by the target event and its k-nearest-neighbors. In essence, the main goal of this study is to verify if different classes of target event magnitudes are characterized by distinctive "k-foreshocks" distributions. The final step is to show how the findings of this work may (or not) improve the skill of existing earthquake forecasting models.
Analysis of the Seismicity Preceding Large Earthquakes
NASA Astrophysics Data System (ADS)
Stallone, A.; Marzocchi, W.
2016-12-01
The most common earthquake forecasting models assume that the magnitude of the next earthquake is independent from the past. This feature is probably one of the most severe limitations of the capability to forecast large earthquakes.In this work, we investigate empirically on this specific aspect, exploring whether spatial-temporal variations in seismicity encode some information on the magnitude of the future earthquakes. For this purpose, and to verify the universality of the findings, we consider seismic catalogs covering quite different space-time-magnitude windows, such as the Alto Tiberina Near Fault Observatory (TABOO) catalogue, and the California and Japanese seismic catalog. Our method is inspired by the statistical methodology proposed by Zaliapin (2013) to distinguish triggered and background earthquakes, using the nearest-neighbor clustering analysis in a two-dimension plan defined by rescaled time and space. In particular, we generalize the metric based on the nearest-neighbor to a metric based on the k-nearest-neighbors clustering analysis that allows us to consider the overall space-time-magnitude distribution of k-earthquakes (k-foreshocks) which anticipate one target event (the mainshock); then we analyze the statistical properties of the clusters identified in this rescaled space. In essence, the main goal of this study is to verify if different classes of mainshock magnitudes are characterized by distinctive k-foreshocks distribution. The final step is to show how the findings of this work may (or not) improve the skill of existing earthquake forecasting models.
Using K-Nearest Neighbor Classification to Diagnose Abnormal Lung Sounds
Chen, Chin-Hsing; Huang, Wen-Tzeng; Tan, Tan-Hsu; Chang, Cheng-Chun; Chang, Yuan-Jen
2015-01-01
A reported 30% of people worldwide have abnormal lung sounds, including crackles, rhonchi, and wheezes. To date, the traditional stethoscope remains the most popular tool used by physicians to diagnose such abnormal lung sounds, however, many problems arise with the use of a stethoscope, including the effects of environmental noise, the inability to record and store lung sounds for follow-up or tracking, and the physician’s subjective diagnostic experience. This study has developed a digital stethoscope to help physicians overcome these problems when diagnosing abnormal lung sounds. In this digital system, mel-frequency cepstral coefficients (MFCCs) were used to extract the features of lung sounds, and then the K-means algorithm was used for feature clustering, to reduce the amount of data for computation. Finally, the K-nearest neighbor method was used to classify the lung sounds. The proposed system can also be used for home care: if the percentage of abnormal lung sound frames is > 30% of the whole test signal, the system can automatically warn the user to visit a physician for diagnosis. We also used bend sensors together with an amplification circuit, Bluetooth, and a microcontroller to implement a respiration detector. The respiratory signal extracted by the bend sensors can be transmitted to the computer via Bluetooth to calculate the respiratory cycle, for real-time assessment. If an abnormal status is detected, the device will warn the user automatically. Experimental results indicated that the error in respiratory cycles between measured and actual values was only 6.8%, illustrating the potential of our detector for home care applications. PMID:26053756
Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers
NASA Astrophysics Data System (ADS)
Sanal Kumar, K. P.; Bhavani, R., Dr.
2017-08-01
Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.
Teodoro, Douglas; Lovis, Christian
2013-01-01
Background Antibiotic resistance is a major worldwide public health concern. In clinical settings, timely antibiotic resistance information is key for care providers as it allows appropriate targeted treatment or improved empirical treatment when the specific results of the patient are not yet available. Objective To improve antibiotic resistance trend analysis algorithms by building a novel, fully data-driven forecasting method from the combination of trend extraction and machine learning models for enhanced biosurveillance systems. Methods We investigate a robust model for extraction and forecasting of antibiotic resistance trends using a decade of microbiology data. Our method consists of breaking down the resistance time series into independent oscillatory components via the empirical mode decomposition technique. The resulting waveforms describing intrinsic resistance trends serve as the input for the forecasting algorithm. The algorithm applies the delay coordinate embedding theorem together with the k-nearest neighbor framework to project mappings from past events into the future dimension and estimate the resistance levels. Results The algorithms that decompose the resistance time series and filter out high frequency components showed statistically significant performance improvements in comparison with a benchmark random walk model. We present further qualitative use-cases of antibiotic resistance trend extraction, where empirical mode decomposition was applied to highlight the specificities of the resistance trends. Conclusion The decomposition of the raw signal was found not only to yield valuable insight into the resistance evolution, but also to produce novel models of resistance forecasters with boosted prediction performance, which could be utilized as a complementary method in the analysis of antibiotic resistance trends. PMID:23637796
Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas
2014-07-01
Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Nearest-neighbor Kitaev exchange blocked by charge order in electron-doped α -RuCl3
NASA Astrophysics Data System (ADS)
Koitzsch, A.; Habenicht, C.; Müller, E.; Knupfer, M.; Büchner, B.; Kretschmer, S.; Richter, M.; van den Brink, J.; Börrnert, F.; Nowak, D.; Isaeva, A.; Doert, Th.
2017-10-01
A quantum spin liquid might be realized in α -RuCl3 , a honeycomb-lattice magnetic material with substantial spin-orbit coupling. Moreover, α -RuCl3 is a Mott insulator, which implies the possibility that novel exotic phases occur upon doping. Here, we study the electronic structure of this material when intercalated with potassium by photoemission spectroscopy, electron energy loss spectroscopy, and density functional theory calculations. We obtain a stable stoichiometry at K0.5RuCl3 . This gives rise to a peculiar charge disproportionation into formally Ru2 + (4 d6 ) and Ru3 + (4 d5 ). Every Ru 4 d5 site with one hole in the t2 g shell is surrounded by nearest neighbors of 4 d6 character, where the t2 g level is full and magnetically inert. Thus, each type of Ru site forms a triangular lattice, and nearest-neighbor interactions of the original honeycomb are blocked.
Material identification based on electrostatic sensing technology
NASA Astrophysics Data System (ADS)
Liu, Kai; Chen, Xi; Li, Jingnan
2018-04-01
When the robot travels on the surface of different media, the uncertainty of the medium will seriously affect the autonomous action of the robot. In this paper, the distribution characteristics of multiple electrostatic charges on the surface of materials are detected, so as to improve the accuracy of the existing electrostatic signal material identification methods, which is of great significance to help the robot optimize the control algorithm. In this paper, based on the electrostatic signal material identification method proposed by predecessors, the multi-channel detection circuit is used to obtain the electrostatic charge distribution at different positions of the material surface, the weights are introduced into the eigenvalue matrix, and the weight distribution is optimized by the evolutionary algorithm, which makes the eigenvalue matrix more accurately reflect the surface charge distribution characteristics of the material. The matrix is used as the input of the k-Nearest Neighbor (kNN)classification algorithm to classify the dielectric materials. The experimental results show that the proposed method can significantly improve the recognition rate of the existing electrostatic signal material recognition methods.
NASA Astrophysics Data System (ADS)
Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.
2016-03-01
This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.
Applied algorithm in the liner inspection of solid rocket motors
NASA Astrophysics Data System (ADS)
Hoffmann, Luiz Felipe Simões; Bizarria, Francisco Carlos Parquet; Bizarria, José Walter Parquet
2018-03-01
In rocket motors, the bonding between the solid propellant and thermal insulation is accomplished by a thin adhesive layer, known as liner. The liner application method involves a complex sequence of tasks, which includes in its final stage, the surface integrity inspection. Nowadays in Brazil, an expert carries out a thorough visual inspection to detect defects on the liner surface that may compromise the propellant interface bonding. Therefore, this paper proposes an algorithm that uses the photometric stereo technique and the K-nearest neighbor (KNN) classifier to assist the expert in the surface inspection. Photometric stereo allows the surface information recovery of the test images, while the KNN method enables image pixels classification into two classes: non-defect and defect. Tests performed on a computer vision based prototype validate the algorithm. The positive results suggest that the algorithm is feasible and when implemented in a real scenario, will be able to help the expert in detecting defective areas on the liner surface.
Unsupervised Indoor Localization Based on Smartphone Sensors, iBeacon and Wi-Fi.
Chen, Jing; Zhang, Yi; Xue, Wei
2018-04-28
In this paper, we propose UILoc, an unsupervised indoor localization scheme that uses a combination of smartphone sensors, iBeacons and Wi-Fi fingerprints for reliable and accurate indoor localization with zero labor cost. Firstly, compared with the fingerprint-based method, the UILoc system can build a fingerprint database automatically without any site survey and the database will be applied in the fingerprint localization algorithm. Secondly, since the initial position is vital to the system, UILoc will provide the basic location estimation through the pedestrian dead reckoning (PDR) method. To provide accurate initial localization, this paper proposes an initial localization module, a weighted fusion algorithm combined with a k-nearest neighbors (KNN) algorithm and a least squares algorithm. In UILoc, we have also designed a reliable model to reduce the landmark correction error. Experimental results show that the UILoc can provide accurate positioning, the average localization error is about 1.1 m in the steady state, and the maximum error is 2.77 m.
Tilbury, Karissa; Hocker, James; Wen, Bruce L.; Sandbo, Nathan; Singh, Vikas; Campagnola, Paul J.
2014-01-01
Abstract. Patients with idiopathic fibrosis (IPF) have poor long-term survival as there are limited diagnostic/prognostic tools or successful therapies. Remodeling of the extracellular matrix (ECM) has been implicated in IPF progression; however, the structural consequences on the collagen architecture have not received considerable attention. Here, we demonstrate that second harmonic generation (SHG) and multiphoton fluorescence microscopy can quantitatively differentiate normal and IPF human tissues. For SHG analysis, we developed a classifier based on wavelet transforms, principle component analysis, and a K-nearest-neighbor algorithm to classify the specific alterations of the collagen structure observed in IPF tissues. The resulting ROC curves obtained by varying the numbers of principal components and nearest neighbors yielded accuracies of >95%. In contrast, simpler metrics based on SHG intensity and collagen coverage in the image provided little or no discrimination. We also characterized the change in the elastin/collagen balance by simultaneously measuring the elastin autofluorescence and SHG intensities and found that the IPF tissues were less elastic relative to collagen. This is consistent with known mechanical consequences of the disease. Understanding ECM remodeling in IPF via nonlinear optical microscopy may enhance our ability to differentiate patients with rapid and slow progression and, thus, provide better prognostic information. PMID:25134793
Quantum Correlation in the XY Spin Model with Anisotropic Three-Site Interaction
NASA Astrophysics Data System (ADS)
Wang, Yao; Chai, Bing-Bing; Guo, Jin-Liang
2018-05-01
We investigate pairwise entanglement and quantum discord (QD) in the XY spin model with anisotropic three-site interaction at zero and finite temperatures. For both the nearest-neighbor spins and the next nearest-neighbor spins, special attention is paid to the dependence of entanglement and QD on the anisotropic parameter δ induced by the next nearest-neighbor spins. We show that the behavior of QD differs in many ways from entanglement under the influences of the anisotropic three-site interaction at finite temperatures. More important, comparing the effects of δ on the entanglement and QD, we find the anisotropic three-site interaction plays an important role in the quantum correlations at zero and finite temperatures. It is found that δ can strengthen the quantum correlation for both the nearest-neighbor spins and the next nearest-neighbor spins, especially for the nearest-neighbor spins at low temperature.
Performing a scatterv operation on a hierarchical tree network optimized for collective operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D
Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; andmore » sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.« less
NASA Astrophysics Data System (ADS)
Zhang, Xiaoli; Zhang, Guoren; Jia, Ting; Zeng, Zhi; Lin, H. Q.
2016-05-01
We study the abnormal ferromagnetism in α-K2AgF4, which is very similar to high-TC parent material La2CuO4 in structure. We find out that the electron correlation is very important in determining the insulating property of α-K2AgF4. The Ag(II) 4d9 in the octahedron crystal field has the t2 g 6 eg 3 electron occupation with eg x2-y2 orbital fully occupied and 3z2-r2 orbital partially occupied. The two eg orbitals are very extended indicating both of them are active in superexchange. Using the Hubbard model combined with Nth-order muffin-tin orbital (NMTO) downfolding technique, it is concluded that the exchange interaction between eg 3z2-r2 and x2-y2 from the first nearest neighbor Ag ions leads to the anomalous ferromagnetism in α-K2AgF4.
Ground State of Quasi-One Dimensional Competing Spin Chain Cs2Cu2Mo3O12 at zero and Finite Fields
NASA Astrophysics Data System (ADS)
Matsui, Kazuki; Goto, Takayuki; Angel, Julia; Watanabe, Isao; Sasaki, Takahiko; Hase, Masashi
The ground state of competing-spin-chain Cs2Cu2Mo3O12 with the ferromagnetic exchange interaction J1 = -93 K on nearest-neighboring spins and the antiferromagnetic one J2 = +33 K on next-nearest-neighboring spins was investigated by ZF/LF-μSR and 133Cs-NMR in the 3He temperature range. The zero-field μSR relaxation rate λ shows a significant increase below 1.85 K, suggesting the existence of magnetic order, which is consistent with the recent report on the specific heat. However, LF decoupling data at the lowest temperature 0.3 K indicate that the spins fluctuate dynamically, suggesting that the system is in a quasi-static ordered state under zero field. This idea is further supported by the fact that the broadening in NMR spectra below TN is weakened at low field below 2 T.
NASA Astrophysics Data System (ADS)
Blel, Sonia; Hamouda, Ajmi BH.; Mahjoub, B.; Einstein, T. L.
2017-02-01
In this paper we explore the meandering instability of vicinal steps with a kinetic Monte Carlo simulations (kMC) model including the attractive next-nearest-neighbor (NNN) interactions. kMC simulations show that increase of the NNN interaction strength leads to considerable reduction of the meandering wavelength and to weaker dependence of the wavelength on the deposition rate F. The dependences of the meandering wavelength on the temperature and the deposition rate obtained with simulations are in good quantitative agreement with the experimental result on the meandering instability of Cu(0 2 24) [T. Maroutian et al., Phys. Rev. B 64, 165401 (2001), 10.1103/PhysRevB.64.165401]. The effective step stiffness is found to depend not only on the strength of NNN interactions and the Ehrlich-Schwoebel barrier, but also on F. We argue that attractive NNN interactions intensify the incorporation of adatoms at step edges and enhance step roughening. Competition between NNN and nearest-neighbor interactions results in an alternative form of meandering instability which we call "roughening-limited" growth, rather than attachment-detachment-limited growth that governs the Bales-Zangwill instability. The computed effective wavelength and the effective stiffness behave as λeff˜F-q and β˜eff˜F-p , respectively, with q ≈p /2 .
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.
Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E
2016-01-01
Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets
Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.
2018-01-01
Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fournier, Sean Donovan; Beall, Patrick S; Miller, Mark L
2014-08-01
Through the SNL New Mexico Small Business Assistance (NMSBA) program, several Sandia engineers worked with the Environmental Restoration Group (ERG) Inc. to verify and validate a novel algorithm used to determine the scanning Critical Level (L c ) and Minimum Detectable Concentration (MDC) (or Minimum Detectable Areal Activity) for the 102F scanning system. Through the use of Monte Carlo statistical simulations the algorithm mathematically demonstrates accuracy in determining the L c and MDC when a nearest-neighbor averaging (NNA) technique was used. To empirically validate this approach, SNL prepared several spiked sources and ran a test with the ERG 102F instrumentmore » on a bare concrete floor known to have no radiological contamination other than background naturally occurring radioactive material (NORM). The tests conclude that the NNA technique increases the sensitivity (decreases the L c and MDC) for high-density data maps that are obtained by scanning radiological survey instruments.« less
Realization of the axial next-nearest-neighbor Ising model in U 3 Al 2 Ge 3
Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.; ...
2017-11-09
Inmore » this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U 3 Al 2 Ge 3 . Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T c = 48 K. Within the sinusoidally modulated magnetic phase (T c < T < T N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U 3 Al 2 Ge 3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.« less
Realization of the axial next-nearest-neighbor Ising model in U 3 Al 2 Ge 3
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.
Inmore » this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U 3 Al 2 Ge 3 . Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T c = 48 K. Within the sinusoidally modulated magnetic phase (T c < T < T N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U 3 Al 2 Ge 3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.« less
Categorizing document by fuzzy C-Means and K-nearest neighbors approach
NASA Astrophysics Data System (ADS)
Priandini, Novita; Zaman, Badrus; Purwanti, Endah
2017-08-01
Increasing of technology had made categorizing documents become important. It caused by increasing of number of documents itself. Managing some documents by categorizing is one of Information Retrieval application, because it involve text mining on its process. Whereas, categorization technique could be done both Fuzzy C-Means (FCM) and K-Nearest Neighbors (KNN) method. This experiment would consolidate both methods. The aim of the experiment is increasing performance of document categorize. First, FCM is in order to clustering training documents. Second, KNN is in order to categorize testing document until the output of categorization is shown. Result of the experiment is 14 testing documents retrieve relevantly to its category. Meanwhile 6 of 20 testing documents retrieve irrelevant to its category. Result of system evaluation shows that both precision and recall are 0,7.
NASA Astrophysics Data System (ADS)
Gabrieli, Andrea; Sant, Marco; Izadi, Saeed; Shabane, Parviz Seifpanahi; Onufriev, Alexey V.; Suffritti, Giuseppe B.
2018-02-01
Classical molecular dynamics simulations were performed to study the high-temperature (above 300 K) dynamic behavior of bulk water, specifically the behavior of the diffusion coefficient, hydrogen bond, and nearest-neighbor lifetimes. Two water potentials were compared: the recently proposed "globally optimal" point charge (OPC) model and the well-known TIP4P-Ew model. By considering the Arrhenius plots of the computed inverse diffusion coefficient and rotational relaxation constants, a crossover from Vogel-Fulcher-Tammann behavior to a linear trend with increasing temperature was detected at T* ≈ 309 and T* ≈ 285 K for the OPC and TIP4P-Ew models, respectively. Experimentally, the crossover point was previously observed at T* ± 315-5 K. We also verified that for the coefficient of thermal expansion α P ( T, P), the isobaric α P ( T) curves cross at about the same T* as in the experiment. The lifetimes of water hydrogen bonds and of the nearest neighbors were evaluated and were found to cross near T*, where the lifetimes are about 1 ps. For T < T*, hydrogen bonds persist longer than nearest neighbors, suggesting that the hydrogen bonding network dominates the water structure at T < T*, whereas for T > T*, water behaves more like a simple liquid. The fact that T* falls within the biologically relevant temperature range is a strong motivation for further analysis of the phenomenon and its possible consequences for biomolecular systems.
NASA Astrophysics Data System (ADS)
Yunker, Peter J.; Zhang, Zexin; Gratale, Matthew; Chen, Ke; Yodh, A. G.
2013-03-01
We study connections between vibrational spectra and average nearest neighbor number in disordered clusters of colloidal particles with attractive interactions. Measurements of displacement covariances between particles in each cluster permit calculation of the stiffness matrix, which contains effective spring constants linking pairs of particles. From the cluster stiffness matrix, we derive vibrational properties of corresponding "shadow" glassy clusters, with the same geometric configuration and interactions as the "source" cluster but without damping. Here, we investigate the stiffness matrix to elucidate the origin of the correlations between the median frequency of cluster vibrational modes and average number of nearest neighbors in the cluster. We find that the mean confining stiffness of particles in a cluster, i.e., the ensemble-averaged sum of nearest neighbor spring constants, correlates strongly with average nearest neighbor number, and even more strongly with median frequency. Further, we find that the average oscillation frequency of an individual particle is set by the total stiffness of its nearest neighbor bonds; this average frequency increases as the square root of the nearest neighbor bond stiffness, in a manner similar to the simple harmonic oscillator.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang Yanxia; Ma He; Peng Nanbo
We apply one of the lazy learning methods, the k-nearest neighbor (kNN) algorithm, to estimate the photometric redshifts of quasars based on various data sets from the Sloan Digital Sky Survey (SDSS), the UKIRT Infrared Deep Sky Survey (UKIDSS), and the Wide-field Infrared Survey Explorer (WISE; the SDSS sample, the SDSS-UKIDSS sample, the SDSS-WISE sample, and the SDSS-UKIDSS-WISE sample). The influence of the k value and different input patterns on the performance of kNN is discussed. kNN performs best when k is different with a special input pattern for a special data set. The best result belongs to the SDSS-UKIDSS-WISEmore » sample. The experimental results generally show that the more information from more bands, the better performance of photometric redshift estimation with kNN. The results also demonstrate that kNN using multiband data can effectively solve the catastrophic failure of photometric redshift estimation, which is met by many machine learning methods. Compared with the performance of various other methods of estimating the photometric redshifts of quasars, kNN based on KD-Tree shows superiority, exhibiting the best accuracy.« less
Attention Recognition in EEG-Based Affective Learning Research Using CFS+KNN Algorithm.
Hu, Bin; Li, Xiaowei; Sun, Shuting; Ratcliffe, Martyn
2018-01-01
The research detailed in this paper focuses on the processing of Electroencephalography (EEG) data to identify attention during the learning process. The identification of affect using our procedures is integrated into a simulated distance learning system that provides feedback to the user with respect to attention and concentration. The authors propose a classification procedure that combines correlation-based feature selection (CFS) and a k-nearest-neighbor (KNN) data mining algorithm. To evaluate the CFS+KNN algorithm, it was test against CFS+C4.5 algorithm and other classification algorithms. The classification performance was measured 10 times with different 3-fold cross validation data. The data was derived from 10 subjects while they were attempting to learn material in a simulated distance learning environment. A self-assessment model of self-report was used with a single valence to evaluate attention on 3 levels (high, neutral, low). It was found that CFS+KNN had a much better performance, giving the highest correct classification rate (CCR) of % for the valence dimension divided into three classes.
A class of parallel algorithms for computation of the manipulator inertia matrix
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
A semi-supervised classification algorithm using the TAD-derived background as training data
NASA Astrophysics Data System (ADS)
Fan, Lei; Ambeau, Brittany; Messinger, David W.
2013-05-01
In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
Lee, Jinseok; Chon, Ki H
2010-09-01
We present particle filtering (PF) algorithms for an accurate respiratory rate extraction from pulse oximeter recordings over a broad range: 12-90 breaths/min. These methods are based on an autoregressive (AR) model, where the aim is to find the pole angle with the highest magnitude as it corresponds to the respiratory rate. However, when SNR is low, the pole angle with the highest magnitude may not always lead to accurate estimation of the respiratory rate. To circumvent this limitation, we propose a probabilistic approach, using a sequential Monte Carlo method, named PF, which is combined with the optimal parameter search (OPS) criterion for an accurate AR model-based respiratory rate extraction. The PF technique has been widely adopted in many tracking applications, especially for nonlinear and/or non-Gaussian problems. We examine the performances of five different likelihood functions of the PF algorithm: the strongest neighbor, nearest neighbor (NN), weighted nearest neighbor (WNN), probability data association (PDA), and weighted probability data association (WPDA). The performance of these five combined OPS-PF algorithms was measured against a solely OPS-based AR algorithm for respiratory rate extraction from pulse oximeter recordings. The pulse oximeter data were collected from 33 healthy subjects with breathing rates ranging from 12 to 90 breaths/ min. It was found that significant improvement in accuracy can be achieved by employing particle filters, and that the combined OPS-PF employing either the NN or WNN likelihood function achieved the best results for all respiratory rates considered in this paper. The main advantage of the combined OPS-PF with either the NN or WNN likelihood function is that for the first time, respiratory rates as high as 90 breaths/min can be accurately extracted from pulse oximeter recordings.
An optimized video system for augmented reality in endodontics: a feasibility study.
Bruellmann, D D; Tjaden, H; Schwanecke, U; Barth, P
2013-03-01
We propose an augmented reality system for the reliable detection of root canals in video sequences based on a k-nearest neighbor color classification and introduce a simple geometric criterion for teeth. The new software was implemented using C++, Qt, and the image processing library OpenCV. Teeth are detected in video images to restrict the segmentation of the root canal orifices by using a k-nearest neighbor algorithm. The location of the root canal orifices were determined using Euclidean distance-based image segmentation. A set of 126 human teeth with known and verified locations of the root canal orifices was used for evaluation. The software detects root canals orifices for automatic classification of the teeth in video images and stores location and size of the found structures. Overall 287 of 305 root canals were correctly detected. The overall sensitivity was about 94 %. Classification accuracy for molars ranged from 65.0 to 81.2 % and from 85.7 to 96.7 % for premolars. The realized software shows that observations made in anatomical studies can be exploited to automate real-time detection of root canal orifices and tooth classification with a software system. Automatic storage of location, size, and orientation of the found structures with this software can be used for future anatomical studies. Thus, statistical tables with canal locations will be derived, which can improve anatomical knowledge of the teeth to alleviate root canal detection in the future. For this purpose the software is freely available at: http://www.dental-imaging.zahnmedizin.uni-mainz.de/.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.
NASA Astrophysics Data System (ADS)
Chen, Dan; Guo, Lin-yuan; Wang, Chen-hao; Ke, Xi-zheng
2017-07-01
Equalization can compensate channel distortion caused by channel multipath effects, and effectively improve convergent of modulation constellation diagram in optical wireless system. In this paper, the subspace blind equalization algorithm is used to preprocess M-ary phase shift keying (MPSK) subcarrier modulation signal in receiver. Mountain clustering is adopted to get the clustering centers of MPSK modulation constellation diagram, and the modulation order is automatically identified through the k-nearest neighbor (KNN) classifier. The experiment has been done under four different weather conditions. Experimental results show that the convergent of constellation diagram is improved effectively after using the subspace blind equalization algorithm, which means that the accuracy of modulation recognition is increased. The correct recognition rate of 16PSK can be up to 85% in any kind of weather condition which is mentioned in paper. Meanwhile, the correct recognition rate is the highest in cloudy and the lowest in heavy rain condition.
Machine learning methods in chemoinformatics
Mitchell, John B O
2014-01-01
Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
NASA Astrophysics Data System (ADS)
Li, Peng; Su, Haibin; Dong, Hui-Ning; Shen, Shun-Qing
2009-08-01
We study a triangular frustrated antiferromagnetic Heisenberg model with nearest-neighbor interactions J1 and third-nearest-neighbor interactions J3 by means of Schwinger-boson mean-field theory. By setting an antiferromagnetic J3 and varying J1 from positive to negative values, we disclose the low-temperature features of its interesting incommensurate phase. The gapless dispersion of quasiparticles leads to the intrinsic T2 law of specific heat. The magnetic susceptibility is linear in temperature. The local magnetization is significantly reduced by quantum fluctuations. We address possible relevance of these results to the low-temperature properties of NiGa2S4. From a careful analysis of the incommensurate spin wavevector, the interaction parameters are estimated as J1≈-3.8755 K and J3≈14.0628 K, in order to account for the experimental data.
Rb-NMR study of the quasi-one-dimensional competing spin-chain compound R b2C u2M o3O12
NASA Astrophysics Data System (ADS)
Matsui, Kazuki; Yagi, Ayato; Hoshino, Yukihiro; Atarashi, Sochiro; Hase, Masashi; Sasaki, Takahiko; Goto, Takayuki
2017-12-01
A Rb-NMR study has been performed on the quasi-one-dimensional competing spin chain R b2C u2M o3O12 with ferromagnetic and antiferromagnetic exchange interactions on nearest-neighboring and next-nearest neighboring spins, respectively. The system changes from a gapped ground state at zero field to a gapless state at HC≃2 T , where the existence of magnetic order below 1 K was demonstrated by a broadening of the NMR spectrum, associated with a critical divergence of 1 /T1 . In the higher-temperature region, T1-1 showed a power-law-type temperature dependence, from which the field dependence of the Luttinger parameter K was obtained and compared with theoretical calculations based on the spin nematic Tomonaga-Luttinger liquid (TLL) state.
NASA Astrophysics Data System (ADS)
Karľová, Katarína; Strečka, Jozef; Lyra, Marcelo L.
2018-03-01
The spin-1/2 Ising-Heisenberg pentagonal chain is investigated with use of the star-triangle transformation, which establishes a rigorous mapping equivalence with the effective spin-1/2 Ising zigzag ladder. The investigated model has a rich ground-state phase diagram including two spectacular quantum antiferromagnetic ground states with a fourfold broken symmetry. It is demonstrated that these long-period quantum ground states arise due to a competition between the effective next-nearest-neighbor and nearest-neighbor interactions of the corresponding spin-1/2 Ising zigzag ladder. The concurrence is used to quantify the bipartite entanglement between the nearest-neighbor Heisenberg spin pairs, which are quantum-mechanically entangled in two quantum ground states with or without spontaneously broken symmetry. The pair correlation functions between the nearest-neighbor Heisenberg spins as well as the next-nearest-neighbor and nearest-neighbor Ising spins were investigated with the aim to bring insight into how a relevant short-range order manifests itself at low enough temperatures. It is shown that the specific heat displays temperature dependencies with either one or two separate round maxima.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
NASA Astrophysics Data System (ADS)
Surmach, M. A.; Chen, B. J.; Deng, Z.; Jin, C. Q.; Glasbrenner, J. K.; Mazin, I. I.; Ivanov, A.; Inosov, D. S.
2018-03-01
Dilute magnetic semiconductors (DMS) are nonmagnetic semiconductors doped with magnetic transition metals. The recently discovered DMS material (Ba1 -xKx) (Zn1-yMny) 2As2 offers a unique and versatile control of the Curie temperature TC by decoupling the spin (Mn2 +, S =5 /2 ) and charge (K+) doping in different crystallographic layers. In an attempt to describe from first-principles calculations the role of hole doping in stabilizing ferromagnetic order, it was recently suggested that the antiferromagnetic exchange coupling J between the nearest-neighbor Mn ions would experience a nearly twofold suppression upon doping 20% of holes by potassium substitution. At the same time, further-neighbor interactions become increasingly ferromagnetic upon doping, leading to a rapid increase of TC. Using inelastic neutron scattering, we have observed a localized magnetic excitation at about 13 meV associated with the destruction of the nearest-neighbor Mn-Mn singlet ground state. Hole doping results in a notable broadening of this peak, evidencing significant particle-hole damping, but with only a minor change in the peak position. We argue that this unexpected result can be explained by a combined effect of superexchange and double-exchange interactions.
Iris Recognition Using Feature Extraction of Box Counting Fractal Dimension
NASA Astrophysics Data System (ADS)
Khotimah, C.; Juniati, D.
2018-01-01
Biometrics is a science that is now growing rapidly. Iris recognition is a biometric modality which captures a photo of the eye pattern. The markings of the iris are distinctive that it has been proposed to use as a means of identification, instead of fingerprints. Iris recognition was chosen for identification in this research because every human has a special feature that each individual is different and the iris is protected by the cornea so that it will have a fixed shape. This iris recognition consists of three step: pre-processing of data, feature extraction, and feature matching. Hough transformation is used in the process of pre-processing to locate the iris area and Daugman’s rubber sheet model to normalize the iris data set into rectangular blocks. To find the characteristics of the iris, it was used box counting method to get the fractal dimension value of the iris. Tests carried out by used k-fold cross method with k = 5. In each test used 10 different grade K of K-Nearest Neighbor (KNN). The result of iris recognition was obtained with the best accuracy was 92,63 % for K = 3 value on K-Nearest Neighbor (KNN) method.
NASA Astrophysics Data System (ADS)
Qur’ania, A.; Sarinah, I.
2018-03-01
People often wrong in knowing the type of jasmine by just looking at the white color of the jasmine, while not all white flowers including jasmine and not all jasmine flowers have white. There is a jasmine that is yellow and there is a jasmine that is white and purple.The aim of this research is to identify Jasmine flower (Jasminum sp.) based on the shape of the flower image-based using Sobel edge detection and k-Nearest Neighbor. Edge detection is used to detect the type of flower from the flower shape. Edge detection aims to improve the appearance of the border of a digital image. While k-Nearest Neighbor method is used to classify the classification of test objects into classes that have neighbouring properties closest to the object of training. The data used in this study are three types of jasmine namely jasmine white (Jasminum sambac), jasmine gambir (Jasminum pubescens), and jasmine japan (Pseuderanthemum reticulatum). Testing of jasmine flower image resized 50 × 50 pixels, 100 × 100 pixels, 150 × 150 pixels yields an accuracy of 84%. Tests on distance values of the k-NN method with spacing 5, 10 and 15 resulted in different accuracy rates for 5 and 10 closest distances yielding the same accuracy rate of 84%, for the 15 shortest distance resulted in a small accuracy of 65.2%.
NASA Astrophysics Data System (ADS)
Kohno, Masanori
2018-05-01
The single-particle spectral properties of the two-dimensional t-J model with next-nearest-neighbor hopping are investigated near the Mott transition by using cluster perturbation theory. The spectral features are interpreted by considering the effects of the next-nearest-neighbor hopping on the shift of the spectral-weight distribution of the two-dimensional t-J model. Various anomalous features observed in hole-doped and electron-doped high-temperature cuprate superconductors are collectively explained in the two-dimensional t-J model with next-nearest-neighbor hopping near the Mott transition.
Thermography based diagnosis of ruptured anterior cruciate ligament (ACL) in canines
NASA Astrophysics Data System (ADS)
Lama, Norsang; Umbaugh, Scott E.; Mishra, Deependra; Dahal, Rohini; Marino, Dominic J.; Sackman, Joseph
2016-09-01
Anterior cruciate ligament (ACL) rupture in canines is a common orthopedic injury in veterinary medicine. Veterinarians use both imaging and non-imaging methods to diagnose the disease. Common imaging methods such as radiography, computed tomography (CT scan) and magnetic resonance imaging (MRI) have some disadvantages: expensive setup, high dose of radiation, and time-consuming. In this paper, we present an alternative diagnostic method based on feature extraction and pattern classification (FEPC) to diagnose abnormal patterns in ACL thermograms. The proposed method was experimented with a total of 30 thermograms for each camera view (anterior, lateral and posterior) including 14 disease and 16 non-disease cases provided from Long Island Veterinary Specialists. The normal and abnormal patterns in thermograms are analyzed in two steps: feature extraction and pattern classification. Texture features based on gray level co-occurrence matrices (GLCM), histogram features and spectral features are extracted from the color normalized thermograms and the computed feature vectors are applied to Nearest Neighbor (NN) classifier, K-Nearest Neighbor (KNN) classifier and Support Vector Machine (SVM) classifier with leave-one-out validation method. The algorithm gives the best classification success rate of 86.67% with a sensitivity of 85.71% and a specificity of 87.5% in ACL rupture detection using NN classifier for the lateral view and Norm-RGB-Lum color normalization method. Our results show that the proposed method has the potential to detect ACL rupture in canines.
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osei-Kuffuor, Daniel; Fattebert, Jean-Luc
2014-01-01
Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N 3) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix,more » based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.« less
Inspection of wear particles in oils by using a fuzzy classifier
NASA Astrophysics Data System (ADS)
Hamalainen, Jari J.; Enwald, Petri
1994-11-01
The reliability of stand-alone machines and larger production units can be improved by automated condition monitoring. Analysis of wear particles in lubricating or hydraulic oils helps diagnosing the wear states of machine parts. This paper presents a computer vision system for automated classification of wear particles. Digitized images from experiments with a bearing test bench, a hydraulic system with an industrial company, and oil samples from different industrial sources were used for algorithm development and testing. The wear particles were divided into four classes indicating different wear mechanisms: cutting wear, fatigue wear, adhesive wear, and abrasive wear. The results showed that the fuzzy K-nearest neighbor classifier utilized gave the same distribution of wear particles as the classification by a human expert.
Atehortúa, Sara C; Palacio-Mejía, Lina S
2014-01-01
Assessing the impact of subsidized healthcare insurance on access to cervical cytology in Medellin, Colombia. Propensity score matching (PSM) was used with 2008 Life Quality Survey in Colombia figures to obtain a control group comparable to a treatment group. This involved using stratification estimates, the k-nearest-neighbor algorithm and kernel density for calculating impact size Access to cytology for 19 to 49 year-old women having subsidized healthcare insurance were 2.2 % to 2.9 % lower compared to women who did not have any healthcare insurance. Estimates were not statistically significant for women over 50 years-old. Women lacking healthcare insurance having increased access to cytology could be explained by charities or social programs aiding the population lacking healthcare insurance.
Social aggregation in pea aphids: experiment and random walk modeling.
Nilsen, Christa; Paige, John; Warner, Olivia; Mayhew, Benjamin; Sutley, Ryan; Lam, Matthew; Bernoff, Andrew J; Topaz, Chad M
2013-01-01
From bird flocks to fish schools and ungulate herds to insect swarms, social biological aggregations are found across the natural world. An ongoing challenge in the mathematical modeling of aggregations is to strengthen the connection between models and biological data by quantifying the rules that individuals follow. We model aggregation of the pea aphid, Acyrthosiphon pisum. Specifically, we conduct experiments to track the motion of aphids walking in a featureless circular arena in order to deduce individual-level rules. We observe that each aphid transitions stochastically between a moving and a stationary state. Moving aphids follow a correlated random walk. The probabilities of motion state transitions, as well as the random walk parameters, depend strongly on distance to an aphid's nearest neighbor. For large nearest neighbor distances, when an aphid is essentially isolated, its motion is ballistic with aphids moving faster, turning less, and being less likely to stop. In contrast, for short nearest neighbor distances, aphids move more slowly, turn more, and are more likely to become stationary; this behavior constitutes an aggregation mechanism. From the experimental data, we estimate the state transition probabilities and correlated random walk parameters as a function of nearest neighbor distance. With the individual-level model established, we assess whether it reproduces the macroscopic patterns of movement at the group level. To do so, we consider three distributions, namely distance to nearest neighbor, angle to nearest neighbor, and percentage of population moving at any given time. For each of these three distributions, we compare our experimental data to the output of numerical simulations of our nearest neighbor model, and of a control model in which aphids do not interact socially. Our stochastic, social nearest neighbor model reproduces salient features of the experimental data that are not captured by the control.
Quantum realization of the nearest-neighbor interpolation method for FRQI and NEQR
NASA Astrophysics Data System (ADS)
Sang, Jianzhi; Wang, Shen; Niu, Xiamu
2016-01-01
This paper is concerned with the feasibility of the classical nearest-neighbor interpolation based on flexible representation of quantum images (FRQI) and novel enhanced quantum representation (NEQR). Firstly, the feasibility of the classical image nearest-neighbor interpolation for quantum images of FRQI and NEQR is proven. Then, by defining the halving operation and by making use of quantum rotation gates, the concrete quantum circuit of the nearest-neighbor interpolation for FRQI is designed for the first time. Furthermore, quantum circuit of the nearest-neighbor interpolation for NEQR is given. The merit of the proposed NEQR circuit lies in their low complexity, which is achieved by utilizing the halving operation and the quantum oracle operator. Finally, in order to further improve the performance of the former circuits, new interpolation circuits for FRQI and NEQR are presented by using Control-NOT gates instead of a halving operation. Simulation results show the effectiveness of the proposed circuits.
NASA Astrophysics Data System (ADS)
Stevens, Jeffrey
The past decade has seen the emergence of many hyperspectral image (HSI) analysis algorithms based on graph theory and derived manifold-coordinates. Yet, despite the growing number of algorithms, there has been limited study of the graphs constructed from spectral data themselves. Which graphs are appropriate for various HSI analyses--and why? This research aims to begin addressing these questions as the performance of graph-based techniques is inextricably tied to the graphical model constructed from the spectral data. We begin with a literature review providing a survey of spectral graph construction techniques currently used by the hyperspectral community, starting with simple constructs demonstrating basic concepts and then incrementally adding components to derive more complex approaches. Throughout this development, we discuss algorithm advantages and disadvantages for different types of hyperspectral analysis. A focus is provided on techniques influenced by spectral density through which the concept of community structure arises. Through the use of simulated and real HSI data, we demonstrate density-based edge allocation produces more uniform nearest neighbor lists than non-density based techniques through increasing the number of intracluster edges, facilitating higher k-nearest neighbor (k-NN) classification performance. Imposing the common mutuality constraint to symmetrify adjacency matrices is demonstrated to be beneficial in most circumstances, especially in rural (less cluttered) scenes. Many complex adaptive edge-reweighting techniques are shown to slightly degrade nearest-neighbor list characteristics. Analysis suggests this condition is possibly attributable to the validity of characterizing spectral density by a single variable representing data scale for each pixel. Additionally, it is shown that imposing mutuality hurts the performance of adaptive edge-allocation techniques or any technique that aims to assign a low number of edges (<10) to any pixel. A simple k bias addresses this problem. Many of the adaptive edge-reweighting techniques are based on the concept of codensity, so we explore codensity properties as they relate to density-based edge reweighting. We find that codensity may not be the best estimator of local scale due to variations in cluster density, so we introduce and compare two inherently density-weighted graph construction techniques from the data mining literature: shared nearest neighbors (SNN) and mutual proximity (MP). MP and SNN are not reliant upon a codensity measure, hence are not susceptible to its shortcomings. Neither has been used for hyperspectral analyses, so this presents the first study of these techniques on HSI data. We demonstrate MP and SNN can offer better performance, but in general none of the reweighting techniques improve the quality of these spectral graphs in our neighborhood structure tests. As such, these complex adaptive edge-reweighting techniques may need to be modified to increase their effectiveness. During this investigation, we probe deeper into properties of high-dimensional data and introduce the concept of concentration of measure (CoM)--the degradation in the efficacy of many common distance measures with increasing dimensionality--as it relates to spectral graph construction. CoM exists in pairwise distances between HSI pixels, but not to the degree experienced in random data of the same extrinsic dimension; a characteristic we demonstrate is due to the rich correlation and cluster structure present in HSI data. CoM can lead to hubness--a condition wherein some nodes have short distances (high similarities) to an exceptionally large number of nodes. We study hub presence in 49 HSI datasets of varying resolutions, altitudes, and spectral bands to demonstrate hubness effects are negligible in a k-NN classification example (generalized counting scenarios), but we note its impact on methods that use edge weights to derive manifold coordinates or splitting clusters based on spectral graph theory requires more investigation. Many of these new graph-related quantities can be exploited to demonstrate new techniques for HSI classification and anomaly detection. We present an initial exploration into this relatively new and exciting field based on an enhanced Schroedinger Eigenmap classification example and compare results to the current state-of-the-art approach. We produce equivalent results, but demonstrate different types of misclassifications, opening the door to combine the best of both approaches to achieve truly superior performance. A separate less mature hubness-assisted anomaly detector (HAAD) is also presented.
An ensemble of dissimilarity based classifiers for Mackerel gender determination
NASA Astrophysics Data System (ADS)
Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.
2014-03-01
Mackerel is an infravalored fish captured by European fishing vessels. A manner to add value to this specie can be achieved by trying to classify it attending to its sex. Colour measurements were performed on Mackerel females and males (fresh and defrozen) extracted gonads to obtain differences between sexes. Several linear and non linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can been applied to this problem. However, theyare usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kind of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
Online Feature Transformation Learning for Cross-Domain Object Category Recognition.
Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold
2017-06-09
In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale application. The classifier is trained with k nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examined the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition application.
Chaotic particle swarm optimization with mutation for classification.
Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza
2015-01-01
In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allows us to overcome the problem of early convergence into a local minima associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and it tunes the best possible solution. Furthermore, to remove the irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is checked out with three sets of data classifications namely, Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms.
Earthquake Declustering via a Nearest-Neighbor Approach in Space-Time-Magnitude Domain
NASA Astrophysics Data System (ADS)
Zaliapin, I. V.; Ben-Zion, Y.
2016-12-01
We propose a new method for earthquake declustering based on nearest-neighbor analysis of earthquakes in space-time-magnitude domain. The nearest-neighbor approach was recently applied to a variety of seismological problems that validate the general utility of the technique and reveal the existence of several different robust types of earthquake clusters. Notably, it was demonstrated that clustering associated with the largest earthquakes is statistically different from that of small-to-medium events. In particular, the characteristic bimodality of the nearest-neighbor distances that helps separating clustered and background events is often violated after the largest earthquakes in their vicinity, which is dominated by triggered events. This prevents using a simple threshold between the two modes of the nearest-neighbor distance distribution for declustering. The current study resolves this problem hence extending the nearest-neighbor approach to the problem of earthquake declustering. The proposed technique is applied to seismicity of different areas in California (San Jacinto, Coso, Salton Sea, Parkfield, Ventura, Mojave, etc.), as well as to the global seismicity, to demonstrate its stability and efficiency in treating various clustering types. The results are compared with those of alternative declustering methods.
A Novel Color Image Encryption Algorithm Based on Quantum Chaos Sequence
NASA Astrophysics Data System (ADS)
Liu, Hui; Jin, Cong
2017-03-01
In this paper, a novel algorithm of image encryption based on quantum chaotic is proposed. The keystreams are generated by the two-dimensional logistic map as initial conditions and parameters. And then general Arnold scrambling algorithm with keys is exploited to permute the pixels of color components. In diffusion process, a novel encryption algorithm, folding algorithm, is proposed to modify the value of diffused pixels. In order to get the high randomness and complexity, the two-dimensional logistic map and quantum chaotic map are coupled with nearest-neighboring coupled-map lattices. Theoretical analyses and computer simulations confirm that the proposed algorithm has high level of security.
A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.
Baur, Brittany; Bozdag, Serdar
2016-01-01
DNA methylation is an important epigenetic event that effects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
Hash Bit Selection for Nearest Neighbor Search.
Xianglong Liu; Junfeng He; Shih-Fu Chang
2017-11-01
To overcome the barrier of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate the nearest neighbor search. Despite the recent advances, critical design issues remain open in how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these by posing an optimal hash bit selection problem, in which an optimal subset of hash bits are selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt the bit reliability and their complementarity as the selection criteria that can be carefully tailored for hashing performance in different tasks. Then, the bit selection solution is discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationship among hash bits to approximate the high-order independence property, and formulate it as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hash techniques, where our bit selection framework can achieve superior performance over both the naive selection methods and the state-of-the-art hashing algorithms, with significant accuracy gains ranging from 10% to 50%, relatively.
NASA Astrophysics Data System (ADS)
Yin, Lucy; Andrews, Jennifer; Heaton, Thomas
2018-05-01
Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
NASA Astrophysics Data System (ADS)
Van Kha, Tran; Van Vuong, Hoang; Thanh, Do Duc; Hung, Duong Quoc; Anh, Le Duc
2018-05-01
The maximum horizontal gradient method was first proposed by Blakely and Simpson (1986) for determining the boundaries between geological bodies with different densities. The method involves the comparison of a center point with its eight nearest neighbors in four directions within each 3 × 3 calculation grid. The horizontal location and magnitude of the maximum values are found by interpolating a second-order polynomial through the trio of points provided that the magnitude of the middle point is greater than its two nearest neighbors in one direction. In theoretical models of multiple sources, however, the above condition does not allow the maximum horizontal locations to be fully located, and it could be difficult to correlate the edges of complicated sources. In this paper, the authors propose an additional condition to identify more maximum horizontal locations within the calculation grid. This additional condition will improve the method algorithm for interpreting the boundaries of magnetic and/or gravity sources. The improved algorithm was tested on gravity models and applied to gravity data for the Phu Khanh basin on the continental shelf of the East Vietnam Sea. The results show that the additional locations of the maximum horizontal gradient could be helpful for connecting the edges of complicated source bodies.
A novel feature ranking algorithm for biometric recognition with PPG signals.
Reşit Kavsaoğlu, A; Polat, Kemal; Recep Bozkurt, M
2014-06-01
This study is intended for describing the application of the Photoplethysmography (PPG) signal and the time domain features acquired from its first and second derivatives for biometric identification. For this purpose, a sum of 40 features has been extracted and a feature-ranking algorithm is proposed. This proposed algorithm calculates the contribution of each feature to biometric recognition and collocates the features, the contribution of which is from great to small. While identifying the contribution of the features, the Euclidean distance and absolute distance formulas are used. The efficiency of the proposed algorithms is demonstrated by the results of the k-NN (k-nearest neighbor) classifier applications of the features. During application, each 15-period-PPG signal belonging to two different durations from each of the thirty healthy subjects were used with a PPG data acquisition card. The first PPG signals recorded from the subjects were evaluated as the 1st configuration; the PPG signals recorded later at a different time as the 2nd configuration and the combination of both were evaluated as the 3rd configuration. When the results were evaluated for the k-NN classifier model created along with the proposed algorithm, an identification of 90.44% for the 1st configuration, 94.44% for the 2nd configuration, and 87.22% for the 3rd configuration has successfully been attained. The obtained results showed that both the proposed algorithm and the biometric identification model based on this developed PPG signal are very promising for contactless recognizing the people with the proposed method. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Ghaemi, Mehrdad; Javadi, Nabi
2017-11-01
The phase diagrams of the three-layer Ising model on the honeycomb lattice with a diluted surface have been constructed using the probabilistic cellular automata based on Glauber algorithm. The effects of the exchange interactions on the phase diagrams have been investigated. A general mathematical expression for the critical temperature is obtained in terms of relative coupling r = J1/J and Δs = (Js/J) - 1, where J and Js represent the nearest neighbor coupling within inner- and surface-layers, respectively, and each magnetic site in the surface-layer is coupled with the nearest neighbor site in the inner-layer via the exchange coupling J1. In the case of antiferromagnetic coupling between surface-layer and inner-layer, system reveals many interesting phenomena, such as the possibility of existence of compensation line before the critical temperature.
Implementation of Nearest Neighbor using HSV to Identify Skin Disease
NASA Astrophysics Data System (ADS)
Gerhana, Y. A.; Zulfikar, W. B.; Ramdani, A. H.; Ramdhani, M. A.
2018-01-01
Today, Android is one of the most widely used operating system in the world. Most of android device has a camera that could capture an image, this feature could be optimized to identify skin disease. The disease is one of health problem caused by bacterium, fungi, and virus. The symptoms of skin disease usually visible. In this work, the symptoms that captured as image contains HSV in every pixel of the image. HSV can extracted and then calculate to earn euclidean value. The value compared using nearest neighbor algorithm to discover closer value between image testing and image training to get highest value that decide class label or type of skin disease. The testing result show that 166 of 200 or about 80% is accurate. There are some reasons that influence the result of classification model like number of image training and quality of android device’s camera.
2013-01-01
Background Identifying the emotional state is helpful in applications involving patients with autism and other intellectual disabilities; computer-based training, human computer interaction etc. Electrocardiogram (ECG) signals, being an activity of the autonomous nervous system (ANS), reflect the underlying true emotional state of a person. However, the performance of various methods developed so far lacks accuracy, and more robust methods need to be developed to identify the emotional pattern associated with ECG signals. Methods Emotional ECG data was obtained from sixty participants by inducing the six basic emotional states (happiness, sadness, fear, disgust, surprise and neutral) using audio-visual stimuli. The non-linear feature ‘Hurst’ was computed using Rescaled Range Statistics (RRS) and Finite Variance Scaling (FVS) methods. New Hurst features were proposed by combining the existing RRS and FVS methods with Higher Order Statistics (HOS). The features were then classified using four classifiers – Bayesian Classifier, Regression Tree, K- nearest neighbor and Fuzzy K-nearest neighbor. Seventy percent of the features were used for training and thirty percent for testing the algorithm. Results Analysis of Variance (ANOVA) conveyed that Hurst and the proposed features were statistically significant (p < 0.001). Hurst computed using RRS and FVS methods showed similar classification accuracy. The features obtained by combining FVS and HOS performed better with a maximum accuracy of 92.87% and 76.45% for classifying the six emotional states using random and subject independent validation respectively. Conclusions The results indicate that the combination of non-linear analysis and HOS tend to capture the finer emotional changes that can be seen in healthy ECG data. This work can be further fine tuned to develop a real time system. PMID:23680041
Introducing two Random Forest based methods for cloud detection in remote sensing images
NASA Astrophysics Data System (ADS)
Ghasemian, Nafiseh; Akhoondzadeh, Mehdi
2018-07-01
Cloud detection is a necessary phase in satellite images processing to retrieve the atmospheric and lithospheric parameters. Currently, some cloud detection methods based on Random Forest (RF) model have been proposed but they do not consider both spectral and textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF) to incorporate visible, infrared (IR) and thermal spectral and textural features (FLFRF) including Gray Level Co-occurrence Matrix (GLCM) and Robust Extended Local Binary Pattern (RELBP_CI) or visible, IR and thermal classifiers (DLFRF) for highly accurate cloud detection on remote sensing images. FLFRF first fuses visible, IR and thermal features. Thereafter, it uses the RF model to classify pixels to cloud, snow/ice and background or thick cloud, thin cloud and background. DLFRF considers visible, IR and thermal features (both spectral and textural) separately and inserts each set of features to RF model. Then, it holds vote matrix of each run of the model. Finally, it fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used in this paper. Quantitative analyses are based on manually selected ground truth data. Results show that after adding RELBP_CI to input feature set cloud detection accuracy improves. Also, the average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are higher than other machine learning methods, Linear Discriminate Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than other traditional methods. The quantitative values on Landsat 8 images show similar trend. Consequently, while SVM and K-nearest neighbor show overestimation in predicting cloud and snow/ice pixels, our Random Forest (RF) based models can achieve higher cloud, snow/ice kappa values on MODIS and thin cloud, thick cloud and snow/ice kappa values on Landsat 8 images. Our algorithms predict both thin and thick cloud on Landsat 8 images while the existing cloud detection algorithm, Fmask cannot discriminate them. Compared to the state-of-the-art methods, our algorithms have acquired higher average cloud and snow/ice kappa values for different spatial resolutions.
Standard and inverse bond percolation of straight rigid rods on square lattices
NASA Astrophysics Data System (ADS)
Ramirez, L. S.; Centres, P. M.; Ramirez-Pastor, A. J.
2018-04-01
Numerical simulations and finite-size scaling analysis have been carried out to study standard and inverse bond percolation of straight rigid rods on square lattices. In the case of standard percolation, the lattice is initially empty. Then, linear bond k -mers (sets of k linear nearest-neighbor bonds) are randomly and sequentially deposited on the lattice. Jamming coverage pj ,k and percolation threshold pc ,k are determined for a wide range of k (1 ≤k ≤120 ). pj ,k and pc ,k exhibit a decreasing behavior with increasing k , pj ,k →∞=0.7476 (1 ) and pc ,k →∞=0.0033 (9 ) being the limit values for large k -mer sizes. pj ,k is always greater than pc ,k, and consequently, the percolation phase transition occurs for all values of k . In the case of inverse percolation, the process starts with an initial configuration where all lattice bonds are occupied and, given that periodic boundary conditions are used, the opposite sides of the lattice are connected by nearest-neighbor occupied bonds. Then, the system is diluted by randomly removing linear bond k -mers from the lattice. The central idea here is based on finding the maximum concentration of occupied bonds (minimum concentration of empty bonds) for which connectivity disappears. This particular value of concentration is called the inverse percolation threshold pc,k i, and determines a geometrical phase transition in the system. On the other hand, the inverse jamming coverage pj,k i is the coverage of the limit state, in which no more objects can be removed from the lattice due to the absence of linear clusters of nearest-neighbor bonds of appropriate size. It is easy to understand that pj,k i=1 -pj ,k . The obtained results for pc,k i show that the inverse percolation threshold is a decreasing function of k in the range 1 ≤k ≤18 . For k >18 , all jammed configurations are percolating states, and consequently, there is no nonpercolating phase. In other words, the lattice remains connected even when the highest allowed concentration of removed bonds pj,k i is reached. In terms of network attacks, this striking behavior indicates that random attacks on single nodes (k =1 ) are much more effective than correlated attacks on groups of close nodes (large k 's). Finally, the accurate determination of critical exponents reveals that standard and inverse bond percolation models on square lattices belong to the same universality class as the random percolation, regardless of the size k considered.
The distance function effect on k-nearest neighbor classification for medical datasets.
Hu, Li-Yu; Huang, Min-Wei; Ke, Shih-Wen; Tsai, Chih-Fong
2016-01-01
K-nearest neighbor (k-NN) classification is conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Since the Euclidean distance function is the most widely used distance metric in k-NN, no study examines the classification performance of k-NN by different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect the k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data and four different distance functions including Euclidean, cosine, Chi square, and Minkowsky are used during k-NN classification individually. The experimental results show that using the Chi square distance function is the best choice for the three different types of datasets. However, using the cosine and Euclidean (and Minkowsky) distance function perform the worst over the mixed type of datasets. In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, K-NN based on the Chi square distance function performs the best.
Estimating affective word covariates using word association data.
Van Rensbergen, Bram; De Deyne, Simon; Storms, Gert
2016-12-01
Word ratings on affective dimensions are an important tool in psycholinguistic research. Traditionally, they are obtained by asking participants to rate words on each dimension, a time-consuming procedure. As such, there has been some interest in computationally generating norms, by extrapolating words' affective ratings using their semantic similarity to words for which these values are already known. So far, most attempts have derived similarity from word co-occurrence in text corpora. In the current paper, we obtain similarity from word association data. We use these similarity ratings to predict the valence, arousal, and dominance of 14,000 Dutch words with the help of two extrapolation methods: Orientation towards Paradigm Words and k-Nearest Neighbors. The resulting estimates show very high correlations with human ratings when using Orientation towards Paradigm Words, and even higher correlations when using k-Nearest Neighbors. We discuss possible theoretical accounts of our results and compare our findings with previous attempts at computationally generating affective norms.
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
Heterogeneous Multi-Metric Learning for Multi-Sensor Fusion
2011-07-01
distance”. One of the most widely used methods is the k-nearest neighbor ( KNN ) method [4], which labels an input data sample to be the class with majority...despite of its simplicity, it can be an effective candidate and can be easily extended to handle multiple sensors. Distance based method such as KNN relies...Neighbor (LMNN) method [21] which will be briefly reviewed in the sequel. LMNN method tries to learn an optimal metric specifically for KNN classifier. The
Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M
2013-01-01
This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible [Formula: see text]-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
Transportation Modes Classification Using Sensors on Smartphones.
Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu
2016-08-19
This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user's transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes.
Transportation Modes Classification Using Sensors on Smartphones
Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu
2016-01-01
This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user’s transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes. PMID:27548182
Fabelo, Himar; Ortega, Samuel; Ravi, Daniele; Kiran, B Ravi; Sosa, Coralia; Bulters, Diederik; Callicó, Gustavo M; Bulstrode, Harry; Szolna, Adam; Piñeiro, Juan F; Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O'Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto
2018-01-01
Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area.
Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O’Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto
2018-01-01
Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area. PMID:29554126
Vector dissimilarity and clustering.
Lefkovitch, L P
1991-04-01
Based on the description of objects by m attributes, an m-element vector dissimilarity function is defined that, unlike scalar functions, retains the distinction among attributes. This function, which satisfies the conditions for a metric, allows the definition of betweenness, which can then be used for clustering. Applications to the subset-generation phase of conditional clustering and to nearest-neighbor-type algorithms are described.
Optimal Superpositioning of Flexible Molecule Ensembles
Gapsys, Vytautas; de Groot, Bert L.
2013-01-01
Analysis of the internal dynamics of a biological molecule requires the successful removal of overall translation and rotation. Particularly for flexible or intrinsically disordered peptides, this is a challenging task due to the absence of a well-defined reference structure that could be used for superpositioning. In this work, we started the analysis with a widely known formulation of an objective for the problem of superimposing a set of multiple molecules as variance minimization over an ensemble. A negative effect of this superpositioning method is the introduction of ambiguous rotations, where different rotation matrices may be applied to structurally similar molecules. We developed two algorithms to resolve the suboptimal rotations. The first approach minimizes the variance together with the distance of a structure to a preceding molecule in the ensemble. The second algorithm seeks for minimal variance together with the distance to the nearest neighbors of each structure. The newly developed methods were applied to molecular-dynamics trajectories and normal-mode ensembles of the Aβ peptide, RS peptide, and lysozyme. These new (to our knowledge) superpositioning methods combine the benefits of variance and distance between nearest-neighbor(s) minimization, providing a solution for the analysis of intrinsic motions of flexible molecules and resolving ambiguous rotations. PMID:23332072
Least-dependent-component analysis based on mutual information
NASA Astrophysics Data System (ADS)
Stögbauer, Harald; Kraskov, Alexander; Astakhov, Sergey A.; Grassberger, Peter
2004-12-01
We propose to use precise estimators of mutual information (MI) to find the least dependent components in a linearly mixed signal. On the one hand, this seems to lead to better blind source separation than with any other presently available algorithm. On the other hand, it has the advantage, compared to other implementations of “independent” component analysis (ICA), some of which are based on crude approximations for MI, that the numerical values of the MI can be used for (i) estimating residual dependencies between the output components; (ii) estimating the reliability of the output by comparing the pairwise MIs with those of remixed components; and (iii) clustering the output according to the residual interdependencies. For the MI estimator, we use a recently proposed k -nearest-neighbor-based algorithm. For time sequences, we combine this with delay embedding, in order to take into account nontrivial time correlations. After several tests with artificial data, we apply the resulting MILCA (mutual-information-based least dependent component analysis) algorithm to a real-world dataset, the ECG of a pregnant woman.
K-Nearest Neighbors Relevance Annotation Model for Distance Education
ERIC Educational Resources Information Center
Ke, Xiao; Li, Shaozi; Cao, Donglin
2011-01-01
With the rapid development of Internet technologies, distance education has become a popular educational mode. In this paper, the authors propose an online image automatic annotation distance education system, which could effectively help children learn interrelations between image content and corresponding keywords. Image automatic annotation is…
Structure of Ordinary Ice Ih. Part 1: Ideal Structure of Ice
1993-10-01
T., H . Onuki and R. Onaka (1977) Electronic structures of water and ice. Journal of the Physics Society of Japan, 42: 152-158. Shimaoka, K. (1960...nearest neighbors .................................................................................................................. 5 6. H -bond...8 12. Positions of oxygen atoms in the ice % h crystal
Physical Human Activity Recognition Using Wearable Sensors.
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-12-11
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.
Physical Human Activity Recognition Using Wearable Sensors
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-01-01
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose
Rahman, Mohammad Mizanur; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-01-01
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms. PMID:28895910
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose.
Rahman, Mohammad Mizanur; Charoenlarpnopparut, Chalie; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-09-12
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k -nearest neighbor ( k -NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms.
Efficient protein structure search using indexing methods
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.
Kim, Sungchul; Sael, Lee; Yu, Hwanjo
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
Reija Haapanen; Kimmo Lehtinen; Jukka Miettinen; Marvin E. Bauer; Alan R. Ek
2002-01-01
The k-nearest neighbor (k-NN) method has been undergoing development and testing for applications with USDA Forest Service Forest Inventory and Analysis (FIA) data in Minnesota since 1997. Research began using the 1987-1990 FIA inventory of the state, the then standard 10-point cluster plots, and Landsat TM imagery. In the past year, research has moved to examine...
NASA Astrophysics Data System (ADS)
Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M.
2017-02-01
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010-2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ˜60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishizuka, N.; Kubo, Y.; Den, M.
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite . We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutralmore » lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.« less
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
NASA Technical Reports Server (NTRS)
Bokhari, S. H.; Raza, A. D.
1984-01-01
Three methods of augmenting computer networks by adding at most one link per processor are discussed: (1) A tree of N nodes may be augmented such that the resulting graph has diameter no greater than 4log sub 2((N+2)/3)-2. Thi O(N(3)) algorithm can be applied to any spanning tree of a connected graph to reduce the diameter of that graph to O(log N); (2) Given a binary tree T and a chain C of N nodes each, C may be augmented to produce C so that T is a subgraph of C. This algorithm is O(N) and may be used to produce augmented chains or rings that have diameter no greater than 2log sub 2((N+2)/3) and are planar; (3) Any rectangular two-dimensional 4 (8) nearest neighbor array of size N = 2(k) may be augmented so that it can emulate a single step shuffle-exchange network of size N/2 in 3(t) time steps.
Improving RNA nearest neighbor parameters for helices by going beyond the two-state model.
Spasic, Aleksandar; Berger, Kyle D; Chen, Jonathan L; Seetin, Matthew G; Turner, Douglas H; Mathews, David H
2018-06-01
RNA folding free energy change nearest neighbor parameters are widely used to predict folding stabilities of secondary structures. They were determined by linear regression to datasets of optical melting experiments on small model systems. Traditionally, the optical melting experiments are analyzed assuming a two-state model, i.e. a structure is either complete or denatured. Experimental evidence, however, shows that structures exist in an ensemble of conformations. Partition functions calculated with existing nearest neighbor parameters predict that secondary structures can be partially denatured, which also directly conflicts with the two-state model. Here, a new approach for determining RNA nearest neighbor parameters is presented. Available optical melting data for 34 Watson-Crick helices were fit directly to a partition function model that allows an ensemble of conformations. Fitting parameters were the enthalpy and entropy changes for helix initiation, terminal AU pairs, stacks of Watson-Crick pairs and disordered internal loops. The resulting set of nearest neighbor parameters shows a 38.5% improvement in the sum of residuals in fitting the experimental melting curves compared to the current literature set.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shimo-Oka, T.; Miwa, S.; Suzuki, Y.
2015-04-13
Individual nuclear spins in diamond can be optically detected through hyperfine couplings with the electron spin of a single nitrogen-vacancy (NV) center; such nuclear spins have outstandingly long coherence times. Among the hyperfine couplings in the NV center, the nearest neighbor {sup 13}C nuclear spins have the largest coupling strength. Nearest neighbor {sup 13}C nuclear spins have the potential to perform fastest gate operations, providing highest fidelity in quantum computing. Herein, we report on the control of coherences in the NV center where all three nearest neighbor carbons are of the {sup 13}C isotope. Coherence among the three and fourmore » qubits are generated and analyzed at room temperature.« less
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Kozak, John J.; Nicolis, Gregoire
2005-08-01
The influence of non-nearest-neighbor displacements on the efficiency of diffusion-reaction processes involving one and two mobile diffusing reactants is studied. An exact analytic result is given for dimension d=1 from which, for large lattices, one can recover the asymptotic estimate reported 30 years ago by Lakatos-Lindenberg and Shuler. For dimensions d=2,3 we present numerically exact values for the mean time to reaction, as gauged by the mean walklength before reactive encounter, obtained via the theory of finite Markov processes and supported by Monte Carlo simulations. Qualitatively different results are found between processes occurring on d=1 versus d>1 lattices, and between results obtained assuming nearest-neighbor (only) versus non-nearest-neighbor displacements.
NASA Astrophysics Data System (ADS)
Schmalz, M.; Ritter, G.; Key, R.
Accurate and computationally efficient spectral signature classification is a crucial step in the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications using linear mixing models, signature classification accuracy depends on accurate spectral endmember discrimination [1]. If the endmembers cannot be classified correctly, then the signatures cannot be classified correctly, and object recognition from hyperspectral data will be inaccurate. In practice, the number of endmembers accurately classified often depends linearly on the number of inputs. This can lead to potentially severe classification errors in the presence of noise or densely interleaved signatures. In this paper, we present an comparison of emerging technologies for nonimaging spectral signature classfication based on a highly accurate, efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3,4] and a neural network technology called Morphological Neural Networks (MNNs) [5]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. The open architecture and programmability of TNE's agreement map processing allows a TNE programmer or user to determine classification accuracy, as well as characterize in detail the signatures for which TNE did not obtain classification matches, and why such mis-matches occurred. In this study, we will compare TNE and MNN based endmember classification, using performance metrics such as probability of correct classification (Pd) and rate of false detections (Rfa). As proof of principle, we analyze classification of multiple closely spaced signatures from a NASA database of space material signatures. Additional analysis pertains to computational complexity and noise sensitivity, which are superior to Bayesian techniques based on classical neural networks. [1] Winter, M.E. "Fast autonomous spectral end-member determination in hyperspectral data," in Proceedings of the 13th International Conference On Applied Geologic Remote Sensing, Vancouver, B.C., Canada, pp. 337-44 (1999). [2] N. Keshava, "A survey of spectral unmixing algorithms," Lincoln Laboratory Journal 14:55-78 (2003). [3] Key, G., M.S. SCHMALZ, F.M. Caimi, and G.X. Ritter. "Performance analysis of tabular nearest neighbor encoding algorithm for joint compression and ATR", in Proceedings SPIE 3814:115-126 (1999). [4] Schmalz, M.S. and G. Key. "Algorithms for hyperspectral signature classification in unresolved object detection using tabular nearest neighbor encoding" in Proceedings of the 2007 AMOS Conference, Maui HI (2007). [5] Ritter, G.X., G. Urcid, and M.S. Schmalz. "Autonomous single-pass endmember approximation using lattice auto-associative memories", Neurocomputing (Elsevier), accepted (June 2008).
Extended magnetic exchange interactions in the high-temperature ferromagnet MnBi
Christianson, Andrew D.; Hahn, Steven E.; Fishman, Randy Scott; ...
2016-05-09
Here, the high-temperature ferromagnet MnBi continues to receive attention as a candidate to replace rare-earth-containing permanent magnets in applications above room temperature. This is due to a high Curie temperature, large magnetic moments, and a coercivity that increases with temperature. The synthesis of MnBi also allows for crystals that are free of interstitial Mn, enabling more direct access to the key interactions underlying the physical properties of binary Mn-based ferromagnets. In this work, we use inelastic neutron scattering to measure the spin waves of MnBi in order to characterize the magnetic exchange at low temperature. Consistent with the spin reorientationmore » that occurs below 140~K, we do not observe a spin gap in this system above our experimental resolution. A Heisenberg model was fit to the spin wave data in order to characterize the long-range nature of the exchange. It was found that interactions up to sixth nearest neighbor are required to fully parameterize the spin waves. Surprisingly, the nearest-neighbor term is antiferromagnetic, and the realization of a ferromagnetic ground state relies on the more numerous ferromagnetic terms beyond nearest neighbor, suggesting that the ferromagnetic ground state arises as a consequence of the long-ranged interactions in the system.« less
NASA Astrophysics Data System (ADS)
Mathavan, Senthan; Kumar, Akash; Kamal, Khurram; Nieminen, Michael; Shah, Hitesh; Rahman, Mujib
2016-09-01
Thousands of pavement images are collected by road authorities daily for condition monitoring surveys. These images typically have intensity variations and texture nonuniformities that make their segmentation challenging. The automated segmentation of such pavement images is crucial for accurate, thorough, and expedited health monitoring of roads. In the pavement monitoring area, well-known texture descriptors, such as gray-level co-occurrence matrices and local binary patterns, are often used for surface segmentation and identification. These, despite being the established methods for texture discrimination, are inherently slow. This work evaluates Laws texture energy measures as a viable alternative for pavement images for the first time. k-means clustering is used to partition the feature space, limiting the human subjectivity in the process. Data classification, hence image segmentation, is performed by the k-nearest neighbor method. Laws texture energy masks are shown to perform well with resulting accuracy and precision values of more than 80%. The implementations of the algorithm, in both MATLAB® and OpenCV/C++, are extensively compared against the state of the art for execution speed, clearly showing the advantages of the proposed method. Furthermore, the OpenCV-based segmentation shows a 100% increase in processing speed when compared to the fastest algorithm available in literature.
Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms
Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan
2017-01-01
Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909
Spatial Statistics for Tumor Cell Counting and Classification
NASA Astrophysics Data System (ADS)
Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas
To count and classify cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires to assess the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper will be shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present paper is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.
Pattern Recognition for a Flight Dynamics Monte Carlo Simulation
NASA Technical Reports Server (NTRS)
Restrepo, Carolina; Hurtado, John E.
2011-01-01
The design, analysis, and verification and validation of a spacecraft relies heavily on Monte Carlo simulations. Modern computational techniques are able to generate large amounts of Monte Carlo data but flight dynamics engineers lack the time and resources to analyze it all. The growing amounts of data combined with the diminished available time of engineers motivates the need to automate the analysis process. Pattern recognition algorithms are an innovative way of analyzing flight dynamics data efficiently. They can search large data sets for specific patterns and highlight critical variables so analysts can focus their analysis efforts. This work combines a few tractable pattern recognition algorithms with basic flight dynamics concepts to build a practical analysis tool for Monte Carlo simulations. Current results show that this tool can quickly and automatically identify individual design parameters, and most importantly, specific combinations of parameters that should be avoided in order to prevent specific system failures. The current version uses a kernel density estimation algorithm and a sequential feature selection algorithm combined with a k-nearest neighbor classifier to find and rank important design parameters. This provides an increased level of confidence in the analysis and saves a significant amount of time.
Chaotic Particle Swarm Optimization with Mutation for Classification
Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza
2015-01-01
In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allows us to overcome the problem of early convergence into a local minima associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and it tunes the best possible solution. Furthermore, to remove the irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is checked out with three sets of data classifications namely, Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937
Technique for fast and efficient hierarchical clustering
Stork, Christopher
2013-10-08
A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.
Correlations and analytical approaches to co-evolving voter models
NASA Astrophysics Data System (ADS)
Ji, M.; Xu, C.; Choi, C. W.; Hui, P. M.
2013-11-01
The difficulty in formulating analytical treatments in co-evolving networks is studied in light of the Vazquez-Eguíluz-San Miguel voter model (VM) and a modified VM (MVM) that introduces a random mutation of the opinion as a noise in the VM. The density of active links, which are links that connect the nodes of opposite opinions, is shown to be highly sensitive to both the degree k of a node and the active links n among the neighbors of a node. We test the validity in the formalism of analytical approaches and show explicitly that the assumptions behind the commonly used homogeneous pair approximation scheme in formulating a mean-field theory are the source of the theory's failure due to the strong correlations between k, n and n2. An improved approach that incorporates spatial correlation to the nearest-neighbors explicitly and a random approximation for the next-nearest neighbors is formulated for the VM and the MVM, and it gives better agreement with the simulation results. We introduce an empirical approach that quantifies the correlations more accurately and gives results in good agreement with the simulation results. The work clarifies why simply mean-field theory fails and sheds light on how to analyze the correlations in the dynamic equations that are often generated in co-evolving processes.
Medical Image Retrieval Using Multi-Texton Assignment.
Tang, Qiling; Yang, Jirong; Xia, Xianfu
2018-02-01
In this paper, we present a multi-texton representation method for medical image retrieval, which utilizes the locality constraint to encode each filter bank response within its local-coordinate system consisting of the k nearest neighbors in texton dictionary and subsequently employs spatial pyramid matching technique to implement feature vector representation. Comparison with the traditional nearest neighbor assignment followed by texton histogram statistics method, our strategies reduce the quantization errors in mapping process and add information about the spatial layout of texton distributions and, thus, increase the descriptive power of the image representation. We investigate the effects of different parameters on system performance in order to choose the appropriate ones for our datasets and carry out experiments on the IRMA-2009 medical collection and the mammographic patch dataset. The extensive experimental results demonstrate that the proposed method has superior performance.
The Effective Resistance of the -Cycle Graph with Four Nearest Neighbors
NASA Astrophysics Data System (ADS)
Chair, Noureddine
2014-02-01
The exact expression for the effective resistance between any two vertices of the -cycle graph with four nearest neighbors , is given. It turns out that this expression is written in terms of the effective resistance of the -cycle graph , the square of the Fibonacci numbers, and the bisected Fibonacci numbers. As a consequence closed form formulas for the total effective resistance, the first passage time, and the mean first passage time for the simple random walk on the the -cycle graph with four nearest neighbors are obtained. Finally, a closed form formula for the effective resistance of with all first neighbors removed is obtained.
Estimating forest attribute parameters for small areas using nearest neighbors techniques
Ronald E. McRoberts
2012-01-01
Nearest neighbors techniques have become extremely popular, particularly for use with forest inventory data. With these techniques, a population unit prediction is calculated as a linear combination of observations for a selected number of population units in a sample that are most similar, or nearest, in a space of ancillary variables to the population unit requiring...
A triboelectric motion sensor in wearable body sensor network for human activity recognition.
Hui Huang; Xian Li; Ye Sun
2016-08-01
The goal of this study is to design a novel triboelectric motion sensor in wearable body sensor network for human activity recognition. Physical activity recognition is widely used in well-being management, medical diagnosis and rehabilitation. Other than traditional accelerometers, we design a novel wearable sensor system based on triboelectrification. The triboelectric motion sensor can be easily attached to human body and collect motion signals caused by physical activities. The experiments are conducted to collect five common activity data: sitting and standing, walking, climbing upstairs, downstairs, and running. The k-Nearest Neighbor (kNN) clustering algorithm is adopted to recognize these activities and validate the feasibility of this new approach. The results show that our system can perform physical activity recognition with a successful rate over 80% for walking, sitting and standing. The triboelectric structure can also be used as an energy harvester for motion harvesting due to its high output voltage in random low-frequency motion.
Multi-texture local ternary pattern for face recognition
NASA Astrophysics Data System (ADS)
Essa, Almabrok; Asari, Vijayan
2017-05-01
In imagery and pattern analysis domain a variety of descriptors have been proposed and employed for different computer vision applications like face detection and recognition. Many of them are affected under different conditions during the image acquisition process such as variations in illumination and presence of noise, because they totally rely on the image intensity values to encode the image information. To overcome these problems, a novel technique named Multi-Texture Local Ternary Pattern (MTLTP) is proposed in this paper. MTLTP combines the edges and corners based on the local ternary pattern strategy to extract the local texture features of the input image. Then returns a spatial histogram feature vector which is the descriptor for each image that we use to recognize a human being. Experimental results using a k-nearest neighbors classifier (k-NN) on two publicly available datasets justify our algorithm for efficient face recognition in the presence of extreme variations of illumination/lighting environments and slight variation of pose conditions.
Georgouli, Konstantia; Martinez Del Rincon, Jesus; Koidis, Anastasios
2017-02-15
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques. Copyright © 2016 Elsevier Ltd. All rights reserved.
Unexpected magnetism, and transport properties in mixed lanthanide compound
NASA Astrophysics Data System (ADS)
Pathak, Arjun; Gschneidner, Karl, Jr.; Pecharsky, Vitalij; Ames Laboratory Team
For intelligent materials design it is desirable to have compounds which have multiple functionalities such as a large magnetoresistance, ferromagnetic and ferrimagnetic states, and field-induced first-order metamagnetic transitions. Here, we discuss one such example where we have combined two lanthanide elements Pr and Er in Pr0.6Er0.4Al2. This compound exhibits multiple functionalities in magnetic fields between 1 and 40 kOe. It undergoes only a trivial ferrimagnetism to paramagnetism transition in a zero magnetic field, but Pr0.6Er0.4Al2 exhibits a large positive magnetoresistance (MR) for H >=40 kOe, a small but non negligible negative MR for H <=30 kOe, and a clear Griffiths-like phase behavior at <1 kOe. The compound also exhibits an asymmetry of hysteresis loop, or exchange bias (EB) effect after field cooling from the paramagnetic state. These phenomena are attributed to the competition between single-ion anisotropies of Pr and Er ions coupled with the opposite nearest-neighbor and next-nearest-neighbor exchange interactions. This work was supported by the US Department of Energy, Office of Basic Energy Science, Division of Material Sciences and Engineering. The research was performed at the Ames Laboratory. The Ames Laboratory is operated by Iowa State University for the US D.
NASA Technical Reports Server (NTRS)
Jayroe, R. R., Jr.
1976-01-01
Geographical correction effects on LANDSAT image data are identified, using the nearest neighbor, bilinear interpolation and bicubic interpolation techniques. Potential impacts of registration on image compression and classification are explored.
Vercoutere, Wenonah A.; Winters-Hilt, Stephen; DeGuzman, Veronica S.; Deamer, David; Ridino, Sam E.; Rodgers, Joseph T.; Olsen, Hugh E.; Marziali, Andre; Akeson, Mark
2003-01-01
Nanoscale α-hemolysin pores can be used to analyze individual DNA or RNA molecules. Serial examination of hundreds to thousands of molecules per minute is possible using ionic current impedance as the measured property. In a recent report, we showed that a nanopore device coupled with machine learning algorithms could automatically discriminate among the four combinations of Watson–Crick base pairs and their orientations at the ends of individual DNA hairpin molecules. Here we use kinetic analysis to demonstrate that ionic current signatures caused by these hairpin molecules depend on the number of hydrogen bonds within the terminal base pair, stacking between the terminal base pair and its nearest neighbor, and 5′ versus 3′ orientation of the terminal bases independent of their nearest neighbors. This report constitutes evidence that single Watson–Crick base pairs can be identified within individual unmodified DNA hairpin molecules based on their dynamic behavior in a nanoscale pore. PMID:12582251
Nearest neighbor 3D segmentation with context features
NASA Astrophysics Data System (ADS)
Hristova, Evelin; Schulz, Heinrich; Brosch, Tom; Heinrich, Mattias P.; Nickisch, Hannes
2018-03-01
Automated and fast multi-label segmentation of medical images is challenging and clinically important. This paper builds upon a supervised machine learning framework that uses training data sets with dense organ annotations and vantage point trees to classify voxels in unseen images based on similarity of binary feature vectors extracted from the data. Without explicit model knowledge, the algorithm is applicable to different modalities and organs, and achieves high accuracy. The method is successfully tested on 70 abdominal CT and 42 pelvic MR images. With respect to ground truth, an average Dice overlap score of 0.76 for the CT segmentation of liver, spleen and kidneys is achieved. The mean score for the MR delineation of bladder, bones, prostate and rectum is 0.65. Additionally, we benchmark several variations of the main components of the method and reduce the computation time by up to 47% without significant loss of accuracy. The segmentation results are - for a nearest neighbor method - surprisingly accurate, robust as well as data and time efficient.
NASA Astrophysics Data System (ADS)
Chernick, Julian A.; Perlovsky, Leonid I.; Tye, David M.
1994-06-01
This paper describes applications of maximum likelihood adaptive neural system (MLANS) to the characterization of clutter in IR images and to the identification of targets. The characterization of image clutter is needed to improve target detection and to enhance the ability to compare performance of different algorithms using diverse imagery data. Enhanced unambiguous IFF is important for fratricide reduction while automatic cueing and targeting is becoming an ever increasing part of operations. We utilized MLANS which is a parametric neural network that combines optimal statistical techniques with a model-based approach. This paper shows that MLANS outperforms classical classifiers, the quadratic classifier and the nearest neighbor classifier, because on the one hand it is not limited to the usual Gaussian distribution assumption and can adapt in real time to the image clutter distribution; on the other hand MLANS learns from fewer samples and is more robust than the nearest neighbor classifiers. Future research will address uncooperative IFF using fused IR and MMW data.
USDA-ARS?s Scientific Manuscript database
Mitochondria are essential subcellular organelles found in eukaryotic cells. Knowing information on a protein’s subcellular or sub subcellular location provides in-depth insights about the microenvironment where it interacts with other molecules and is crucial for inferring the protein’s function. T...
Emotion recognition from multichannel EEG signals using K-nearest neighbor classification.
Li, Mi; Xu, Hongpei; Liu, Xingwang; Lu, Shengfu
2018-04-27
Many studies have been done on the emotion recognition based on multi-channel electroencephalogram (EEG) signals. This paper explores the influence of the emotion recognition accuracy of EEG signals in different frequency bands and different number of channels. We classified the emotional states in the valence and arousal dimensions using different combinations of EEG channels. Firstly, DEAP default preprocessed data were normalized. Next, EEG signals were divided into four frequency bands using discrete wavelet transform, and entropy and energy were calculated as features of K-nearest neighbor Classifier. The classification accuracies of the 10, 14, 18 and 32 EEG channels based on the Gamma frequency band were 89.54%, 92.28%, 93.72% and 95.70% in the valence dimension and 89.81%, 92.24%, 93.69% and 95.69% in the arousal dimension. As the number of channels increases, the classification accuracy of emotional states also increases, the classification accuracy of the gamma frequency band is greater than that of the beta frequency band followed by the alpha and theta frequency bands. This paper provided better frequency bands and channels reference for emotion recognition based on EEG.
Kumar, Sameer; Heidelberger, Philip; Chen, Dong; Hines, Michael
2010-04-19
We explore the multisend interface as a data mover interface to optimize applications with neighborhood collective communication operations. One of the limitations of the current MPI 2.1 standard is that the vector collective calls require counts and displacements (zero and nonzero bytes) to be specified for all the processors in the communicator. Further, all the collective calls in MPI 2.1 are blocking and do not permit overlap of communication with computation. We present the record replay persistent optimization to the multisend interface that minimizes the processor overhead of initiating the collective. We present four different case studies with the multisend API on Blue Gene/P (i) 3D-FFT, (ii) 4D nearest neighbor exchange as used in Quantum Chromodynamics, (iii) NAMD and (iv) neural network simulator NEURON. Performance results show 1.9× speedup with 32(3) 3D-FFTs, 1.9× speedup for 4D nearest neighbor exchange with the 2(4) problem, 1.6× speedup in NAMD and almost 3× speedup in NEURON with 256K cells and 1k connections/cell.
A floor-map-aided WiFi/pseudo-odometry integration algorithm for an indoor positioning system.
Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin
2015-03-24
This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The "go and back" phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The "cross-wall" problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning.
Using machine learning algorithms to guide rehabilitation planning for home care clients.
Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul
2007-12-20
Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms - Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) - to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN. > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms achieved superior predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide development of improved clinical protocols.
Seleson, Pablo; Du, Qiang; Parks, Michael L.
2016-08-16
The peridynamic theory of solid mechanics is a nonlocal reformulation of the classical continuum mechanics theory. At the continuum level, it has been demonstrated that classical (local) elasticity is a special case of peridynamics. Such a connection between these theories has not been extensively explored at the discrete level. This paper investigates the consistency between nearest-neighbor discretizations of linear elastic peridynamic models and finite difference discretizations of the Navier–Cauchy equation of classical elasticity. While nearest-neighbor discretizations in peridynamics have been numerically observed to present grid-dependent crack paths or spurious microcracks, this paper focuses on a different, analytical aspect of suchmore » discretizations. We demonstrate that, even in the absence of cracks, such discretizations may be problematic unless a proper selection of weights is used. Specifically, we demonstrate that using the standard meshfree approach in peridynamics, nearest-neighbor discretizations do not reduce, in general, to discretizations of corresponding classical models. We study nodal-based quadratures for the discretization of peridynamic models, and we derive quadrature weights that result in consistency between nearest-neighbor discretizations of peridynamic models and discretized classical models. The quadrature weights that lead to such consistency are, however, model-/discretization-dependent. We motivate the choice of those quadrature weights through a quadratic approximation of displacement fields. The stability of nearest-neighbor peridynamic schemes is demonstrated through a Fourier mode analysis. Finally, an approach based on a normalization of peridynamic constitutive constants at the discrete level is explored. This approach results in the desired consistency for one-dimensional models, but does not work in higher dimensions. The results of the work presented in this paper suggest that even though nearest-neighbor discretizations should be avoided in peridynamic simulations involving cracks, such discretizations are viable, for example for verification or validation purposes, in problems characterized by smooth deformations. Furthermore, we demonstrate that better quadrature rules in peridynamics can be obtained based on the functional form of solutions.« less
Competitive code-based fast palmprint identification using a set of cover trees
NASA Astrophysics Data System (ADS)
Yue, Feng; Zuo, Wangmeng; Zhang, David; Wang, Kuanquan
2009-06-01
A palmprint identification system recognizes a query palmprint image by searching for its nearest neighbor from among all the templates in a database. When applied on a large-scale identification system, it is often necessary to speed up the nearest-neighbor searching process. We use competitive code, which has very fast feature extraction and matching speed, for palmprint identification. To speed up the identification process, we extend the cover tree method and propose to use a set of cover trees to facilitate the fast and accurate nearest-neighbor searching. We can use the cover tree method because, as we show, the angular distance used in competitive code can be decomposed into a set of metrics. Using the Hong Kong PolyU palmprint database (version 2) and a large-scale palmprint database, our experimental results show that the proposed method searches for nearest neighbors faster than brute force searching.
Detecting the Difficulty Level of Foreign Language Texts
2010-02-01
continuous tenses), as well as part- of- speech labels for words. The authors used a k-Nearest Neighbor ( kNN ) classifier (Cover and Hart, 1967; Mitchell, 1997...anticipate, and influence these situations and to operate in them is found in foreign language speech and text. For this reason, military linguists are...the language model system, LGR is the prediction of one of the grammar-based classifiers, and CkNN is a confidence value of the kNN prediction for the
Spatial patterns in vegetation fires in the Indian region.
Vadrevu, Krishna Prasad; Badarinath, K V S; Anuradha, Eaturu
2008-12-01
In this study, we used fire count datasets derived from Along Track Scanning Radiometer (ATSR) satellite to characterize spatial patterns in fire occurrences across highly diverse geographical, vegetation and topographic gradients in the Indian region. For characterizing the spatial patterns of fire occurrences, observed fire point patterns were tested against the hypothesis of a complete spatial random (CSR) pattern using three different techniques, the quadrat analysis, nearest neighbor analysis and Ripley's K function. Hierarchical nearest neighboring technique was used to depict the 'hotspots' of fire incidents. Of the different states, highest fire counts were recorded in Madhya Pradesh (14.77%) followed by Gujarat (10.86%), Maharastra (9.92%), Mizoram (7.66%), Jharkhand (6.41%), etc. With respect to the vegetation categories, highest number of fires were recorded in agricultural regions (40.26%) followed by tropical moist deciduous vegetation (12.72), dry deciduous vegetation (11.40%), abandoned slash and burn secondary forests (9.04%), tropical montane forests (8.07%) followed by others. Analysis of fire counts based on elevation and slope range suggested that maximum number of fires occurred in low and medium elevation types and in very low to low-slope categories. Results from three different spatial techniques for spatial pattern suggested clustered pattern in fire events compared to CSR. Most importantly, results from Ripley's K statistic suggested that fire events are highly clustered at a lag-distance of 125 miles. Hierarchical nearest neighboring clustering technique identified significant clusters of fire 'hotspots' in different states in northeast and central India. The implications of these results in fire management and mitigation were discussed. Also, this study highlights the potential of spatial point pattern statistics in environmental monitoring and assessment studies with special reference to fire events in the Indian region.
Band structure and orbital character of monolayer MoS2 with eleven-band tight-binding model
NASA Astrophysics Data System (ADS)
Shahriari, Majid; Ghalambor Dezfuli, Abdolmohammad; Sabaeian, Mohammad
2018-02-01
In this paper, based on a tight-binding (TB) model, first we present the calculations of eigenvalues as band structure and then present the eigenvectors as probability amplitude for finding electron in atomic orbitals for monolayer MoS2 in the first Brillouin zone. In these calculations we are considering hopping processes between the nearest-neighbor Mo-S, the next nearest-neighbor in-plan Mo-Mo, and the next nearest-neighbor in-plan and out-of-plan S-S atoms in a three-atom based unit cell of two-dimensional rhombic MoS2. The hopping integrals have been solved in terms of Slater-Koster and crystal field parameters. These parameters are calculated by comparing TB model with the density function theory (DFT) in the high-symmetry k-points (i.e. the K- and Γ-points). In our TB model all the 4d Mo orbitals and the 3p S orbitals are considered and detailed analysis of the orbital character of each energy level at the main high-symmetry points of the Brillouin zone is described. In comparison with DFT calculations, our results of TB model show a very good agreement for bands near the Fermi level. However for other bands which are far from the Fermi level, some discrepancies between our TB model and DFT calculations are observed. Upon the accuracy of Slater-Koster and crystal field parameters, on the contrary of DFT, our model provide enough accuracy to calculate all allowed transitions between energy bands that are very crucial for investigating the linear and nonlinear optical properties of monolayer MoS2.
Wang, ShaoPeng; Zhang, Yu-Hang; Lu, Jing; Cui, Weiren; Hu, Jerry; Cai, Yu-Dong
2016-01-01
The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon the request. PMID:26955638
Wang, ShaoPeng; Zhang, Yu-Hang; Lu, Jing; Cui, Weiren; Hu, Jerry; Cai, Yu-Dong
2016-01-01
The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon the request.
Chen, Wei; Wang, Weiping; Li, Qun; Chang, Qiang; Hou, Hongtao
2016-01-01
Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning. PMID:27007379
Chen, Wei; Wang, Weiping; Li, Qun; Chang, Qiang; Hou, Hongtao
2016-03-19
Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning.
Hussain, Lal
2018-06-01
Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.
Performance-driven Multimodality Sensor Fusion
2012-01-23
in IEEE Intl Conf. on Acoust., Speech , Signal Processing, (Dallas), Mar. 2010. [10] K. Sricharan, R. Raich, and A. Hero III, “Boundary compensated knn ...nearest neighbor ( kNN ) plug-in estima- tors, we have developed a generally applicable theory that gives analytical closed-form expressions for asymptotic...Co-PI’s Raich and Hero and was published in the IEEE Proc. of 2011 Intl Conf. on Acoustics, Speech , and Signal Processing. 2.4 Dimension estimation in
Rezaee, Kh.; Azizi, E.; Haddadnia, J.
2016-01-01
Background Epilepsy is a severe disorder of the central nervous system that predisposes the person to recurrent seizures. Fifty million people worldwide suffer from epilepsy; after Alzheimer’s and stroke, it is the third widespread nervous disorder. Objective In this paper, an algorithm to detect the onset of epileptic seizures based on the analysis of brain electrical signals (EEG) has been proposed. 844 hours of EEG were recorded form 23 pediatric patients consecutively with 163 occurrences of seizures. Signals had been collected from Children’s Hospital Boston with a sampling frequency of 256 Hz through 18 channels in order to assess epilepsy surgery. By selecting effective features from seizure and non-seizure signals of each individual and putting them into two categories, the proposed algorithm detects the onset of seizures quickly and with high sensitivity. Method In this algorithm, L-sec epochs of signals are displayed in form of a third-order tensor in spatial, spectral and temporal spaces by applying wavelet transform. Then, after applying general tensor discriminant analysis (GTDA) on tensors and calculating mapping matrix, feature vectors are extracted. GTDA increases the sensitivity of the algorithm by storing data without deleting them. Finally, K-Nearest neighbors (KNN) is used to classify the selected features. Results The results of simulating algorithm on algorithm standard dataset shows that the algorithm is capable of detecting 98 percent of seizures with an average delay of 4.7 seconds and the average error rate detection of three errors in 24 hours. Conclusion Today, the lack of an automated system to detect or predict the seizure onset is strongly felt. PMID:27672628
Linear model for fast background subtraction in oligonucleotide microarrays.
Kroll, K Myriam; Barkema, Gerard T; Carlon, Enrico
2009-11-16
One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values. We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry.
NASA Astrophysics Data System (ADS)
Ramazanov, M. K.; Murtazaev, A. K.; Magomedov, M. A.; Badiev, M. K.
2018-06-01
We study phase transitions and thermodynamic properties in the two-dimensional antiferromagnetic Ising model with next-nearest-neighbor interaction on a Kagomé lattice by Monte Carlo simulations. A histogram data analysis shows that a second-order transition occurs in the model. From the analysis of obtained data, we can assume that next-nearest-neighbor ferromagnetic interactions in two-dimensional antiferromagnetic Ising model on a Kagomé lattice excite the occurrence of a second-order transition and unusual behavior of thermodynamic properties on the temperature dependence.
Elliptic Painlevé equations from next-nearest-neighbor translations on the E_8^{(1)} lattice
NASA Astrophysics Data System (ADS)
Joshi, Nalini; Nakazono, Nobutaka
2017-07-01
The well known elliptic discrete Painlevé equation of Sakai is constructed by a standard translation on the E_8(1) lattice, given by nearest neighbor vectors. In this paper, we give a new elliptic discrete Painlevé equation obtained by translations along next-nearest-neighbor vectors. This equation is a generic (8-parameter) version of a 2-parameter elliptic difference equation found by reduction from Adler’s partial difference equation, the so-called Q4 equation. We also provide a projective reduction of the well known equation of Sakai.
NASA Astrophysics Data System (ADS)
Salman, S. S.; Abbas, W. A.
2018-05-01
The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.
Creating Profiles from User Network Behavior
2013-09-01
We varied the m-estimate in Naïve Bayes, m for pruning in Learning Tree, and how many k nearest neighbors to select from in KNN, before settling on the...N. Taft, “The cubicle vs. the coffee shop: behavioral modes in enterprise end-users,” in Proc. of the 9th Int. Conf. on Passive and Active Network
Mark D. Nelson; Ronald E. McRoberts; Greg C. Liknes; Geoffrey R. Holden
2005-01-01
Landsat Thematic Mapper (TM) satellite imagery and Forest Inventory and Analysis (FIA) plot data were used to construct forest/nonforest maps of Mapping Zone 41, National Land Cover Dataset 2000 (NLCD 2000). Stratification approaches resulting from Maximum Likelihood, Fuzzy Convolution, Logistic Regression, and k-Nearest Neighbors classification/prediction methods were...
NASA Astrophysics Data System (ADS)
Baskaran, G.
2016-12-01
Doped band insulators, HfNCl, WO3, diamond, Bi2Se3, BiS2 families, STO/LAO interface, gate doped SrTiO3, MoS2 and so on are unusual superconductors. With an aim to build a general theory for superconductivity in doped band insulators, we focus on the BiS2 family which was discovered by Mizuguchi et al in 2012. While maximum Tc is only ˜11 K in {{LaO}}1-{{x}}{{{F}}}{{x}}{{BiS}}2, a number of experimental results are puzzling and anomalous in the sense that they resemble high T c and unconventional superconductors. Using a two orbital model of Usui, Suzuki and Kuroki, we show that the uniform low density free Fermi sea in {{LaO}}{0,5}{{{F}}}0.5{{BiS}}2 is unstable towards formation of the next nearest neighbor Bi-S-Bi diagonal valence bond (charged -2e Cooper pair) and their Wigner crystallization. Instability to this novel state of matter is caused by unscreened nearest neighbor coulomb repulsions (V ˜ 1 eV) and a hopping pattern with sulfur mediated diagonal next nearest neighbor Bi-S-Bi hopping t’ ˜ 0.88 eV, as well as larger than nearest neighbor Bi-Bi hopping, t ˜ 0.16 eV. Wigner crystals of Cooper pairs quantum melt for doping around x = 0.5 and stabilize certain resonating valence bond states and superconductivity. We study a few variational RVB states and suggest that BiS2 family members are latent high Tc superconductors, but challenged by competing orders and the fragile nature of many body states sustained by unscreened Coulomb forces. One of our superconducting states has d XY symmetry and a gap. We also predict a 2d Bose metal or vortex liquid normal state, as charged -2e valence bonds survive in the normal state.
Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies
Previously a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity for large, noncongeneric data sets. In this study these approaches applie...
Computing sparse derivatives and consecutive zeros problem
NASA Astrophysics Data System (ADS)
Chandra, B. V. Ravi; Hossain, Shahadat
2013-02-01
We describe a substitution based sparse Jacobian matrix determination method using algorithmic differentiation. Utilizing the a priori known sparsity pattern, a compression scheme is determined using graph coloring. The "compressed pattern" of the Jacobian matrix is then reordered into a form suitable for computation by substitution. We show that the column reordering of the compressed pattern matrix (so as to align the zero entries into consecutive locations in each row) can be viewed as a variant of traveling salesman problem. Preliminary computational results show that on the test problems the performance of nearest-neighbor type heuristic algorithms is highly encouraging.
Adaptive local linear regression with application to printer color management.
Gupta, Maya R; Garcia, Eric K; Chin, Erika
2008-06-01
Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples, neighbors. Usually, the number of neighbors used in estimation is fixed to be a global "optimal" value, chosen by cross validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without need for cross validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in error metrics are shown, indicating that enclosing neighborhoods may be a promising adaptive neighborhood definition for other local learning tasks as well, depending on the density of training samples.
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
1999-05-17
Experimental Results In this section, we compare kNN -mut which uses the weight vector obtained using mutual information as the fi- nal weight vector and...WAKNN against kNN , C4.5 [Qui93], RIPPER [Coh95], PEBLS [CS93], Rainbow [McC96], VSM [Low95] on several synthetic and real data sets. VSM is another k...obtained without this option. 3 C4.5 RIPPER PEBLS Rainbow kNN WAKNN Syn-1 100.0 100.0 100.0 100.0 77.3 100.0 Syn-2 67.5 69.5 62.0 50.0 66.0 68.8 Syn
Feasibility of a special-purpose computer to solve the Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Gritton, E. C.; King, W. S.; Sutherland, I.; Gaines, R. S.; Gazley, C., Jr.; Grosch, C.; Juncosa, M.; Petersen, H.
1978-01-01
Orders-of-magnitude improvements in computer performance can be realized with a parallel array of thousands of fast microprocessors. In this architecture, wiring congestion is minimized by limiting processor communication to nearest neighbors. When certain standard algorithms are applied to a viscous flow problem and existing LSI technology is used, performance estimates of this conceptual design show a dramatic decrease in computational time when compared to the CDC 7600.
Retinopathy of Prematurity-assist: Novel Software for Detecting Plus Disease
Pour, Elias Khalili; Pourreza, Hamidreza; Zamani, Kambiz Ameli; Mahmoudi, Alireza; Sadeghi, Arash Mir Mohammad; Shadravan, Mahla; Karkhaneh, Reza; Pour, Ramak Rouhi
2017-01-01
Purpose To design software with a novel algorithm, which analyzes the tortuosity and vascular dilatation in fundal images of retinopathy of prematurity (ROP) patients with an acceptable accuracy for detecting plus disease. Methods Eighty-seven well-focused fundal images taken with RetCam were classified to three groups of plus, non-plus, and pre-plus by agreement between three ROP experts. Automated algorithms in this study were designed based on two methods: the curvature measure and distance transform for assessment of tortuosity and vascular dilatation, respectively as two major parameters of plus disease detection. Results Thirty-eight plus, 12 pre-plus, and 37 non-plus images, which were classified by three experts, were tested by an automated algorithm and software evaluated the correct grouping of images in comparison to expert voting with three different classifiers, k-nearest neighbor, support vector machine and multilayer perceptron network. The plus, pre-plus, and non-plus images were analyzed with 72.3%, 83.7%, and 84.4% accuracy, respectively. Conclusions The new automated algorithm used in this pilot scheme for diagnosis and screening of patients with plus ROP has acceptable accuracy. With more improvements, it may become particularly useful, especially in centers without a skilled person in the ROP field. PMID:29022295
Prediction of Effective Drug Combinations by an Improved Naïve Bayesian Algorithm.
Bai, Li-Yue; Dai, Hao; Xu, Qin; Junaid, Muhammad; Peng, Shao-Liang; Zhu, Xiaolei; Xiong, Yi; Wei, Dong-Qing
2018-02-05
Drug combinatorial therapy is a promising strategy for combating complex diseases due to its fewer side effects, lower toxicity and better efficacy. However, it is not feasible to determine all the effective drug combinations in the vast space of possible combinations given the increasing number of approved drugs in the market, since the experimental methods for identification of effective drug combinations are both labor- and time-consuming. In this study, we conducted systematic analysis of various types of features to characterize pairs of drugs. These features included information about the targets of the drugs, the pathway in which the target protein of a drug was involved in, side effects of drugs, metabolic enzymes of the drugs, and drug transporters. The latter two features (metabolic enzymes and drug transporters) were related to the metabolism and transportation properties of drugs, which were not analyzed or used in previous studies. Then, we devised a novel improved naïve Bayesian algorithm to construct classification models to predict effective drug combinations by using the individual types of features mentioned above. Our results indicated that the performance of our proposed method was indeed better than the naïve Bayesian algorithm and other conventional classification algorithms such as support vector machine and K-nearest neighbor.
NASA Astrophysics Data System (ADS)
Ziemann, Amanda K.; Messinger, David W.; Albano, James A.; Basener, William F.
2012-06-01
Anomaly detection algorithms have historically been applied to hyperspectral imagery in order to identify pixels whose material content is incongruous with the background material in the scene. Typically, the application involves extracting man-made objects from natural and agricultural surroundings. A large challenge in designing these algorithms is determining which pixels initially constitute the background material within an image. The topological anomaly detection (TAD) algorithm constructs a graph theory-based, fully non-parametric topological model of the background in the image scene, and uses codensity to measure deviation from this background. In TAD, the initial graph theory structure of the image data is created by connecting an edge between any two pixel vertices x and y if the Euclidean distance between them is less than some resolution r. While this type of proximity graph is among the most well-known approaches to building a geometric graph based on a given set of data, there is a wide variety of dierent geometrically-based techniques. In this paper, we present a comparative test of the performance of TAD across four dierent constructs of the initial graph: mutual k-nearest neighbor graph, sigma-local graph for two different values of σ > 1, and the proximity graph originally implemented in TAD.
Validation of optical codes based on 3D nanostructures
NASA Astrophysics Data System (ADS)
Carnicer, Artur; Javidi, Bahram
2017-05-01
Image information encoding using random phase masks produce speckle-like noise distributions when the sample is propagated in the Fresnel domain. As a result, information cannot be accessed by simple visual inspection. Phase masks can be easily implemented in practice by attaching cello-tape to the plain-text message. Conventional 2D-phase masks can be generalized to 3D by combining glass and diffusers resulting in a more complex, physical unclonable function. In this communication, we model the behavior of a 3D phase mask using a simple approach: light is propagated trough glass using the angular spectrum of plane waves whereas the diffusor is described as a random phase mask and a blurring effect on the amplitude of the propagated wave. Using different designs for the 3D phase mask and multiple samples, we demonstrate that classification is possible using the k-nearest neighbors and random forests machine learning algorithms.
Automatic correction of dental artifacts in PET/MRI
Ladefoged, Claes N.; Andersen, Flemming L.; Keller, Sune. H.; Beyer, Thomas; Law, Ian; Højgaard, Liselotte; Darkner, Sune; Lauze, Francois
2015-01-01
Abstract. A challenge when using current magnetic resonance (MR)-based attenuation correction in positron emission tomography/MR imaging (PET/MRI) is that the MRIs can have a signal void around the dental fillings that is segmented as artificial air-regions in the attenuation map. For artifacts connected to the background, we propose an extension to an existing active contour algorithm to delineate the outer contour using the nonattenuation corrected PET image and the original attenuation map. We propose a combination of two different methods for differentiating the artifacts within the body from the anatomical air-regions by first using a template of artifact regions, and second, representing the artifact regions with a combination of active shape models and k-nearest-neighbors. The accuracy of the combined method has been evaluated using 25 F18-fluorodeoxyglucose PET/MR patients. Results showed that the approach was able to correct an average of 97±3% of the artifact areas. PMID:26158104
Identifying Dyscalculia Symptoms Related to Magnocellular Reasoning Using Smartphones.
Knudsen, Greger Siem; Babic, Ankica
2016-01-01
This paper presents a study that has developed a mobile software application for assisting diagnosis of learning disabilities in mathematics, called dyscalculia, and measuring correlations between dyscalculia symptoms and magnocellular reasoning. Usually, software aids for dyscalculic individuals are focused on both assisting diagnosis and teaching the material. The software developed in this study however maintains a specific focus on the former, and in the process attempts to capture alleged correlations between dyscalculia symptoms and possible underlying causes of the condition. Classification of symptoms is performed by k-Nearest Neighbor algorithm classifying five parameters evaluating user's skills, returning calculated performance in each category as well as correlation strength between detected symptoms and magnocellular reasoning abilities. Expert evaluations has found the application to be appropriate and productive for its intended purpose, proving that mobile software is a suitable and valuable tool for assisting dyscalculia diagnosis and identifying root causes of developing the condition.
Emotional State Classification in Virtual Reality Using Wearable Electroencephalography
NASA Astrophysics Data System (ADS)
Suhaimi, N. S.; Teo, J.; Mountstephens, J.
2018-03-01
This paper presents the classification of emotions on EEG signals. One of the key issues in this research is the lack of mental classification using VR as the medium to stimulate emotion. The approach towards this research is by using K-nearest neighbor (KNN) and Support Vector Machine (SVM). Firstly, each of the participant will be required to wear the EEG headset and recording their brainwaves when they are immersed inside the VR. The data points are then marked if they showed any physical signs of emotion or by observing the brainwave pattern. Secondly, the data will then be tested and trained with KNN and SVM algorithms. The accuracy achieved from both methods were approximately 82% throughout the brainwave spectrum (α, β, γ, δ, θ). These methods showed promising results and will be further enhanced using other machine learning approaches in VR stimulus.
Ding, Huijun; He, Qing; Zhou, Yongjin; Dan, Guo; Cui, Song
2017-01-01
Motion-intent-based finger gesture recognition systems are crucial for many applications such as prosthesis control, sign language recognition, wearable rehabilitation system, and human–computer interaction. In this article, a motion-intent-based finger gesture recognition system is designed to correctly identify the tapping of every finger for the first time. Two auto-event annotation algorithms are firstly applied and evaluated for detecting the finger tapping frame. Based on the truncated signals, the Wavelet packet transform (WPT) coefficients are calculated and compressed as the features, followed by a feature selection method that is able to improve the performance by optimizing the feature set. Finally, three popular classifiers including naive Bayes (NBC), K-nearest neighbor (KNN), and support vector machine (SVM) are applied and evaluated. The recognition accuracy can be achieved up to 94%. The design and the architecture of the system are presented with full system characterization results. PMID:29167655
Electronic spectrum of trilayer graphene
NASA Astrophysics Data System (ADS)
Kumar, S.; Ajay
2014-08-01
Present work deals with the analysis of the single particle electronic spectral function in trilayer (ABC-, ABA- and AAA-stacked) graphene. Tight binding Hamiltonian containing intralayer nearest-neighbor and next-nearest neighbor hopping along-with the interlayer coupling parameter within two triangular sub-lattice approach for trilayer graphene has been employed. The expression of single particle spectral functions A(kw) is obtained within mean-field Green's function equations of motion approach. Spectral function at Γ, M and K points of the Brillouin zone has been numerically computed. It is pointed out that the nature of electronic states at different points of Brillouin zone is found to be influenced by stacking order and Coulomb interactions. At Γ and M points, a trilayer splitting is predicted while at K point a bilayer splitting effect is observed due to crossing of two bands (at K point). Interlayer coupling ( t_{ bot } ) is found to be responsible for the splitting of quasi-particle peaks at each point of Brillouin zone. The influence of t_{ bot } in trilayer graphene is prominent for AAA-stacking compared to ABC- and ABA-stacking. On the other hand, onsite Coulomb interaction reduces the trilayer splitting effect into bilayer splitting at Γ and M points of Brillouin zone and bilayer splitting into single peak spectral function at K point with a shifting of the peak away from Fermi level.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns.
Haghnegahdar, A A; Kolahi, S; Khojastepour, L; Tajeripour, F
2018-03-01
Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cut prepared from each condyle, although images were limited to head of mandibular condyle. In order to extract features of images, first we use LBP and then histogram of oriented gradients. To reduce dimensionality, the linear algebra Singular Value Decomposition (SVD) is applied to the feature vectors matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. K nearest neighbor classifier achieves a very good accuracy (0.9242), moreover, it has desirable sensitivity (0.9470) and specificity (0.9015) results, when other classifiers have lower accuracy, sensitivity and specificity. We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN has been the best classifier for our experiments in detecting patients from healthy individuals, by 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages.
NASA Astrophysics Data System (ADS)
Jiang, Li; Shi, Tielin; Xuan, Jianping
2012-05-01
Generally, the vibration signals of fault bearings are non-stationary and highly nonlinear under complicated operating conditions. Thus, it's a big challenge to extract optimal features for improving classification and simultaneously decreasing feature dimension. Kernel Marginal Fisher analysis (KMFA) is a novel supervised manifold learning algorithm for feature extraction and dimensionality reduction. In order to avoid the small sample size problem in KMFA, we propose regularized KMFA (RKMFA). A simple and efficient intelligent fault diagnosis method based on RKMFA is put forward and applied to fault recognition of rolling bearings. So as to directly excavate nonlinear features from the original high-dimensional vibration signals, RKMFA constructs two graphs describing the intra-class compactness and the inter-class separability, by combining traditional manifold learning algorithm with fisher criteria. Therefore, the optimal low-dimensional features are obtained for better classification and finally fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories of bearings. The experimental results demonstrate that the proposed approach improves the fault classification performance and outperforms the other conventional approaches.
Classification Model for Damage Localization in a Plate Structure
NASA Astrophysics Data System (ADS)
Janeliukstis, R.; Ruchevskis, S.; Chate, A.
2018-01-01
The present study is devoted to the problem of damage localization by means of data classification. The commercial ANSYS finite-elements program was used to make a model of a cantilevered composite plate equipped with numerous strain sensors. The plate was divided into zones, and, for data classification purposes, each of them housed several points to which a point mass of magnitude 5 and 10% of plate mass was applied. At each of these points, a numerical modal analysis was performed, from which the first few natural frequencies and strain readings were extracted. The strain data for every point were the input for a classification procedure involving k nearest neighbors and decision trees. The classification model was trained and optimized by finetuning the key parameters of both algorithms. Finally, two new query points were simulated and subjected to a classification in terms of assigning a label to one of the zones of the plate, thus localizing these points. Damage localization results were compared for both algorithms and were found to be in good agreement with the actual application positions of point load.
Ma, Xiaolei; Dai, Zhuang; He, Zhengbing; Ma, Jihui; Wang, Yong; Wang, Yunpeng
2017-04-10
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks.
An ant colony optimization based feature selection for web page classification.
Saraç, Esra; Özel, Selma Ayşe
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
Ma, Xiaolei; Dai, Zhuang; He, Zhengbing; Ma, Jihui; Wang, Yong; Wang, Yunpeng
2017-01-01
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks. PMID:28394270
Constrained dictionary learning and probabilistic hypergraph ranking for person re-identification
NASA Astrophysics Data System (ADS)
He, You; Wu, Song; Pu, Nan; Qian, Li; Xiao, Guoqiang
2018-04-01
Person re-identification is a fundamental and inevitable task in public security. In this paper, we propose a novel framework to improve the performance of this task. First, two different types of descriptors are extracted to represent a pedestrian: (1) appearance-based superpixel features, which are constituted mainly by conventional color features and extracted from the supepixel rather than a whole picture and (2) due to the limitation of discrimination of appearance features, the deep features extracted by feature fusion Network are also used. Second, a view invariant subspace is learned by dictionary learning constrained by the minimum negative sample (termed as DL-cMN) to reduce the noise in appearance-based superpixel feature domain. Then, we use deep features and sparse codes transformed by appearancebased features to establish the hyperedges respectively by k-nearest neighbor, rather than jointing different features simply. Finally, a final ranking is performed by probabilistic hypergraph ranking algorithm. Extensive experiments on three challenging datasets (VIPeR, PRID450S and CUHK01) demonstrate the advantages and effectiveness of our proposed algorithm.
Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes
NASA Astrophysics Data System (ADS)
Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie
2017-03-01
Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to the histologic subtypes over the past decade. In parallel, recent development of quantitative image biomarkers has recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies the adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADCs and 110 SqCCs patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers including Support Vector Machines with Radial Basis Function kernel (RBF-SVM), Random forest (RF), K-nearest neighbor (KNN), and RUSBoost algorithms. To improve the classifiers' performance, optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward eliminating algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both complete feature set (AUC=0.89) and optimal feature subset (AUC=0.91).
Goshvarpour, Ateke; Goshvarpour, Atefeh
2018-04-30
Heart rate variability (HRV) analysis has become a widely used tool for monitoring pathological and psychological states in medical applications. In a typical classification problem, information fusion is a process whereby the effective combination of the data can achieve a more accurate system. The purpose of this article was to provide an accurate algorithm for classifying HRV signals in various psychological states. Therefore, a novel feature level fusion approach was proposed. First, using the theory of information, two similarity indicators of the signal were extracted, including correntropy and Cauchy-Schwarz divergence. Applying probabilistic neural network (PNN) and k-nearest neighbor (kNN), the performance of each index in the classification of meditators and non-meditators HRV signals was appraised. Then, three fusion rules, including division, product, and weighted sum rules were used to combine the information of both similarity measures. For the first time, we propose an algorithm to define the weights of each feature based on the statistical p-values. The performance of HRV classification using combined features was compared with the non-combined features. Totally, the accuracy of 100% was obtained for discriminating all states. The results showed the strong ability and proficiency of division and weighted sum rules in the improvement of the classifier accuracies.
Automatic Classification of Tremor Severity in Parkinson's Disease Using a Wearable Device.
Jeon, Hyoseon; Lee, Woongwoo; Park, Hyeyoung; Lee, Hong Ji; Kim, Sang Kyong; Kim, Han Byul; Jeon, Beomseok; Park, Kwang Suk
2017-09-09
Although there is clinical demand for new technology that can accurately measure Parkinsonian tremors, automatic scoring of Parkinsonian tremors using machine-learning approaches has not yet been employed. This study aims to fill this gap by proposing machine-learning algorithms as a way to predict the Unified Parkinson's Disease Rating Scale (UPDRS), which are similar to how neurologists rate scores in actual clinical practice. In this study, the tremor signals of 85 patients with Parkinson's disease (PD) were measured using a wrist-watch-type wearable device consisting of an accelerometer and a gyroscope. The displacement and angle signals were calculated from the measured acceleration and angular velocity, and the acceleration, angular velocity, displacement, and angle signals were used for analysis. Nineteen features were extracted from each signal, and the pairwise correlation strategy was used to reduce the number of feature dimensions. With the selected features, a decision tree (DT), support vector machine (SVM), discriminant analysis (DA), random forest (RF), and k -nearest-neighbor ( k NN) algorithm were explored for automatic scoring of the Parkinsonian tremor severity. The performance of the employed classifiers was analyzed using accuracy, recall, and precision, and compared to other findings in similar studies. Finally, the limitations and plans for further study are discussed.
Liu, Kai-Chun; Chan, Chia-Tai
2017-01-01
The proportion of the aging population is rapidly increasing around the world, which will cause stress on society and healthcare systems. In recent years, advances in technology have created new opportunities for automatic activities of daily living (ADL) monitoring to improve the quality of life and provide adequate medical service for the elderly. Such automatic ADL monitoring requires reliable ADL information on a fine-grained level, especially for the status of interaction between body gestures and the environment in the real-world. In this work, we propose a significant change spotting mechanism for periodic human motion segmentation during cleaning task performance. A novel approach is proposed based on the search for a significant change of gestures, which can manage critical technical issues in activity recognition, such as continuous data segmentation, individual variance, and category ambiguity. Three typical machine learning classification algorithms are utilized for the identification of the significant change candidate, including a Support Vector Machine (SVM), k-Nearest Neighbors (kNN), and Naive Bayesian (NB) algorithm. Overall, the proposed approach achieves 96.41% in the F1-score by using the SVM classifier. The results show that the proposed approach can fulfill the requirement of fine-grained human motion segmentation for automatic ADL monitoring. PMID:28106853
Nearest unlike neighbor (NUN): an aid to decision confidence estimation
NASA Astrophysics Data System (ADS)
Dasarathy, Belur V.
1995-09-01
The concept of nearest unlike neighbor (NUN), proposed and explored previously in the design of nearest neighbor (NN) based decision systems, is further exploited in this study to develop a measure of confidence in the decisions made by NN-based decision systems. This measure of confidence, on the basis of comparison with a user-defined threshold, may be used to determine the acceptability of the decision provided by the NN-based decision system. The concepts, associated methodology, and some illustrative numerical examples using the now classical Iris data to bring out the ease of implementation and effectiveness of the proposed innovations are presented.
Influence of the number of topologically interacting neighbors on swarm dynamics
Shang, Yilun; Bouffanais, Roland
2014-01-01
Recent empirical and theoretical works on collective behaviors based on a topological interaction are beginning to offer some explanations as for the physical reasons behind the selection of a particular number of nearest neighbors locally affecting each individual's dynamics. Recently, flocking starlings have been shown to topologically interact with a very specific number of neighbors, between six to eight, while metric-free interactions were found to govern human crowd dynamics. Here, we use network- and graph-theoretic approaches combined with a dynamical model of locally interacting self-propelled particles to study how the consensus reaching process and its dynamics are influenced by the number k of topological neighbors. Specifically, we prove exactly that, in the absence of noise, consensus is always attained with a speed to consensus strictly increasing with k. The analysis of both speed and time to consensus reveals that, irrespective of the swarm size, a value of k ~ 10 speeds up the rate of convergence to consensus to levels close to the one of the optimal all-to-all interaction signaling. Furthermore, this effect is found to be more pronounced in the presence of environmental noise. PMID:24567077
Creating peer groups for assessing and comparing nursing home performance.
Byrne, Margaret M; Daw, Christina; Pietz, Ken; Reis, Brian; Petersen, Laura A
2013-11-01
Publicly reported performance data for hospitals and nursing homes are becoming ubiquitous. For such comparisons to be fair, facilities must be compared with their peers. To adapt a previously published methodology for developing hospital peer groupings so that it is applicable to nursing homes and to explore the characteristics of "nearest-neighbor" peer groupings. Analysis of Department of Veterans Affairs administrative databases and nursing home facility characteristics. The nearest-neighbor methodology for developing peer groupings involves calculating the Euclidean distance between facilities based on facility characteristics. We describe our steps in selection of facility characteristics, describe the characteristics of nearest-neighbor peer groups, and compare them with peer groups derived through classical cluster analysis. The facility characteristics most pertinent to nursing home groupings were found to be different from those that were most relevant for hospitals. Unlike classical cluster groups, nearest neighbor groups are not mutually exclusive, and the nearest-neighbor methodology resulted in nursing home peer groupings that were substantially less diffuse than nursing home peer groups created using traditional cluster analysis. It is essential that healthcare policy makers and administrators have a means of fairly grouping facilities for the purposes of quality, cost, or efficiency comparisons. In this research, we show that a previously published methodology can be successfully applied to a nursing home setting. The same approach could be applied in other clinical settings such as primary care.
Thanh Noi, Phan; Kappas, Martin
2017-01-01
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909
Thanh Noi, Phan; Kappas, Martin
2017-12-22
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Efficient Estimation of Mutual Information for Strongly Dependent Variables
2015-05-11
the two possibilities: for a fixed dimension d and near- est neighbor parameter k, we find a constant ↵ k,d , such that if V̄ (i)/V (i) < ↵ k,d , then...also compare the results to several baseline estima- tors: KSG (Kraskov et al., 2004), generalized near- est neighbor graph (GNN) (Pál et al., 2010...Amaury Lendasse, and Francesco Corona. A boundary corrected expansion of the moments of near- est neighbor distributions. Random Struct. Algorithms
ERIC Educational Resources Information Center
Pankau, Brian L.
2009-01-01
This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings and trained and tested against textual corpora of 2300 Gang-Of-Four (GOF) design pattern documents. Analysis of the experiment's…
A Comparison of Rule-Based, K-Nearest Neighbor, and Neural Net Classifiers for Automated
Tai-Hoon Cho; Richard W. Conners; Philip A. Araman
1991-01-01
Over the last few years the authors have been involved in research aimed at developing a machine vision system for locating and identifying surface defects on materials. The particular problem being studied involves locating surface defects on hardwood lumber in a species independent manner. Obviously, the accurate location and identification of defects is of paramount...
Mathias, Patrick C; Turner, Emily H; Scroggins, Sheena M; Salipante, Stephen J; Hoffman, Noah G; Pritchard, Colin C; Shirts, Brian H
2016-03-01
To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors. We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors. We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique. Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report. © American Society for Clinical Pathology, 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Shore, Joel D.; Thurston, George M.
2018-01-01
We report a charge-patterning phase transition on two-dimensional square lattices of titratable sites, here regarded as protonation sites, placed in a low-dielectric medium just below the planar interface between this medium and a salt solution. We calculate the work-of-charging matrix of the lattice with use of a linear Debye-Hückel model, as input to a grand-canonical partition function for the distribution of occupancy patterns. For a large range of parameter values, this model exhibits an approximate inverse cubic power-law decrease of the voltage produced by an individual charge, as a function of its in-lattice separation from neighboring titratable sites. Thus, the charge coupling voltage biases the local probabilities of proton binding as a function of the occupancy of sites for many neighbors beyond the nearest ones. We find that even in the presence of these longer-range interactions, the site couplings give rise to a phase transition in which the site occupancies exhibit an alternating, checkerboard pattern that is an analog of antiferromagnetic ordering. The overall strength W of this canonical charge coupling voltage, per unit charge, is a function of the Debye length, the charge depth, the Bjerrum length, and the dielectric coefficients of the medium and the solvent. The alternating occupancy transition occurs above a curve of thermodynamic critical points in the (pH-pK,W) plane, the curve representing a charge-regulation analog of variation of the Néel temperature of an Ising antiferromagnet as a function of an applied, uniform magnetic field. The analog of a uniform magnetic field in the antiferromagnet problem is a combination of pH-pK and W, and 1/W is the analog of the temperature in the antiferromagnet problem. We use Monte Carlo simulations to study the occupancy patterns of the titratable sites, including interactions out to the 37th nearest-neighbor category (a distance of 74 lattice constants), first validating simulations through comparison with exact and approximate results for the nearest-neighbor case. We then use the simulations to map the charge-patterning phase boundary in the (pH-pK,W) plane. The physical parameters that determine W provide a framework for identifying and designing real surfaces that could exhibit charge-patterning phase transitions. PMID:26764648
Shore, Joel D; Thurston, George M
2015-12-01
We report a charge-patterning phase transition on two-dimensional square lattices of titratable sites, here regarded as protonation sites, placed in a low-dielectric medium just below the planar interface between this medium and a salt solution. We calculate the work-of-charging matrix of the lattice with use of a linear Debye-Hückel model, as input to a grand-canonical partition function for the distribution of occupancy patterns. For a large range of parameter values, this model exhibits an approximate inverse cubic power-law decrease of the voltage produced by an individual charge, as a function of its in-lattice separation from neighboring titratable sites. Thus, the charge coupling voltage biases the local probabilities of proton binding as a function of the occupancy of sites for many neighbors beyond the nearest ones. We find that even in the presence of these longer-range interactions, the site couplings give rise to a phase transition in which the site occupancies exhibit an alternating, checkerboard pattern that is an analog of antiferromagnetic ordering. The overall strength W of this canonical charge coupling voltage, per unit charge, is a function of the Debye length, the charge depth, the Bjerrum length, and the dielectric coefficients of the medium and the solvent. The alternating occupancy transition occurs above a curve of thermodynamic critical points in the (pH-pK,W) plane, the curve representing a charge-regulation analog of variation of the Néel temperature of an Ising antiferromagnet as a function of an applied, uniform magnetic field. The analog of a uniform magnetic field in the antiferromagnet problem is a combination of pH-pK and W, and 1/W is the analog of the temperature in the antiferromagnet problem. We use Monte Carlo simulations to study the occupancy patterns of the titratable sites, including interactions out to the 37th nearest-neighbor category (a distance of √74 lattice constants), first validating simulations through comparison with exact and approximate results for the nearest-neighbor case. We then use the simulations to map the charge-patterning phase boundary in the (pH-pK,W) plane. The physical parameters that determine W provide a framework for identifying and designing real surfaces that could exhibit charge-patterning phase transitions.
NASA Astrophysics Data System (ADS)
Shore, Joel D.; Thurston, George M.
2015-12-01
We report a charge-patterning phase transition on two-dimensional square lattices of titratable sites, here regarded as protonation sites, placed in a low-dielectric medium just below the planar interface between this medium and a salt solution. We calculate the work-of-charging matrix of the lattice with use of a linear Debye-Hückel model, as input to a grand-canonical partition function for the distribution of occupancy patterns. For a large range of parameter values, this model exhibits an approximate inverse cubic power-law decrease of the voltage produced by an individual charge, as a function of its in-lattice separation from neighboring titratable sites. Thus, the charge coupling voltage biases the local probabilities of proton binding as a function of the occupancy of sites for many neighbors beyond the nearest ones. We find that even in the presence of these longer-range interactions, the site couplings give rise to a phase transition in which the site occupancies exhibit an alternating, checkerboard pattern that is an analog of antiferromagnetic ordering. The overall strength W of this canonical charge coupling voltage, per unit charge, is a function of the Debye length, the charge depth, the Bjerrum length, and the dielectric coefficients of the medium and the solvent. The alternating occupancy transition occurs above a curve of thermodynamic critical points in the (p H-p K ,W ) plane, the curve representing a charge-regulation analog of variation of the Néel temperature of an Ising antiferromagnet as a function of an applied, uniform magnetic field. The analog of a uniform magnetic field in the antiferromagnet problem is a combination of p H-p K and W , and 1 /W is the analog of the temperature in the antiferromagnet problem. We use Monte Carlo simulations to study the occupancy patterns of the titratable sites, including interactions out to the 37th nearest-neighbor category (a distance of √{74 } lattice constants), first validating simulations through comparison with exact and approximate results for the nearest-neighbor case. We then use the simulations to map the charge-patterning phase boundary in the (p H-p K ,W ) plane. The physical parameters that determine W provide a framework for identifying and designing real surfaces that could exhibit charge-patterning phase transitions.
NASA Astrophysics Data System (ADS)
Fatollahi, Amir H.; Khorrami, Mohammad; Shariati, Ahmad; Aghamohammadi, Amir
2011-04-01
A complete classification is given for one-dimensional chains with nearest-neighbor interactions having two states in each site, for which a matrix product ground state exists. The Hamiltonians and their corresponding matrix product ground states are explicitly obtained.
NASA Astrophysics Data System (ADS)
Wigdahl, J.; Agurto, C.; Murray, V.; Barriga, S.; Soliz, P.
2013-03-01
Diabetic retinopathy (DR) affects more than 4.4 million Americans age 40 and over. Automatic screening for DR has shown to be an efficient and cost-effective way to lower the burden on the healthcare system, by triaging diabetic patients and ensuring timely care for those presenting with DR. Several supervised algorithms have been developed to detect pathologies related to DR, but little work has been done in determining the size of the training set that optimizes an algorithm's performance. In this paper we analyze the effect of the training sample size on the performance of a top-down DR screening algorithm for different types of statistical classifiers. Results are based on partial least squares (PLS), support vector machines (SVM), k-nearest neighbor (kNN), and Naïve Bayes classifiers. Our dataset consisted of digital retinal images collected from a total of 745 cases (595 controls, 150 with DR). We varied the number of normal controls in the training set, while keeping the number of DR samples constant, and repeated the procedure 10 times using randomized training sets to avoid bias. Results show increasing performance in terms of area under the ROC curve (AUC) when the number of DR subjects in the training set increased, with similar trends for each of the classifiers. Of these, PLS and k-NN had the highest average AUC. Lower standard deviation and a flattening of the AUC curve gives evidence that there is a limit to the learning ability of the classifiers and an optimal number of cases to train on.
A novel topology control approach to maintain the node degree in dynamic wireless sensor networks.
Huang, Yuanjiang; Martínez, José-Fernán; Díaz, Vicente Hernández; Sendra, Juana
2014-03-07
Topology control is an important technique to improve the connectivity and the reliability of Wireless Sensor Networks (WSNs) by means of adjusting the communication range of wireless sensor nodes. In this paper, a novel Fuzzy-logic Topology Control (FTC) is proposed to achieve any desired average node degree by adaptively changing communication range, thus improving the network connectivity, which is the main target of FTC. FTC is a fully localized control algorithm, and does not rely on location information of neighbors. Instead of designing membership functions and if-then rules for fuzzy-logic controller, FTC is constructed from the training data set to facilitate the design process. FTC is proved to be accurate, stable and has short settling time. In order to compare it with other representative localized algorithms (NONE, FLSS, k-Neighbor and LTRT), FTC is evaluated through extensive simulations. The simulation results show that: firstly, similar to k-Neighbor algorithm, FTC is the best to achieve the desired average node degree as node density varies; secondly, FTC is comparable to FLSS and k-Neighbor in terms of energy-efficiency, but is better than LTRT and NONE; thirdly, FTC has the lowest average maximum communication range than other algorithms, which indicates that the most energy-consuming node in the network consumes the lowest power.
Optimal Geometrical Set for Automated Marker Placement to Virtualized Real-Time Facial Emotions
Maruthapillai, Vasanthan; Murugappan, Murugappan
2016-01-01
In recent years, real-time face recognition has been a major topic of interest in developing intelligent human-machine interaction systems. Over the past several decades, researchers have proposed different algorithms for facial expression recognition, but there has been little focus on detection in real-time scenarios. The present work proposes a new algorithmic method of automated marker placement used to classify six facial expressions: happiness, sadness, anger, fear, disgust, and surprise. Emotional facial expressions were captured using a webcam, while the proposed algorithm placed a set of eight virtual markers on each subject’s face. Facial feature extraction methods, including marker distance (distance between each marker to the center of the face) and change in marker distance (change in distance between the original and new marker positions), were used to extract three statistical features (mean, variance, and root mean square) from the real-time video sequence. The initial position of each marker was subjected to the optical flow algorithm for marker tracking with each emotional facial expression. Finally, the extracted statistical features were mapped into corresponding emotional facial expressions using two simple non-linear classifiers, K-nearest neighbor and probabilistic neural network. The results indicate that the proposed automated marker placement algorithm effectively placed eight virtual markers on each subject’s face and gave a maximum mean emotion classification rate of 96.94% using the probabilistic neural network. PMID:26859884
A Floor-Map-Aided WiFi/Pseudo-Odometry Integration Algorithm for an Indoor Positioning System
Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin
2015-01-01
This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The “go and back” phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The “cross-wall” problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning. PMID:25811224
Optimal Geometrical Set for Automated Marker Placement to Virtualized Real-Time Facial Emotions.
Maruthapillai, Vasanthan; Murugappan, Murugappan
2016-01-01
In recent years, real-time face recognition has been a major topic of interest in developing intelligent human-machine interaction systems. Over the past several decades, researchers have proposed different algorithms for facial expression recognition, but there has been little focus on detection in real-time scenarios. The present work proposes a new algorithmic method of automated marker placement used to classify six facial expressions: happiness, sadness, anger, fear, disgust, and surprise. Emotional facial expressions were captured using a webcam, while the proposed algorithm placed a set of eight virtual markers on each subject's face. Facial feature extraction methods, including marker distance (distance between each marker to the center of the face) and change in marker distance (change in distance between the original and new marker positions), were used to extract three statistical features (mean, variance, and root mean square) from the real-time video sequence. The initial position of each marker was subjected to the optical flow algorithm for marker tracking with each emotional facial expression. Finally, the extracted statistical features were mapped into corresponding emotional facial expressions using two simple non-linear classifiers, K-nearest neighbor and probabilistic neural network. The results indicate that the proposed automated marker placement algorithm effectively placed eight virtual markers on each subject's face and gave a maximum mean emotion classification rate of 96.94% using the probabilistic neural network.
Tear fluid proteomics multimarkers for diabetic retinopathy screening
2013-01-01
Background The aim of the project was to develop a novel method for diabetic retinopathy screening based on the examination of tear fluid biomarker changes. In order to evaluate the usability of protein biomarkers for pre-screening purposes several different approaches were used, including machine learning algorithms. Methods All persons involved in the study had diabetes. Diabetic retinopathy (DR) was diagnosed by capturing 7-field fundus images, evaluated by two independent ophthalmologists. 165 eyes were examined (from 119 patients), 55 were diagnosed healthy and 110 images showed signs of DR. Tear samples were taken from all eyes and state-of-the-art nano-HPLC coupled ESI-MS/MS mass spectrometry protein identification was performed on all samples. Applicability of protein biomarkers was evaluated by six different optimally parameterized machine learning algorithms: Support Vector Machine, Recursive Partitioning, Random Forest, Naive Bayes, Logistic Regression, K-Nearest Neighbor. Results Out of the six investigated machine learning algorithms the result of Recursive Partitioning proved to be the most accurate. The performance of the system realizing the above algorithm reached 74% sensitivity and 48% specificity. Conclusions Protein biomarkers selected and classified with machine learning algorithms alone are at present not recommended for screening purposes because of low specificity and sensitivity values. This tool can be potentially used to improve the results of image processing methods as a complementary tool in automatic or semiautomatic systems. PMID:23919537
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar
Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri; ...
2017-11-16
While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination ( R 2),more » and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distribute across the 800 km 2 Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.« less
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri
While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination ( R 2),more » and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distribute across the 800 km 2 Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.« less
Design of Quantum Algorithms Using Physics Tools
2014-06-02
invariant spin-1 chain that has a unique highly entangled ground state and exhibits some signatures of critical behavior. The entanglement entropy of one... entangled and found them hard to approximate using the MPS method. In follow on work Shor along with Sergey Bravyi, Libor Caha, Movassagh and Nagaj...They asked how entangled the ground state of a FF quantum spin-s chain with nearest-neighbor interactions can be for small values of s. While FF spin-1
Symmetrized Nearest Neighbor Regression Estimates.
1987-12-01
TELEPHONE NUMBER 22C. OFFICE SYMBO0L (Inetude A me. Code) Major Brian Woodruff 1(202) 767-5026 1 Dr -’ 00 PORN 147,303- APR EDI1TION OF I JAN 73 IS...in tenth of a pence) in 1973. The data come from the Family Ex- penditure Survey, Annual Base Tapes 1968-198S, Department of Employment, Statistics...Statistics, 13, 1465- 1481. Hildenbrand, K. and Hildenbrand, W. (1986). On the mean income effect: a data analysis of the U.K. family expenditure
Tomcho, Jeremy C; Tillman, Magdalena R; Znosko, Brent M
2015-09-01
Predicting the secondary structure of RNA is an intermediate in predicting RNA three-dimensional structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence-independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence-dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested energy contributions of dinucleotide bulges were sequence-dependent, and a sequence-dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5'-purine-pyrimidine-3', and 2.41 kcal/mol for 5'-pyrimidine-purine-3'). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a -0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence-dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence-independent model. This model and new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
Distribution of Steps with Finite-Range Interactions: Analytic Approximations and Numerical Results
NASA Astrophysics Data System (ADS)
GonzáLez, Diego Luis; Jaramillo, Diego Felipe; TéLlez, Gabriel; Einstein, T. L.
2013-03-01
While most Monte Carlo simulations assume only nearest-neighbor steps interact elastically, most analytic frameworks (especially the generalized Wigner distribution) posit that each step elastically repels all others. In addition to the elastic repulsions, we allow for possible surface-state-mediated interactions. We investigate analytically and numerically how next-nearest neighbor (NNN) interactions and, more generally, interactions out to q'th nearest neighbor alter the form of the terrace-width distribution and of pair correlation functions (i.e. the sum over n'th neighbor distribution functions, which we investigated recently.[2] For physically plausible interactions, we find modest changes when NNN interactions are included and generally negligible changes when more distant interactions are allowed. We discuss methods for extracting from simulated experimental data the characteristic scale-setting terms in assumed potential forms.
NASA Astrophysics Data System (ADS)
Qian, Jing; Zhang, Lu; Zhai, Jingjing; Zhang, Weiping
2015-12-01
We theoretically investigate the dynamical phase diagram of a one-dimensional chain of laser-excited two-species Rydberg atoms. The existence of a variety of unique dynamical phases in the experimentally achievable parameter region is predicted under the mean-field approximation, and the change in those phases when the effect of the next-nearest-neighbor interaction is included is further discussed. In particular, we find that the com-petition of the strong Rydberg-Rydberg interactions and the optical excitation imbalance can lead to the presence of complex multiple chaotic phases, which are highly sensitive to the initial Rydberg-state population and the strength of the next-nearest-neighbor interactions.
Matrix-valued Boltzmann equation for the nonintegrable Hubbard chain.
Fürst, Martin L R; Mendl, Christian B; Spohn, Herbert
2013-07-01
The standard Fermi-Hubbard chain becomes nonintegrable by adding to the nearest neighbor hopping additional longer range hopping amplitudes. We assume that the quartic interaction is weak and investigate numerically the dynamics of the chain on the level of the Boltzmann type kinetic equation. Only the spatially homogeneous case is considered. We observe that the huge degeneracy of stationary states in the case of nearest neighbor hopping is lost and the convergence to the thermal Fermi-Dirac distribution is restored. The convergence to equilibrium is exponentially fast. However for small next-nearest neighbor hopping amplitudes one has a rapid relaxation towards the manifold of quasistationary states and slow relaxation to the final equilibrium state.
NASA Astrophysics Data System (ADS)
Niu, Yuekun; Sun, Jian; Ni, Yu; Song, Yun
2018-06-01
The dynamical mean-field theory is employed to study the orbital-selective Mott transition (OSMT) of the two-orbital Hubbard model with nearest neighbor hopping and next-nearest neighbor (NNN) hopping. The NNN hopping breaks the particle-hole symmetry at half filling and gives rise to an asymmetric density of states (DOS). Our calculations show that the broken symmetry of DOS benefits the OSMT, where the region of the orbital-selective Mott phase significantly extends with the increasing NNN hopping integral. We also find that Hund's rule coupling promotes OSMT by blocking the orbital fluctuations, but the influence of NNN hopping is more remarkable.
NASA Astrophysics Data System (ADS)
Lee, T. R.; Wood, W. T.; Dale, J.
2017-12-01
Empirical and theoretical models of sub-seafloor organic matter transformation, degradation and methanogenesis require estimates of initial seafloor total organic carbon (TOC). This subsurface methane, under the appropriate geophysical and geochemical conditions may manifest as methane hydrate deposits. Despite the importance of seafloor TOC, actual observations of TOC in the world's oceans are sparse and large regions of the seafloor yet remain unmeasured. To provide an estimate in areas where observations are limited or non-existent, we have implemented interpolation techniques that rely on existing data sets. Recent geospatial analyses have provided accurate accounts of global geophysical and geochemical properties (e.g. crustal heat flow, seafloor biomass, porosity) through machine learning interpolation techniques. These techniques find correlations between the desired quantity (in this case TOC) and other quantities (predictors, e.g. bathymetry, distance from coast, etc.) that are more widely known. Predictions (with uncertainties) of seafloor TOC in regions lacking direct observations are made based on the correlations. Global distribution of seafloor TOC at 1 x 1 arc-degree resolution was estimated from a dataset of seafloor TOC compiled by Seiter et al. [2004] and a non-parametric (i.e. data-driven) machine learning algorithm, specifically k-nearest neighbors (KNN). Built-in predictor selection and a ten-fold validation technique generated statistically optimal estimates of seafloor TOC and uncertainties. In addition, inexperience was estimated. Inexperience is effectively the distance in parameter space to the single nearest neighbor, and it indicates geographic locations where future data collection would most benefit prediction accuracy. These improved geospatial estimates of TOC in data deficient areas will provide new constraints on methane production and subsequent methane hydrate accumulation.
X-ray absorption studies of chlorine valence and local environments in borosilicate waste glasses
NASA Astrophysics Data System (ADS)
McKeown, David A.; Gan, Hao; Pegg, Ian L.; Stolte, W. C.; Demchenko, I. N.
2011-01-01
Chlorine (Cl) is a constituent of certain types of nuclear wastes and its presence can affect the physical and chemical properties of silicate melts and glasses developed for the immobilization of such wastes. Cl K-edge X-ray absorption spectra (XAS) were collected and analyzed to characterize the unknown Cl environments in borosilicate waste glass formulations, ranging in Cl-content from 0.23 to 0.94 wt.%. Both X-ray absorption near edge structure (XANES) and extended X-ray absorption fine structure (EXAFS) data for the glasses show trends dependent on calcium (Ca) content. Near-edge data for the Ca-rich glasses are most similar to the Cl XANES of CaCl 2, where Cl - is coordinated to three Ca atoms, while the XANES for the Ca-poor glasses are more similar to the mineral davyne, where Cl is most commonly coordinated to two Ca in one site, as well as Cl and oxygen nearest-neighbors in other sites. With increasing Ca content in the glass, Cl XANES for the glasses approach that for CaCl 2, indicating more Ca nearest-neighbors around Cl. Reliable structural information obtained from the EXAFS data for the glasses is limited, however, to Cl sbnd Cl, Cl sbnd O, and Cl sbnd Na distances; Cl sbnd Ca contributions could not be fit to the glass data, due to the narrow k-space range available for analysis. Structural models that best fit the glass EXAFS data include Cl sbnd Cl, Cl sbnd O, and Cl sbnd Na correlations, where Cl sbnd O and Cl sbnd Na distances decrease by approximately 0.16 Å as glass Ca content increases. XAS for the glasses indicates Cl - is found in multiple sites where most Cl-sites have Ca neighbors, with oxygen, and possibly, Na second-nearest neighbors. EXAFS analyses suggest that Cl sbnd Cl environments may also exist in the glasses in minor amounts. These results are generally consistent with earlier findings for silicate glasses, where Cl - was associated with Ca 2+ and Na + in network modifier sites.
Automated Identification of Abnormal Adult EEGs
López, S.; Suarez, G.; Jungreis, D.; Obeid, I.; Picone, J.
2016-01-01
The interpretation of electroencephalograms (EEGs) is a process that is still dependent on the subjective analysis of the examiners. Though interrater agreement on critical events such as seizures is high, it is much lower on subtler events (e.g., when there are benign variants). The process used by an expert to interpret an EEG is quite subjective and hard to replicate by machine. The performance of machine learning technology is far from human performance. We have been developing an interpretation system, AutoEEG, with a goal of exceeding human performance on this task. In this work, we are focusing on one of the early decisions made in this process – whether an EEG is normal or abnormal. We explore two baseline classification algorithms: k-Nearest Neighbor (kNN) and Random Forest Ensemble Learning (RF). A subset of the TUH EEG Corpus was used to evaluate performance. Principal Components Analysis (PCA) was used to reduce the dimensionality of the data. kNN achieved a 41.8% detection error rate while RF achieved an error rate of 31.7%. These error rates are significantly lower than those obtained by random guessing based on priors (49.5%). The majority of the errors were related to misclassification of normal EEGs. PMID:27195311
Streamflow variability and classification using false nearest neighbor method
NASA Astrophysics Data System (ADS)
Vignesh, R.; Jothiprakash, V.; Sivakumar, B.
2015-12-01
Understanding regional streamflow dynamics and patterns continues to be a challenging problem. The present study introduces the false nearest neighbor (FNN) algorithm, a nonlinear dynamic-based method, to examine the spatial variability of streamflow over a region. The FNN method is a dimensionality-based approach, where the dimension of the time series represents its variability. The method uses phase space reconstruction and nearest neighbor concepts, and identifies false neighbors in the reconstructed phase space. The FNN method is applied to monthly streamflow data monitored over a period of 53 years (1950-2002) in an extensive network of 639 stations in the contiguous United States (US). Since selection of delay time in phase space reconstruction may influence the FNN outcomes, analysis is carried out for five different delay time values: monthly, seasonal, and annual separation of data as well as delay time values obtained using autocorrelation function (ACF) and average mutual information (AMI) methods. The FNN dimensions for the 639 streamflow series are generally identified to range from 4 to 12 (with very few exceptional cases), indicating a wide range of variability in the dynamics of streamflow across the contiguous US. However, the FNN dimensions for a majority of the streamflow series are found to be low (less than or equal to 6), suggesting low level of complexity in streamflow dynamics in most of the individual stations and over many sub-regions. The FNN dimension estimates also reveal that streamflow dynamics in the western parts of the US (including far west, northwestern, and southwestern parts) generally exhibit much greater variability compared to that in the eastern parts of the US (including far east, northeastern, and southeastern parts), although there are also differences among 'pockets' within these regions. These results are useful for identification of appropriate model complexity at individual stations, patterns across regions and sub-regions, interpolation and extrapolation of data, and catchment classification. An attempt is also made to relate the FNN dimensions with catchment characteristics and streamflow statistical properties.
Phase diagram and quantum order by disorder in the Kitaev K1-K2 honeycomb magnet
NASA Astrophysics Data System (ADS)
Rousochatzakis, Ioannis; Reuther, Johannes; Thomale, Ronny; Rachel, Stephan; Perkins, Natalia
We show that the topological Kitaev spin liquid on the honeycomb lattice is extremely fragile against the second neighbor Kitaev coupling K2, which has been recently identified as the dominant perturbation away from the nearest neighbor model in iridate Na2IrO3, and may also play a role in α-RuCl3. This coupling explains naturally the zig-zag ordering and the special entanglement between real and spin space observed recently in Na2IrO3. The minimal K1-K2 model that we present here holds in addition the unique property that the classical and quantum phase diagrams and their respective order-by-disorder mechanisms are qualitatively different due to their fundamentally different symmetry structure. Nsf DMR-1511768; Freie Univ. Berlin Excellence Initiative of German Research Foundation; European Research Council, ERC-StG-336012; DFG-SFB 1170; DFG-SFB 1143, DFG-SPP 1666, and Helmholtz association VI-521.
Next nearest neighbors sites and the reactivity of the CO NO surface reaction
NASA Astrophysics Data System (ADS)
Cortés, Joaquín.; Valencia, Eliana
1998-04-01
Using Monte Carlo experiments of the reduction of NO by CO, a study is made of the effect on reactivity due to the formation of N 2O and to the increased coordination of the sites considering the next nearest neighbors sites (nnn) in a square lattice of superficial sites.
NASA Astrophysics Data System (ADS)
Ma, Li-Yuan; Ji, Jia-Liang; Xu, Zong-Wei; Zhu, Zuo-Nong
2018-03-01
We study a nonintegrable discrete nonlinear Schrödinger (dNLS) equation with the term of nonlinear nearest-neighbor interaction occurred in nonlinear optical waveguide arrays. By using discrete Fourier transformation, we obtain numerical approximations of stationary and travelling solitary wave solutions of the nonintegrable dNLS equation. The analysis of stability of stationary solitary waves is performed. It is shown that the nonlinear nearest-neighbor interaction term has great influence on the form of solitary wave. The shape of solitary wave is important in the electric field propagating. If we neglect the nonlinear nearest-neighbor interaction term, much important information in the electric field propagating may be missed. Our numerical simulation also demonstrates the difference of chaos phenomenon between the nonintegrable dNLS equation with nonlinear nearest-neighbor interaction and another nonintegrable dNLS equation without the term. Project supported by the National Natural Science Foundation of China (Grant Nos. 11671255 and 11701510), the Ministry of Economy and Competitiveness of Spain (Grant No. MTM2016-80276-P (AEI/FEDER, EU)), and the China Postdoctoral Science Foundation (Grant No. 2017M621964).
Large margin nearest neighbor classifiers.
Domeniconi, Carlotta; Gunopulos, Dimitrios; Peng, Jing
2005-07-01
The nearest neighbor technique is a simple and appealing approach to addressing classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. The employment of a locally adaptive metric becomes crucial in order to keep class conditional probabilities close to uniform, thereby minimizing the bias of estimates. We propose a technique that computes a locally flexible metric by means of support vector machines (SVMs). The decision function constructed by SVMs is used to determine the most discriminant direction in a neighborhood around the query. Such a direction provides a local feature weighting scheme. We formally show that our method increases the margin in the weighted space where classification takes place. Moreover, our method has the important advantage of online computational efficiency over competing locally adaptive techniques for nearest neighbor classification. We demonstrate the efficacy of our method using both real and simulated data.
Distributed memory approaches for robotic neural controllers
NASA Technical Reports Server (NTRS)
Jorgensen, Charles C.
1990-01-01
The suitability is explored of two varieties of distributed memory neutral networks as trainable controllers for a simulated robotics task. The task requires that two cameras observe an arbitrary target point in space. Coordinates of the target on the camera image planes are passed to a neural controller which must learn to solve the inverse kinematics of a manipulator with one revolute and two prismatic joints. Two new network designs are evaluated. The first, radial basis sparse distributed memory (RBSDM), approximates functional mappings as sums of multivariate gaussians centered around previously learned patterns. The second network types involved variations of Adaptive Vector Quantizers or Self Organizing Maps. In these networks, random N dimensional points are given local connectivities. They are then exposed to training patterns and readjust their locations based on a nearest neighbor rule. Both approaches are tested based on their ability to interpolate manipulator joint coordinates for simulated arm movement while simultaneously performing stereo fusion of the camera data. Comparisons are made with classical k-nearest neighbor pattern recognition techniques.
Exchange interactions in CdMnTe/CdMgTe quantum wells under high magnetic fields
NASA Astrophysics Data System (ADS)
Yasuhira, T.; Uchida, K.; Matsuda, Y. H.; Miura, N.; Kuroda, S.; Takita, K.
2002-03-01
The sp-d exchange interaction Jsp-d and the exchange interaction between the nearest neighbor Mn ions JNN were studied by magneto-photoluminescence spectra of excitons in CdMnTe/CdMgTe quantum wells in pulsed high magnetic fields up to 45 T. The magnitude of Jsp-d estimated from the observed Zeeman splitting was found to decrease as the quantum well width was decreased. The decrease is partly due to the penetration of the electron and the hole wave functions into the non-magnetic CdMgTe barrier layers, and partly due to the k-dependence of the exchange interaction. It was found that the latter effect is much larger than theoretically predicted. The observed features are well explained by a model assuming the interface disorder within some thickness near the interface. In contrast to Jsp-d, the nearest neighbor interaction JNN estimated from the steps in the photoluminescence peak was found to be independent of the well width.
NASA Astrophysics Data System (ADS)
Yang, Dongzheng; Hu, Xixi; Zhang, Dong H.; Xie, Daiqian
2018-02-01
Solving the time-independent close coupling equations of a diatom-diatom inelastic collision system by using the rigorous close-coupling approach is numerically difficult because of its expensive matrix manipulation. The coupled-states approximation decouples the centrifugal matrix by neglecting the important Coriolis couplings completely. In this work, a new approximation method based on the coupled-states approximation is presented and applied to time-independent quantum dynamic calculations. This approach only considers the most important Coriolis coupling with the nearest neighbors and ignores weaker Coriolis couplings with farther K channels. As a result, it reduces the computational costs without a significant loss of accuracy. Numerical tests for para-H2+ortho-H2 and para-H2+HD inelastic collision were carried out and the results showed that the improved method dramatically reduces the errors due to the neglect of the Coriolis couplings in the coupled-states approximation. This strategy should be useful in quantum dynamics of other systems.
Rivas, Elena; Lang, Raymond; Eddy, Sean R
2012-02-01
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
Rivas, Elena; Lang, Raymond; Eddy, Sean R.
2012-01-01
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308
Using Machine Learning for Advanced Anomaly Detection and Classification
NASA Astrophysics Data System (ADS)
Lane, B.; Poole, M.; Camp, M.; Murray-Krezan, J.
2016-09-01
Machine Learning (ML) techniques have successfully been used in a wide variety of applications to automatically detect and potentially classify changes in activity, or a series of activities by utilizing large amounts data, sometimes even seemingly-unrelated data. The amount of data being collected, processed, and stored in the Space Situational Awareness (SSA) domain has grown at an exponential rate and is now better suited for ML. This paper describes development of advanced algorithms to deliver significant improvements in characterization of deep space objects and indication and warning (I&W) using a global network of telescopes that are collecting photometric data on a multitude of space-based objects. The Phase II Air Force Research Laboratory (AFRL) Small Business Innovative Research (SBIR) project Autonomous Characterization Algorithms for Change Detection and Characterization (ACDC), contracted to ExoAnalytic Solutions Inc. is providing the ability to detect and identify photometric signature changes due to potential space object changes (e.g. stability, tumble rate, aspect ratio), and correlate observed changes to potential behavioral changes using a variety of techniques, including supervised learning. Furthermore, these algorithms run in real-time on data being collected and processed by the ExoAnalytic Space Operations Center (EspOC), providing timely alerts and warnings while dynamically creating collection requirements to the EspOC for the algorithms that generate higher fidelity I&W. This paper will discuss the recently implemented ACDC algorithms, including the general design approach and results to date. The usage of supervised algorithms, such as Support Vector Machines, Neural Networks, k-Nearest Neighbors, etc., and unsupervised algorithms, for example k-means, Principle Component Analysis, Hierarchical Clustering, etc., and the implementations of these algorithms is explored. Results of applying these algorithms to EspOC data both in an off-line "pattern of life" analysis as well as using the algorithms on-line in real-time, meaning as data is collected, will be presented. Finally, future work in applying ML for SSA will be discussed.
Mismatch and G-Stack Modulated Probe Signals on SNP Microarrays
Binder, Hans; Fasold, Mario; Glomb, Torsten
2009-01-01
Background Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates. Methodology/Principal Findings The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes which facilitates complex formation of neighboring probes with at minimum three adjacent G's in their sequence. Conclusions The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested. PMID:19924253
Ronald E. McRoberts; Mark D. Nelson; Daniel G. Wendt
2002-01-01
For two large study areas in Minnesota, USA, stratified estimation using classified Landsat Thematic Mapper satellite imagery as the basis for stratification was used to estimate forest area. Measurements of forest inventory plots obtained for a 12-month period in 1998 and 1999 were used as the source of data for within-stratum estimates. These measurements further...
Stratified estimates of forest area using the k-nearest neighbors technique and satellite imagery
Ronald E. McRoberts; Mark D. Nelson; Daniel Wendt
2002-01-01
For two study areas in Minnesota, stratified estimation using Landsat Thematic Mapper satellite imagery as the basis for stratification was used to estimate forest area. Measurements of forest inventory plots obtained for a 12-month period in 1998 and 1999 were used as the source of data for within-strata estimates. These measurements further served as calibration data...
A k-nearest neighbor approach for estimation of single-tree biomass
Lutz Fehrmann; Christoph Kleinn
2007-01-01
Allometric biomass models are typically site and species specific. They are mostly based on a low number of independent variables such as diameter at breast height and tree height. Because of relatively small datasets, their validity is limited to the set of conditions of the study, such as site conditions and diameter range. One challenge in the context of the current...
Collective Behaviors of Mobile Robots Beyond the Nearest Neighbor Rules With Switching Topology.
Ning, Boda; Han, Qing-Long; Zuo, Zongyu; Jin, Jiong; Zheng, Jinchuan
2018-05-01
This paper is concerned with the collective behaviors of robots beyond the nearest neighbor rules, i.e., dispersion and flocking, when robots interact with others by applying an acute angle test (AAT)-based interaction rule. Different from a conventional nearest neighbor rule or its variations, the AAT-based interaction rule allows interactions with some far-neighbors and excludes unnecessary nearest neighbors. The resulting dispersion and flocking hold the advantages of scalability, connectivity, robustness, and effective area coverage. For the dispersion, a spring-like controller is proposed to achieve collision-free coordination. With switching topology, a new fixed-time consensus-based energy function is developed to guarantee the system stability. An upper bound of settling time for energy consensus is obtained, and a uniform time interval is accordingly set so that energy distribution is conducted in a fair manner. For the flocking, based on a class of generalized potential functions taking nonsmooth switching into account, a new controller is proposed to ensure that the same velocity for all robots is eventually reached. A co-optimizing problem is further investigated to accomplish additional tasks, such as enhancing communication performance, while maintaining the collective behaviors of mobile robots. Simulation results are presented to show the effectiveness of the theoretical results.
Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Jamaluddin; Siringoringo, Rimbun
2017-12-01
Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawbackof FKNN is that it is difficult to determine the parameters. These parameters are the number of neighbors (k) and fuzzy strength (m). Both parameters are very sensitive. This makes it difficult to determine the values of ‘m’ and ‘k’, thus making FKNN difficult to control because no theories or guides can deduce how proper ‘m’ and ‘k’ should be. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best value of ‘k’ and ‘m’. MPSO is focused on the Constriction Factor Method. Constriction Factor Method is an improvement of PSO in order to avoid local circumstances optima. The model proposed in this study was tested on the German Credit Dataset. The test of the data/The data test has been standardized by UCI Machine Learning Repository which is widely applied to classification problems. The application of MPSO to the determination of FKNN parameters is expected to increase the value of classification performance. Based on the experiments that have been done indicating that the model offered in this research results in a better classification performance compared to the Fk-NN model only. The model offered in this study has an accuracy rate of 81%, while. With using Fk-NN model, it has the accuracy of 70%. At the end is done comparison of research model superiority with 2 other classification models;such as Naive Bayes and Decision Tree. This research model has a better performance level, where Naive Bayes has accuracy 75%, and the decision tree model has 70%
Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R
2018-01-01
Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominate and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and the non- k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The Imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, furthers studies need to be realized to improve the accuracy prediction of TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns
Haghnegahdar, A.A.; Kolahi, S.; Khojastepour, L.; Tajeripour, F.
2018-01-01
Background: Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. Material and Methods: CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cut prepared from each condyle, although images were limited to head of mandibular condyle. In order to extract features of images, first we use LBP and then histogram of oriented gradients. To reduce dimensionality, the linear algebra Singular Value Decomposition (SVD) is applied to the feature vectors matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. Results: K nearest neighbor classifier achieves a very good accuracy (0.9242), moreover, it has desirable sensitivity (0.9470) and specificity (0.9015) results, when other classifiers have lower accuracy, sensitivity and specificity. Conclusion: We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN has been the best classifier for our experiments in detecting patients from healthy individuals, by 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages. PMID:29732343
Acharya, P; Plashkevych, O; Morita, C; Yamada, S; Chattopadhyaya, J
2003-02-21
Direct intramolecular cation-pi interaction between phenyl and pyridinium moieties in 1a(+) has been experimentally evidenced through pH-dependent (1)H NMR titration. The basicity of the pyridinyl group (pK(a) 2.9) in 1a can be measured both from the pH-dependent chemical shifts of the pyridinyl protons as well as from the protons of the neighboring phenyl and methyl groups as a result of electrostatic interaction between the phenyl and the pyridinium ion in 1a(+) at the ground state. The net result of this nearest neighbor electrostatic interaction is that the pyridinium moiety in 1a becomes more basic (pK(a) 2.92) compared to that in the standard 2a (pK(a) 2.56) as a consequence of edge-to-face cation (pyridinium)-pi (phenyl) interaction, giving a free energy of stabilization (DeltaDeltaG(o)pKa) of -2.1 kJ mol(-1). The fact that the pH-dependent downfield shifts of the phenyl and methyl protons give the pK(a) of the pyridine moiety of 1a also suggests that the nearest neighbor cation (pyridinium)-pi (phenyl) interaction also steers the CH (methyl)-pi (phenyl) interaction in tandem. This means that the whole pyridine-phenyl-methyl system in 1a(+) is electronically coupled at the ground state, cross-modulating the physicochemical property of the next neighbor by using the electrostatics as the engine, and the origin of this electrostatics is a far away point in the molecule-the pyridinyl-nitrogen. The relative chemical shift changes and the pK(a) differences show that the cation (pyridinium)-pi (phenyl) interaction is indeed more stable (DeltaDeltaG(o)pKa = -2.1 kJ mol(-1)) than that of the CH (methyl)-pi (phenyl) interaction (DeltaDeltaG(o)pKa = -0.8 kJ mol(-1)). Since the pK(a) of the pyridine moiety in 1a is also obtained through the pH-dependent shifts of both phenyl and methyl protons, it suggests that the net electrostatic mediated charge transfer from the phenyl to the pyridinium and its effect on the CH (methyl)-pi (phenyl) interaction corresponds to DeltaG(o)pKa of the pyridinium ion (approximately 17.5 kJ mol(-1)), which means that the aromatic characters of the phenyl and the pyridinium rings in 1a(+) have been cross-modulated owing to the edge-to-face interaction proportional to this DeltaG(o)pKa change.
Huang, Yin-Fu; Wang, Chia-Ming; Liou, Sing-Wu
2013-01-01
A hybrid self-adaptive harmony search and back-propagation mining system was proposed to discover weighted patterns in human intron sequences. By testing the weights under a lazy nearest neighbor classifier, the numerical results revealed the significance of these weighted patterns. Comparing these weighted patterns with the popular intron consensus model, it is clear that the discovered weighted patterns make originally the ambiguous 5SS and 3SS header patterns more specific and concrete.
Wang, Chia-Ming; Liou, Sing-Wu
2013-01-01
A hybrid self-adaptive harmony search and back-propagation mining system was proposed to discover weighted patterns in human intron sequences. By testing the weights under a lazy nearest neighbor classifier, the numerical results revealed the significance of these weighted patterns. Comparing these weighted patterns with the popular intron consensus model, it is clear that the discovered weighted patterns make originally the ambiguous 5SS and 3SS header patterns more specific and concrete. PMID:23737711
Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image
NASA Astrophysics Data System (ADS)
Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI
2017-01-01
This paper describes an effective gait recognition approach based on Gabor features of gait energy image. In this paper, the kernel Fisher analysis combined with kernel matrix is proposed to select dominant features. The nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The approach proposed is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state of gait recognition approaches in terms of recognition accuracy and robustness.
Pattern recognition with parallel associative memory
NASA Technical Reports Server (NTRS)
Toth, Charles K.; Schenk, Toni
1990-01-01
An examination is conducted of the feasibility of searching targets in aerial photographs by means of a parallel associative memory (PAM) that is based on the nearest-neighbor algorithm; the Hamming distance is used as a measure of closeness, in order to discriminate patterns. Attention has been given to targets typically used for ground-control points. The method developed sorts out approximate target positions where precise localizations are needed, in the course of the data-acquisition process. The majority of control points in different images were correctly identified.
SOTXTSTREAM: Density-based self-organizing clustering of text streams.
Bryant, Avory C; Cios, Krzysztof J
2017-01-01
A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Improved collaborative filtering recommendation algorithm of similarity measure
NASA Astrophysics Data System (ADS)
Zhang, Baofu; Yuan, Baoping
2017-05-01
The Collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithm in personalized recommender systems. The key is to find the nearest neighbor set of the active user by using similarity measure. However, the methods of traditional similarity measure mainly focus on the similarity of user common rating items, but ignore the relationship between the user common rating items and all items the user rates. And because rating matrix is very sparse, traditional collaborative filtering recommendation algorithm is not high efficiency. In order to obtain better accuracy, based on the consideration of common preference between users, the difference of rating scale and score of common items, this paper presents an improved similarity measure method, and based on this method, a collaborative filtering recommendation algorithm based on similarity improvement is proposed. Experimental results show that the algorithm can effectively improve the quality of recommendation, thus alleviate the impact of data sparseness.
Hardware friendly probabilistic spiking neural network with long-term and short-term plasticity.
Hsieh, Hung-Yi; Tang, Kea-Tiong
2013-12-01
This paper proposes a probabilistic spiking neural network (PSNN) with unimodal weight distribution, possessing long- and short-term plasticity. The proposed algorithm is derived by both the arithmetic gradient decent calculation and bioinspired algorithms. The algorithm is benchmarked by the Iris and Wisconsin breast cancer (WBC) data sets. The network features fast convergence speed and high accuracy. In the experiment, the PSNN took not more than 40 epochs for convergence. The average testing accuracy for Iris and WBC data is 96.7% and 97.2%, respectively. To test the usefulness of the PSNN to real world application, the PSNN was also tested with the odor data, which was collected by our self-developed electronic nose (e-nose). Compared with the algorithm (K-nearest neighbor) that has the highest classification accuracy in the e-nose for the same odor data, the classification accuracy of the PSNN is only 1.3% less but the memory requirement can be reduced at least 40%. All the experiments suggest that the PSNN is hardware friendly. First, it requires only nine-bits weight resolution for training and testing. Second, the PSNN can learn complex data sets with a little number of neurons that in turn reduce the cost of VLSI implementation. In addition, the algorithm is insensitive to synaptic noise and the parameter variation induced by the VLSI fabrication. Therefore, the algorithm can be implemented by either software or hardware, making it suitable for wider application.
Multi-strategy based quantum cost reduction of linear nearest-neighbor quantum circuit
NASA Astrophysics Data System (ADS)
Tan, Ying-ying; Cheng, Xue-yun; Guan, Zhi-jin; Liu, Yang; Ma, Haiying
2018-03-01
With the development of reversible and quantum computing, study of reversible and quantum circuits has also developed rapidly. Due to physical constraints, most quantum circuits require quantum gates to interact on adjacent quantum bits. However, many existing quantum circuits nearest-neighbor have large quantum cost. Therefore, how to effectively reduce quantum cost is becoming a popular research topic. In this paper, we proposed multiple optimization strategies to reduce the quantum cost of the circuit, that is, we reduce quantum cost from MCT gates decomposition, nearest neighbor and circuit simplification, respectively. The experimental results show that the proposed strategies can effectively reduce the quantum cost, and the maximum optimization rate is 30.61% compared to the corresponding results.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
Indexing Volumetric Shapes with Matching and Packing
Koes, David Ryan; Camacho, Carlos J.
2014-01-01
We describe a novel algorithm for bulk-loading an index with high-dimensional data and apply it to the problem of volumetric shape matching. Our matching and packing algorithm is a general approach for packing data according to a similarity metric. First an approximate k-nearest neighbor graph is constructed using vantage-point initialization, an improvement to previous work that decreases construction time while improving the quality of approximation. Then graph matching is iteratively performed to pack related items closely together. The end result is a dense index with good performance. We define a new query specification for shape matching that uses minimum and maximum shape constraints to explicitly specify the spatial requirements of the desired shape. This specification provides a natural language for performing volumetric shape matching and is readily supported by the geometry-based similarity search (GSS) tree, an indexing structure that maintains explicit representations of volumetric shape. We describe our implementation of a GSS tree for volumetric shape matching and provide a comprehensive evaluation of parameter sensitivity, performance, and scalability. Compared to previous bulk-loading algorithms, we find that matching and packing can construct a GSS-tree index in the same amount of time that is denser, flatter, and better performing, with an observed average performance improvement of 2X. PMID:26085707
Simulations of linear and Hamming codes using SageMath
NASA Astrophysics Data System (ADS)
Timur, Tahta D.; Adzkiya, Dieky; Soleha
2018-03-01
Digital data transmission over a noisy channel could distort the message being transmitted. The goal of coding theory is to ensure data integrity, that is, to find out if and where this noise has distorted the message and what the original message was. Data transmission consists of three stages: encoding, transmission, and decoding. Linear and Hamming codes are codes that we discussed in this work, where encoding algorithms are parity check and generator matrix, and decoding algorithms are nearest neighbor and syndrome. We aim to show that we can simulate these processes using SageMath software, which has built-in class of coding theory in general and linear codes in particular. First we consider the message as a binary vector of size k. This message then will be encoded to a vector with size n using given algorithms. And then a noisy channel with particular value of error probability will be created where the transmission will took place. The last task would be decoding, which will correct and revert the received message back to the original message whenever possible, that is, if the number of error occurred is smaller or equal to the correcting radius of the code. In this paper we will use two types of data for simulations, namely vector and text data.
A three-parameter model for classifying anurans into four genera based on advertisement calls.
Gingras, Bruno; Fitch, William Tecumseh
2013-01-01
The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species which are more closely related phylogenetically are predicted to be more similar than those of distant species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model, using mean values for dominant frequency, coefficient of variation of root-mean square energy, and spectral flux, correctly classified advertisement calls with regard to genus with an accuracy above 70%. Similar accuracy rates were obtained using these parameters with a support vector machine model, a K-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, thus supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
A Novel Topology Control Approach to Maintain the Node Degree in Dynamic Wireless Sensor Networks
Huang, Yuanjiang; Martínez, José-Fernán; Díaz, Vicente Hernández; Sendra, Juana
2014-01-01
Topology control is an important technique to improve the connectivity and the reliability of Wireless Sensor Networks (WSNs) by means of adjusting the communication range of wireless sensor nodes. In this paper, a novel Fuzzy-logic Topology Control (FTC) is proposed to achieve any desired average node degree by adaptively changing communication range, thus improving the network connectivity, which is the main target of FTC. FTC is a fully localized control algorithm, and does not rely on location information of neighbors. Instead of designing membership functions and if-then rules for fuzzy-logic controller, FTC is constructed from the training data set to facilitate the design process. FTC is proved to be accurate, stable and has short settling time. In order to compare it with other representative localized algorithms (NONE, FLSS, k-Neighbor and LTRT), FTC is evaluated through extensive simulations. The simulation results show that: firstly, similar to k-Neighbor algorithm, FTC is the best to achieve the desired average node degree as node density varies; secondly, FTC is comparable to FLSS and k-Neighbor in terms of energy-efficiency, but is better than LTRT and NONE; thirdly, FTC has the lowest average maximum communication range than other algorithms, which indicates that the most energy-consuming node in the network consumes the lowest power. PMID:24608008
Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar
2013-01-01
Background Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. Objective To develop a clinical decision–support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using intelligence artificial techniques. Methods A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as a criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by CDSS were compared with those made by physicians during patient consultations. Results The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision–support system. The gold-standard validation of CDSS achieved accuracy of 84.2% and k = 0.68 (p < 0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician’s diagnostic impression and the gold standard k = 0. 64 (p < 0.0001). There was moderate agreement between the physician’s diagnostic impression and CDSS k = 0.46 (p = 0.0008). Conclusions The study results suggest that CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards developing of a computer-assisted environment to support CD diagnosis. PMID:21917512
Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar
2011-11-01
Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. To develop a clinical decision-support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using intelligence artificial techniques. A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as a criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by CDSS were compared with those made by physicians during patient consultations. The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision-support system. The gold-standard validation of CDSS achieved accuracy of 84.2% and k=0.68 (p<0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician's diagnostic impression and the gold standard k=0. 64 (p<0.0001). There was moderate agreement between the physician's diagnostic impression and CDSS k=0.46 (p=0.0008). The study results suggest that CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards developing of a computer-assisted environment to support CD diagnosis. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Pashaei, Elnaz; Pashaei, Elham; Aydin, Nizamettin
2018-04-14
In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate the exploration and exploitation of the BPSO (4-2) algorithm to further improve the performance. This model has been associated with Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers which are evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA); k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets. Copyright © 2018 Elsevier Inc. All rights reserved.
Go, Taesik; Byeon, Hyeokjun; Lee, Sang Joon
2018-04-30
Cell types of erythrocytes should be identified because they are closely related to their functionality and viability. Conventional methods for classifying erythrocytes are time consuming and labor intensive. Therefore, an automatic and accurate erythrocyte classification system is indispensable in healthcare and biomedical fields. In this study, we proposed a new label-free sensor for automatic identification of erythrocyte cell types using a digital in-line holographic microscopy (DIHM) combined with machine learning algorithms. A total of 12 features, including information on intensity distributions, morphological descriptors, and optical focusing characteristics, is quantitatively obtained from numerically reconstructed holographic images. All individual features for discocytes, echinocytes, and spherocytes are statistically different. To improve the performance of cell type identification, we adopted several machine learning algorithms, such as decision tree model, support vector machine, linear discriminant classification, and k-nearest neighbor classification. With the aid of these machine learning algorithms, the extracted features are effectively utilized to distinguish erythrocytes. Among the four tested algorithms, the decision tree model exhibits the best identification performance for the training sets (n = 440, 98.18%) and test sets (n = 190, 97.37%). This proposed methodology, which smartly combined DIHM and machine learning, would be helpful for sensing abnormal erythrocytes and computer-aided diagnosis of hematological diseases in clinic. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biddle, J.; Priour, D. J. Jr.; Wang, B.
We study the quantum localization phenomena of noninteracting particles in one-dimensional lattices based on tight-binding models with various forms of hopping terms beyond the nearest neighbor, which are generalizations of the famous Aubry-Andre and noninteracting Anderson models. For the case with deterministic disordered potential induced by a secondary incommensurate lattice (i.e., the Aubry-Andre model), we identify a class of self-dual models, for which the boundary between localized and extended eigenstates are determined analytically by employing a generalized Aubry-Andre transformation. We also numerically investigate the localization properties of nondual models with next-nearest-neighbor hopping, Gaussian, and power-law decay hopping terms. We findmore » that even for these nondual models, the numerically obtained mobility edges can be well approximated by the analytically obtained condition for localization transition in the self-dual models, as long as the decay of the hopping rate with respect to distance is sufficiently fast. For the disordered potential with genuinely random character, we examine scenarios with next-nearest-neighbor hopping, exponential, Gaussian, and power-law decay hopping terms numerically. We find that the higher-order hopping terms can remove the symmetry in the localization length about the energy band center compared to the Anderson model. Furthermore, our results demonstrate that for the power-law decay case, there exists a critical exponent below which mobility edges can be found. Our theoretical results could, in principle, be directly tested in shallow atomic optical lattice systems enabling non-nearest-neighbor hopping.« less
Fidelity study of superconductivity in extended Hubbard models
NASA Astrophysics Data System (ADS)
Plonka, N.; Jia, C. J.; Wang, Y.; Moritz, B.; Devereaux, T. P.
2015-07-01
The Hubbard model with local on-site repulsion is generally thought to possess a superconducting ground state for appropriate parameters, but the effects of more realistic long-range Coulomb interactions have not been studied extensively. We study the influence of these interactions on superconductivity by including nearest- and next-nearest-neighbor extended Hubbard interactions in addition to the usual on-site terms. Utilizing numerical exact diagonalization, we analyze the signatures of superconductivity in the ground states through the fidelity metric of quantum information theory. We find that nearest and next-nearest neighbor interactions have thresholds above which they destabilize superconductivity regardless of whether they are attractive or repulsive, seemingly due to competing charge fluctuations.
A molecular dynamics study of the relaxation of an excited molecule in crystalline nitromethane
NASA Astrophysics Data System (ADS)
Rivera-Rivera, Luis A.; Siavosh-Haghighi, Ali; Sewell, Thomas D.; Thompson, Donald L.
2014-07-01
Classical molecular dynamics simulations were used to study the relaxation of an excited nitromethane molecule in perfect crystalline nitromethane at 250 K and 1 atm pressure. The molecule was instantaneously excited by statistically distributing energy E∗ between 25.0 kcal/mol and 125.0 kcal/mol among the 21 degrees of freedom of the molecule. The relaxation occurs exponentially with time constants between 11.58 ps and 13.57 ps. Energy transfer from the excited molecule to surrounding quasi-spherical shells of molecules occurs concurrently to both the nearest and next-nearest neighbor shells, but with more energy per molecule transferred more rapidly to the first shell.
Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried
2009-01-01
Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...
Mapping change of older forest with nearest-neighbor imputation and Landsat time-series
Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang Yang
2012-01-01
The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...
Nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes
Watkins, Norman E.; SantaLucia, John
2005-01-01
Nearest-neighbor thermodynamic parameters of the ‘universal pairing base’ deoxyinosine were determined for the pairs I·C, I·A, I·T, I·G and I·I adjacent to G·C and A·T pairs. Ultraviolet absorbance melting curves were measured and non-linear regression performed on 84 oligonucleotide duplexes with 9 or 12 bp lengths. These data were combined with data for 13 inosine containing duplexes from the literature. Multiple linear regression was used to solve for the 32 nearest-neighbor unknowns. The parameters predict the Tm for all sequences within 1.2°C on average. The general trend in decreasing stability is I·C > I·A > I·T ≈ I· G > I·I. The stability trend for the base pair 5′ of the I·X pair is G·C > C·G > A·T > T·A. The stability trend for the base pair 3′ of I·X is the same. These trends indicate a complex interplay between H-bonding, nearest-neighbor stacking, and mismatch geometry. A survey of 14 tandem inosine pairs and 8 tandem self-complementary inosine pairs is also provided. These results may be used in the design of degenerate PCR primers and for degenerate microarray probes. PMID:16264087
1980-12-05
classification procedures that are common in speech processing. The anesthesia level classification by EEG time series population screening problem example is in...formance. The use of the KL number type metric in NN rule classification, in a delete-one subj ect ’s EE-at-a-time KL-NN and KL- kNN classification of the...17 individual labeled EEG sample population using KL-NN and KL- kNN rules. The results obtained are shown in Table 1. The entries in the table indicate
Preserved Network Metrics across Translated Texts
NASA Astrophysics Data System (ADS)
Cabatbat, Josephine Jill T.; Monsanto, Jica P.; Tapang, Giovanni A.
2014-09-01
Co-occurrence language networks based on Bible translations and the Universal Declaration of Human Rights (UDHR) translations in different languages were constructed and compared with random text networks. Among the considered network metrics, the network size, N, the normalized betweenness centrality (BC), and the average k-nearest neighbors, knn, were found to be the most preserved across translations. Moreover, similar frequency distributions of co-occurring network motifs were observed for translated texts networks.
The Use of Fuzzy Set Classification for Pattern Recognition of the Polygraph
1993-12-01
actual feature extraction was done, It was decided to use the K-nearest neighbor ( KNN ) the data was preprocessed. The electrocardiogram classifier in...showing heart pulse, and a low frequency not known beforehand, and the KNN classifier does not component showing blood volume. The derivative of...the characteristics of the conventional KNN these six derived signals were detrended and filtered, classification method is that it assigns each
NASA Astrophysics Data System (ADS)
Erkyihun, Solomon Tassew; Rajagopalan, Balaji; Zagona, Edith; Lall, Upmanu; Nowak, Kenneth
2016-05-01
A model to generate stochastic streamflow projections conditioned on quasi-oscillatory climate indices such as Pacific Decadal Oscillation (PDO) and Atlantic Multi-decadal Oscillation (AMO) is presented. Recognizing that each climate index has underlying band-limited components that contribute most of the energy of the signals, we first pursue a wavelet decomposition of the signals to identify and reconstruct these features from annually resolved historical data and proxy based paleoreconstructions of each climate index covering the period from 1650 to 2012. A K-Nearest Neighbor block bootstrap approach is then developed to simulate the total signal of each of these climate index series while preserving its time-frequency structure and marginal distributions. Finally, given the simulated climate signal time series, a K-Nearest Neighbor bootstrap is used to simulate annual streamflow series conditional on the joint state space defined by the simulated climate index for each year. We demonstrate this method by applying it to simulation of streamflow at Lees Ferry gauge on the Colorado River using indices of two large scale climate forcings: Pacific Decadal Oscillation (PDO) and Atlantic Multi-decadal Oscillation (AMO), which are known to modulate the Colorado River Basin (CRB) hydrology at multidecadal time scales. Skill in stochastic simulation of multidecadal projections of flow using this approach is demonstrated.
Acosta-Mesa, Héctor Gabriel; Cruz-Ramírez, Nicandro; Hernández-Jiménez, Rodolfo
2017-01-01
Efforts have been being made to improve the diagnostic performance of colposcopy, trying to help better diagnose cervical cancer, particularly in developing countries. However, improvements in a number of areas are still necessary, such as the time it takes to process the full digital image of the cervix, the performance of the computing systems used to identify different kinds of tissues, and biopsy sampling. In this paper, we explore three different, well-known automatic classification methods (k-Nearest Neighbors, Naïve Bayes, and C4.5), in addition to different data models that take full advantage of this information and improve the diagnostic performance of colposcopy based on acetowhite temporal patterns. Based on the ROC and PRC area scores, the k-Nearest Neighbors and discrete PLA representation performed better than other methods. The values of sensitivity, specificity, and accuracy reached using this method were 60% (95% CI 50–70), 79% (95% CI 71–86), and 70% (95% CI 60–80), respectively. The acetowhitening phenomenon is not exclusive to high-grade lesions, and we have found acetowhite temporal patterns of epithelial changes that are not precancerous lesions but that are similar to positive ones. These findings need to be considered when developing more robust computing systems in the future. PMID:28744318
Comparison of four approaches to a rock facies classification problem
Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.
2007-01-01
In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. ?? 2006 Elsevier Ltd. All rights reserved.
Cooperation in N-person evolutionary snowdrift game in scale-free Barabási Albert networks
NASA Astrophysics Data System (ADS)
Lee, K. H.; Chan, Chun-Him; Hui, P. M.; Zheng, Da-Fang
2008-09-01
Cooperation in the N-person evolutionary snowdrift game (NESG) is studied in scale-free Barabási-Albert (BA) networks. Due to the inhomogeneity of the network, two versions of NESG are proposed and studied. In a model where the size of the competing group varies from agent to agent, the fraction of cooperators drops as a function of the payoff parameter. The networking effect is studied via the fraction of cooperative agents for nodes with a particular degree. For small payoff parameters, it is found that the small- k agents are dominantly cooperators, while large- k agents are of non-cooperators. Studying the spatial correlation reveals that cooperative agents will avoid to be nearest neighbors and the correlation disappears beyond the next-nearest neighbors. The behavior can be explained in terms of the networking effect and payoffs. In another model with a fixed size of competing groups, the fraction of cooperators could show a non-monotonic behavior in the regime of small payoff parameters. This non-trivial behavior is found to be a combined effect of the many agents with the smallest degree in the BA network and the increasing fraction of cooperators among these agents with the payoff for small payoffs.
Spin coherence and 14N ESEEM effects of nitrogen-vacancy centers in diamond with X-band pulsed ESR
NASA Astrophysics Data System (ADS)
Rose, B. C.; Weis, C. D.; Tyryshkin, A. M.; Schenkel, T.; Lyon, S. A.
2017-02-01
Pulsed ESR experiments are reported for ensembles of negatively-charged nitrogen-vacancy centers (NV$^-$) in diamonds at X-band magnetic fields (280-400 mT) and low temperatures (2-70 K). The NV$^-$ centers in synthetic type IIb diamonds (nitrogen impurity concentration $<1$~ppm) are prepared with bulk concentrations of $2\\cdot 10^{13}$ cm$^{-3}$ to $4\\cdot 10^{14}$ cm$^{-3}$ by high-energy electron irradiation and subsequent annealing. We find that a proper post-radiation anneal (1000$^\\circ$C for 60 mins) is critically important to repair the radiation damage and to recover long electron spin coherence times for NV$^-$s. After the annealing, spin coherence times of T$_2 = 0.74$~ms at 5~K are achieved, being only limited by $^{13}$C nuclear spectral diffusion in natural abundance diamonds. At X-band magnetic fields, strong electron spin echo envelope modulation (ESEEM) is observed originating from the central $^{14}$N nucleus. The ESEEM spectral analysis allows for accurate determination of the $^{14}$N nuclear hypefine and quadrupole tensors. In addition, the ESEEM effects from two proximal $^{13}$C sites (second-nearest neighbor and fourth-nearest neighbor) are resolved and the respective $^{13}$C hyperfine coupling constants are extracted.
Sub-word based Arabic handwriting analysis for writer identification
NASA Astrophysics Data System (ADS)
Maliki, Makki; Al-Jawad, Naseer; Jassim, Sabah
2013-05-01
Analysing a text or part of it is key to handwriting identification. Generally, handwriting is learnt over time and people develop habits in the style of writing. These habits are embedded in special parts of handwritten text. In Arabic each word consists of one or more sub-word(s). The end of each sub-word is considered to be a connect stroke. The main hypothesis in this paper is that sub-words are essential reflection of Arabic writer's habits that could be exploited for writer identification. Testing this hypothesis will be based on experiments that evaluate writer's identification, mainly using K nearest neighbor from group of sub-words extracted from longer text. The experimental results show that using a group of sub-words could be used to identify the writer with a successful rate between 52.94 % to 82.35% when top1 is used, and it can go up to 100% when top5 is used based on K nearest neighbor. The results show that majority of writers are identified using 7 sub-words with a reliability confident of about 90% (i.e. 90% of the rejected templates have significantly larger distances to the tested example than the distance from the correctly identified template). However previous work, using a complete word, shows successful rate of at most 90% in top 10.
Prediction of Human Intestinal Absorption of Compounds Using Artificial Intelligence Techniques.
Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar
2017-01-01
Information about Pharmacokinetics of compounds is an essential component of drug design and development. Modeling the pharmacokinetic properties require identification of the factors effecting absorption, distribution, metabolism and excretion of compounds. There have been continuous attempts in the prediction of intestinal absorption of compounds using various Artificial intelligence methods in the effort to reduce the attrition rate of drug candidates entering to preclinical and clinical trials. Currently, there are large numbers of individual predictive models available for absorption using machine learning approaches. Six Artificial intelligence methods namely, Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis were used for prediction of absorption of compounds. Prediction accuracy of Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis for prediction of intestinal absorption of compounds was found to be 91.54%, 88.33%, 84.30%, 86.51%, 79.07% and 80.08% respectively. Comparative analysis of all the six prediction models suggested that Support vector machine with Radial basis function based kernel is comparatively better for binary classification of compounds using human intestinal absorption and may be useful at preliminary stages of drug design and development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
SubCellProt: predicting protein subcellular localization using machine learning approaches.
Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan
2009-01-01
High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
Capurso, Daniel; Bengtsson, Henrik; Segal, Mark R.
2016-01-01
The spatial organization of the genome influences cellular function, notably gene regulation. Recent studies have assessed the three-dimensional (3D) co-localization of functional annotations (e.g. centromeres, long terminal repeats) using 3D genome reconstructions from Hi-C (genome-wide chromosome conformation capture) data; however, corresponding assessments for continuous functional genomic data (e.g. chromatin immunoprecipitation-sequencing (ChIP-seq) peak height) are lacking. Here, we demonstrate that applying bump hunting via the patient rule induction method (PRIM) to ChIP-seq data superposed on a Saccharomyces cerevisiae 3D genome reconstruction can discover ‘functional 3D hotspots’, regions in 3-space for which the mean ChIP-seq peak height is significantly elevated. For the transcription factor Swi6, the top hotspot by P-value contains MSB2 and ERG11 – known Swi6 target genes on different chromosomes. We verify this finding in a number of ways. First, this top hotspot is relatively stable under PRIM across parameter settings. Second, this hotspot is among the top hotspots by mean outcome identified by an alternative algorithm, k-Nearest Neighbor (k-NN) regression. Third, the distance between MSB2 and ERG11 is smaller than expected (by resampling) in two other 3D reconstructions generated via different normalization and reconstruction algorithms. This analytic approach can discover functional 3D hotspots and potentially reveal novel regulatory interactions. PMID:26869583
Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J; Bao, Forrest Sheng
2016-04-01
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species.
Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J.; Bao, Forrest Sheng
2016-01-01
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species. PMID:27092947
Two-pass imputation algorithm for missing value estimation in gene expression time series.
Tsiporkova, Elena; Boeva, Veselka
2007-10-01
Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu Xiaoying; Ho, Shirley; Trac, Hy
We investigate machine learning (ML) techniques for predicting the number of galaxies (N{sub gal}) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N{sub gal}. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: supportmore » vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N{sub gal} by training our algorithms on the following six halo properties: number of particles, M{sub 200}, {sigma}{sub v}, v{sub max}, half-mass radius, and spin. For Millennium, our predicted N{sub gal} values have a mean-squared error (MSE) of {approx}0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to {approx}5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N{sub gal}. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M{sub star}, low M{sub star}). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.« less
An Ant Colony Optimization Based Feature Selection for Web Page Classification
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
A comparison between skeleton and bounding box models for falling direction recognition
NASA Astrophysics Data System (ADS)
Narupiyakul, Lalita; Srisrisawang, Nitikorn
2017-12-01
Falling is an injury that can lead to a serious medical condition in every range of the age of people. However, in the case of elderly, the risk of serious injury is much higher. Due to the fact that one way of preventing serious injury is to treat the fallen person as soon as possible, several works attempted to implement different algorithms to recognize the fall. Our work compares the performance of two models based on features extraction: (i) Body joint data (Skeleton Data) which are the joint's positions in 3 axes and (ii) Bounding box (Box-size Data) covering all body joints. Machine learning algorithms that were chosen are Decision Tree (DT), Naïve Bayes (NB), K-nearest neighbors (KNN), Linear discriminant analysis (LDA), Voting Classification (VC), and Gradient boosting (GB). The results illustrate that the models trained with Skeleton data are performed far better than those trained with Box-size data (with an average accuracy of 94-81% and 80-75%, respectively). KNN shows the best performance in both Body joint model and Bounding box model. In conclusion, KNN with Body joint model performs the best among the others.
Prediction of Drug-Plasma Protein Binding Using Artificial Intelligence Based Algorithms.
Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar
2018-01-01
Plasma protein binding (PPB) has vital importance in the characterization of drug distribution in the systemic circulation. Unfavorable PPB can pose a negative effect on clinical development of promising drug candidates. The drug distribution properties should be considered at the initial phases of the drug design and development. Therefore, PPB prediction models are receiving an increased attention. In the current study, we present a systematic approach using Support vector machine, Artificial neural network, k- nearest neighbor, Probabilistic neural network, Partial least square and Linear discriminant analysis to relate various in vitro and in silico molecular descriptors to a diverse dataset of 736 drugs/drug-like compounds. The overall accuracy of Support vector machine with Radial basis function kernel came out to be comparatively better than the rest of the applied algorithms. The training set accuracy, validation set accuracy, precision, sensitivity, specificity and F1 score for the Suprort vector machine was found to be 89.73%, 89.97%, 92.56%, 87.26%, 91.97% and 0.898, respectively. This model can potentially be useful in screening of relevant drug candidates at the preliminary stages of drug design and development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.
Zhang, Tao; Chen, Wanzhong
2017-08-01
Achieving the goal of detecting seizure activity automatically using electroencephalogram (EEG) signals is of great importance and significance for the treatment of epileptic seizures. To realize this aim, a newly-developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the presented study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). Primarily, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach are equal to or higher than 98.10% in all the five cases, and this indicates the effectiveness of the proposed approach for automated seizure detection.
Forecasting the portuguese stock market time series by using artificial neural networks
NASA Astrophysics Data System (ADS)
Isfan, Monica; Menezes, Rui; Mendes, Diana A.
2010-04-01
In this paper, we show that neural networks can be used to uncover the non-linearity that exists in the financial data. First, we follow a traditional approach by analysing the deterministic/stochastic characteristics of the Portuguese stock market data and some typical features are studied, like the Hurst exponents, among others. We also simulate a BDS test to investigate nonlinearities and the results are as expected: the financial time series do not exhibit linear dependence. Secondly, we trained four types of neural networks for the stock markets and used the models to make forecasts. The artificial neural networks were obtained using a three-layer feed-forward topology and the back-propagation learning algorithm. The quite large number of parameters that must be selected to develop a neural network forecasting model involves some trial and as a consequence the error is not small enough. In order to improve this we use a nonlinear optimization algorithm to minimize the error. Finally, the output of the 4 models is quite similar, leading to a qualitative forecast that we compare with the results of the application of k-nearest-neighbor for the same time series.
Zhang, Tong-Liang; Ding, Yong-Sheng; Chou, Kuo-Chen
2008-01-07
Compared with the conventional amino acid (AA) composition, the pseudo-amino acid (PseAA) composition as originally introduced for protein subcellular location prediction can incorporate much more information of a protein sequence, so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, based on the concept of PseAA composition, the approximate entropy and hydrophobicity pattern of a protein sequence are used to characterize the PseAA components. Also, the immune genetic algorithm (IGA) is applied to search the optimal weight factors in generating the PseAA composition. Thus, for a given protein sequence sample, a 27-D (dimensional) PseAA composition is generated as its descriptor. The fuzzy K nearest neighbors (FKNN) classifier is adopted as the prediction engine. The results thus obtained in predicting protein structural classification are quite encouraging, indicating that the current approach may also be used to improve the prediction quality of other protein attributes, or at least can play a complimentary role to the existing methods in the relevant areas. Our algorithm is written in Matlab that is available by contacting the corresponding author.
Cyran, Krzysztof A.
2018-01-01
This work considers the problem of utilizing electroencephalographic signals for use in systems designed for monitoring and enhancing the performance of aircraft pilots. Systems with such capabilities are generally referred to as cognitive cockpits. This article provides a description of the potential that is carried by such systems, especially in terms of increasing flight safety. Additionally, a neuropsychological background of the problem is presented. Conducted research was focused mainly on the problem of discrimination between states of brain activity related to idle but focused anticipation of visual cue and reaction to it. Especially, a problem of selecting a proper classification algorithm for such problems is being examined. For that purpose an experiment involving 10 subjects was planned and conducted. Experimental electroencephalographic data was acquired using an Emotiv EPOC+ headset. Proposed methodology involved use of a popular method in biomedical signal processing, the Common Spatial Pattern, extraction of bandpower features, and an extensive test of different classification algorithms, such as Linear Discriminant Analysis, k-nearest neighbors, and Support Vector Machines with linear and radial basis function kernels, Random Forests, and Artificial Neural Networks. PMID:29849544
Deep feature extraction and combination for synthetic aperture radar target classification
NASA Astrophysics Data System (ADS)
Amrani, Moussa; Jiang, Feng
2017-10-01
Feature extraction has always been a difficult problem in the classification performance of synthetic aperture radar automatic target recognition (SAR-ATR). It is very important to select discriminative features to train a classifier, which is a prerequisite. Inspired by the great success of convolutional neural network (CNN), we address the problem of SAR target classification by proposing a feature extraction method, which takes advantage of exploiting the extracted deep features from CNNs on SAR images to introduce more powerful discriminative features and robust representation ability for them. First, the pretrained VGG-S net is fine-tuned on moving and stationary target acquisition and recognition (MSTAR) public release database. Second, after a simple preprocessing is performed, the fine-tuned network is used as a fixed feature extractor to extract deep features from the processed SAR images. Third, the extracted deep features are fused by using a traditional concatenation and a discriminant correlation analysis algorithm. Finally, for target classification, K-nearest neighbors algorithm based on LogDet divergence-based metric learning triplet constraints is adopted as a baseline classifier. Experiments on MSTAR are conducted, and the classification accuracy results demonstrate that the proposed method outperforms the state-of-the-art methods.
NASA Technical Reports Server (NTRS)
Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.
1992-01-01
The current controversy existing in reference to the regularity vs. clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. Interpretation of this nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis is carried out of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
Phase transition and monopole densities in a nearest neighbor two-dimensional spin ice model
NASA Astrophysics Data System (ADS)
Morais, C. W.; de Freitas, D. N.; Mota, A. L.; Bastone, E. C.
2017-12-01
In this work, we show that, due to the alternating orientation of the spins in the ground state of the artificial square spin ice, the influence of a set of spins at a certain distance of a reference spin decreases faster than the expected result for the long range dipolar interaction, justifying the use of the nearest neighbor two-dimensional square spin ice model as an effective model. Using an extension of the model presented in Y. L. Xie et al., Sci. Rep. 5, 15875 (2015), considering the influence of the eight nearest neighbors of each spin on the lattice, we analyze the thermodynamics of the model and study the dependence of monopoles and string densities as a function of the temperature.
Jafarpour, Farshid; Angheluta, Luiza; Goldenfeld, Nigel
2013-10-01
The dynamics of edge dislocations with parallel Burgers vectors, moving in the same slip plane, is mapped onto Dyson's model of a two-dimensional Coulomb gas confined in one dimension. We show that the tail distribution of the velocity of dislocations is power law in form, as a consequence of the pair interaction of nearest neighbors in one dimension. In two dimensions, we show the presence of a pairing phase transition in a system of interacting dislocations with parallel Burgers vectors. The scaling exponent of the velocity distribution at effective temperatures well below this pairing transition temperature can be derived from the nearest-neighbor interaction, while near the transition temperature, the distribution deviates from the form predicted by the nearest-neighbor interaction, suggesting the presence of collective effects.
Thermal rectification in mass-graded next-nearest-neighbor Fermi-Pasta-Ulam lattices
NASA Astrophysics Data System (ADS)
Romero-Bastida, M.; Miranda-Peña, Jorge-Orlando; López, Juan M.
2017-03-01
We study the thermal rectification efficiency, i.e., quantification of asymmetric heat flow, of a one-dimensional mass-graded anharmonic oscillator Fermi-Pasta-Ulam lattice both with nearest-neighbor (NN) and next-nearest-neighbor (NNN) interactions. The system presents a maximum rectification efficiency for a very precise value of the parameter that controls the coupling strength of the NNN interactions, which also optimizes the rectification figure when its dependence on mass asymmetry and temperature differences is considered. The origin of the enhanced rectification is the asymmetric local heat flow response as the heat reservoirs are swapped when a finely tuned NNN contribution is taken into account. A simple theoretical analysis gives an estimate of the optimal NNN coupling in excellent agreement with our simulation results.
Importance of interatomic spacing in catalytic reduction of oxygen in phosphoric acid
NASA Technical Reports Server (NTRS)
Jalan, V.; Taylor, E. J.
1983-01-01
A correlation between the nearest-neighbor distance and the oxygen reduction activity of various platinum alloys is reported. It is proposed that the distance between nearest-neighbor Pt atoms on the surface of a supported catalyst is not ideal for dual site absorption of O2 or 'HO2' and that the introduction of foreign atoms which reduce the Pt nearest-neighbor spacing would result in higher oxygen reduction activity. This may allow the critical 0-0 bond interatomic distance and hence the optimum Pt-Pt separation for bond rupture to be determined from quantum chemical calculations. A composite analysis shows that the data on supported Pt alloys are consistent with Appleby's (1970) data on bulk metals with respect to specific activity, activation energy, preexponential factor, and percent d-band character.
Smart, Otis; Burrell, Lauren
2014-01-01
Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However for each the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and possible need for patient-specific feature selection or/and classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity except one patient. PMID:25580059
NASA Astrophysics Data System (ADS)
Bi, Jiang-lin; Wang, Wei; Li, Qi
2017-07-01
In this paper, the effects of the next-nearest neighbors exchange couplings on the magnetic and thermal properties of the ferrimagnetic mixed-spin (2, 5/2) Ising model on a 3D honeycomb lattice have been investigated by the use of Monte Carlo simulation. In particular, the influences of exchange couplings (Ja, Jb, Jan) and the single-ion anisotropy(Da) on the phase diagrams, the total magnetization, the sublattice magnetization, the total susceptibility, the internal energy and the specific heat have been discussed in detail. The results clearly show that the system can express the critical and compensation behavior within the next-nearest neighbors exchange coupling. Great deals of the M curves such as N-, Q-, P- and L-types have been discovered, owing to the competition between the exchange coupling and the temperature. Compared with other theoretical and experimental works, our results have an excellent consistency with theirs.
Ghalyan, Najah F; Miller, David J; Ray, Asok
2018-06-12
Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet may uniquely specify the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to identify or characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. The seminal work of Hirata, Judd, and Kilminster (2004) derives a novel objective function, akin to a clustering objective, that measures the discrepancy between a set of reconstruction values and the points from the time series. They cast estimation of a generating partition via the minimization of their objective function. Unfortunately, their proposed algorithm is nonconvergent, with no guarantee of finding even locally optimal solutions with respect to their objective. The difficulty is a heuristic-nearest neighbor symbol assignment step. Alternatively, we develop a novel, locally optimal algorithm for their objective. We apply iterative nearest-neighbor symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the entire time series is achieved. While most previous approaches frame generating partition estimation as a state-space partitioning problem, we recognize that minimizing the Hirata et al. (2004) objective function does not induce an explicit partitioning of the state space, but rather the space consisting of the entire time series (effectively, clustering in a (countably) infinite-dimensional space). Our approach also amounts to a novel type of sliding block lossy source coding. Improvement, with respect to several measures, is demonstrated over popular methods for symbolizing chaotic maps. We also apply our approach to time-series anomaly detection, considering both chaotic maps and failure application in a polycrystalline alloy material.
Estimation of Full-Body Poses Using Only Five Inertial Sensors: An Eager or Lazy Learning Approach?
Wouda, Frank J.; Giuberti, Matteo; Bellusci, Giovanni; Veltink, Peter H.
2016-01-01
Human movement analysis has become easier with the wide availability of motion capture systems. Inertial sensing has made it possible to capture human motion without external infrastructure, therefore allowing measurements in any environment. As high-quality motion capture data is available in large quantities, this creates possibilities to further simplify hardware setups, by use of data-driven methods to decrease the number of body-worn sensors. In this work, we contribute to this field by analyzing the capabilities of using either artificial neural networks (eager learning) or nearest neighbor search (lazy learning) for such a problem. Sparse orientation features, resulting from sensor fusion of only five inertial measurement units with magnetometers, are mapped to full-body poses. Both eager and lazy learning algorithms are shown to be capable of constructing this mapping. The full-body output poses are visually plausible with an average joint position error of approximately 7 cm, and average joint angle error of 7∘. Additionally, the effects of magnetic disturbances typical in orientation tracking on the estimation of full-body poses was also investigated, where nearest neighbor search showed better performance for such disturbances. PMID:27983676
Márk, Géza I; Vértesy, Zofia; Kertész, Krisztián; Bálint, Zsolt; Biró, László P
2009-11-01
In order to study local and global order in butterfly wing scales possessing structural colors, we have developed a direct space algorithm, based on averaging the local environment of the repetitive units building up the structure. The method provides the statistical distribution of the local environments, including the histogram of the nearest-neighbor distance and the number of nearest neighbors. We have analyzed how the different kinds of randomness present in the direct space structure influence the reciprocal space structure. It was found that the Fourier method is useful in the case of a structure randomly deviating from an ordered lattice. The direct space averaging method remains applicable even for structures lacking long-range order. Based on the first Born approximation, a link is established between the reciprocal space image and the optical reflectance spectrum. Results calculated within this framework agree well with measured reflectance spectra because of the small width and moderate refractive index contrast of butterfly scales. By the analysis of the wing scales of Cyanophrys remus and Albulina metallica butterflies, we tested the methods for structures having long-range order, medium-range order, and short-range order.
Estimation of Full-Body Poses Using Only Five Inertial Sensors: An Eager or Lazy Learning Approach?
Wouda, Frank J; Giuberti, Matteo; Bellusci, Giovanni; Veltink, Peter H
2016-12-15
Human movement analysis has become easier with the wide availability of motion capture systems. Inertial sensing has made it possible to capture human motion without external infrastructure, therefore allowing measurements in any environment. As high-quality motion capture data is available in large quantities, this creates possibilities to further simplify hardware setups, by use of data-driven methods to decrease the number of body-worn sensors. In this work, we contribute to this field by analyzing the capabilities of using either artificial neural networks (eager learning) or nearest neighbor search (lazy learning) for such a problem. Sparse orientation features, resulting from sensor fusion of only five inertial measurement units with magnetometers, are mapped to full-body poses. Both eager and lazy learning algorithms are shown to be capable of constructing this mapping. The full-body output poses are visually plausible with an average joint position error of approximately 7 cm, and average joint angle error of 7 ∘ . Additionally, the effects of magnetic disturbances typical in orientation tracking on the estimation of full-body poses was also investigated, where nearest neighbor search showed better performance for such disturbances.
Streamflow Prediction based on Chaos Theory
NASA Astrophysics Data System (ADS)
Li, X.; Wang, X.; Babovic, V. M.
2015-12-01
Chaos theory is a popular method in hydrologic time series prediction. Local model (LM) based on this theory utilizes time-delay embedding to reconstruct the phase-space diagram. For this method, its efficacy is dependent on the embedding parameters, i.e. embedding dimension, time lag, and nearest neighbor number. The optimal estimation of these parameters is thus critical to the application of Local model. However, these embedding parameters are conventionally estimated using Average Mutual Information (AMI) and False Nearest Neighbors (FNN) separately. This may leads to local optimization and thus has limitation to its prediction accuracy. Considering about these limitation, this paper applies a local model combined with simulated annealing (SA) to find the global optimization of embedding parameters. It is also compared with another global optimization approach of Genetic Algorithm (GA). These proposed hybrid methods are applied in daily and monthly streamflow time series for examination. The results show that global optimization can contribute to the local model to provide more accurate prediction results compared with local optimization. The LM combined with SA shows more advantages in terms of its computational efficiency. The proposed scheme here can also be applied to other fields such as prediction of hydro-climatic time series, error correction, etc.
NASA Astrophysics Data System (ADS)
Márk, Géza I.; Vértesy, Zofia; Kertész, Krisztián; Bálint, Zsolt; Biró, László P.
2009-11-01
In order to study local and global order in butterfly wing scales possessing structural colors, we have developed a direct space algorithm, based on averaging the local environment of the repetitive units building up the structure. The method provides the statistical distribution of the local environments, including the histogram of the nearest-neighbor distance and the number of nearest neighbors. We have analyzed how the different kinds of randomness present in the direct space structure influence the reciprocal space structure. It was found that the Fourier method is useful in the case of a structure randomly deviating from an ordered lattice. The direct space averaging method remains applicable even for structures lacking long-range order. Based on the first Born approximation, a link is established between the reciprocal space image and the optical reflectance spectrum. Results calculated within this framework agree well with measured reflectance spectra because of the small width and moderate refractive index contrast of butterfly scales. By the analysis of the wing scales of Cyanophrys remus and Albulina metallica butterflies, we tested the methods for structures having long-range order, medium-range order, and short-range order.
van Laarhoven, Twan; Marchiori, Elena
2013-01-01
In silico discovery of interactions between drug compounds and target proteins is of core importance for improving the efficiency of the laborious and costly experimental determination of drug-target interaction. Drug-target interaction data are available for many classes of pharmaceutically useful target proteins including enzymes, ion channels, GPCRs and nuclear receptors. However, current drug-target interaction databases contain a small number of drug-target pairs which are experimentally validated interactions. In particular, for some drug compounds (or targets) there is no available interaction. This motivates the need for developing methods that predict interacting pairs with high accuracy also for these 'new' drug compounds (or targets). We show that a simple weighted nearest neighbor procedure is highly effective for this task. We integrate this procedure into a recent machine learning method for drug-target interaction we developed in previous work. Results of experiments indicate that the resulting method predicts true interactions with high accuracy also for new drug compounds and achieves results comparable or better than those of recent state-of-the-art algorithms. Software is publicly available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2013/.
Photometric Supernova Classification with Machine Learning
NASA Astrophysics Data System (ADS)
Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.
2016-08-01
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
Predicting the Occurrence of Haze Events in Southeast Asia using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Lee, H. H.; Chulakadabba, A.; Tonks, A.; Yang, Z.; Wang, C.
2017-12-01
Severe local- and regional-scale air pollution episodes typically originate from 1) high emissions of air pollutants, 2) poor dispersion conditions, and 3) trans-boundary pollutant transport. Biomass burning activities have become more frequent in Southeast Asia, especially in Sumatra, Borneo, and the mainland Southeast. Trans-boundary transport of biomass burning aerosols often lead to air quality problems in the region. Furthermore, particulate pollutants from human activities besides biomass burning also play an important role in the air quality of Southeast Asia. Singapore, for example, has a dynamic industrial sector including chemical, electric and metallurgic industries, and is the region's major petroleum-refining center. In addition, natural gas and oil power plants, waste incinerators, active port traffic, and a major regional airport further complicate Singapore's air quality issues. In this study, we compare five Machine Learning algorithms: k-Nearest Neighbors, Linear Support Vector Machine, Decision Tree, Random Forest and Artificial Neural Network, to identify haze patterns and determine variable importance. The algorithms were trained using local atmospheric data (i.e. months, atmospheric conditions, wind direction and relative humidity) from three observation stations in Singapore (Changi, Seletar and Paya Labar). We find that the algorithms reveal the associations in data within and between the stations, and provide in-depth interpretation of the haze sources. The algorithms also allow us to predict the probability of haze episodes in Singapore and to determine the correlation between this probability and atmospheric conditions.
A Comparative Study of Interferometric Regridding Algorithms
NASA Technical Reports Server (NTRS)
Hensley, Scott; Safaeinili, Ali
1999-01-01
THe paper discusses regridding options: (1) The problem of interpolating data that is not sampled on a uniform grid, that is noisy, and contains gaps is a difficult problem. (2) Several interpolation algorithms have been implemented: (a) Nearest neighbor - Fast and easy but shows some artifacts in shaded relief images. (b) Simplical interpolator - uses plane going through three points containing point where interpolation is required. Reasonably fast and accurate. (c) Convolutional - uses a windowed Gaussian approximating the optimal prolate spheroidal weighting function for a specified bandwidth. (d) First or second order surface fitting - Uses the height data centered in a box about a given point and does a weighted least squares surface fit.
The probability of misassociation between neighboring targets
NASA Astrophysics Data System (ADS)
Areta, Javier A.; Bar-Shalom, Yaakov; Rothrock, Ronald
2008-04-01
This paper presents procedures to calculate the probability that the measurement originating from an extraneous target will be (mis)associated with a target of interest for the cases of Nearest Neighbor and Global association. It is shown that these misassociation probabilities depend, under certain assumptions, on a particular - covariance weighted - norm of the difference between the targets' predicted measurements. For the Nearest Neighbor association, the exact solution, obtained for the case of equal innovation covariances, is based on a noncentral chi-square distribution. An approximate solution is also presented for the case of unequal innovation covariances. For the Global case an approximation is presented for the case of "similar" innovation covariances. In the general case of unequal innovation covariances where this approximation fails, an exact method based on the inversion of the characteristic function is presented. The theoretical results, confirmed by Monte Carlo simulations, quantify the benefit of Global vs. Nearest Neighbor association. These results are applied to problems of single sensor as well as centralized fusion architecture multiple sensor tracking.
Kenneth B. Jr. Pierce; C. Kenneth Brewer; Janet L. Ohmann
2010-01-01
This study was designed to test the feasibility of combining a method designed to populate pixels with inventory plot data at the 30-m scale with a new national predictor data set. The new national predictor data set was developed by the USDA Forest Service Remote Sensing Applications Center (hereafter RSAC) at the 250-m scale. Gradient Nearest Neighbor (GNN)...
Janet L. Ohmann; Matthew J. Gregory
2002-01-01
Spatially explicit information on the species composition and structure of forest vegetation is needed at broad spatial scales for natural resource policy analysis and ecological research. We present a method for predictive vegetation mapping that applies direct gradient analysis and nearest-neighbor imputation to ascribe detailed ground attributes of vegetation to...
Bianca N. I. Eskelson; Hailemariam Temesgen; Valerie Lemay; Tara M. Barrett; Nicholas L. Crookston; Andrew T. Hudak
2009-01-01
Almost universally, forest inventory and monitoring databases are incomplete, ranging from missing data for only a few records and a few variables, common for small land areas, to missing data for many observations and many variables, common for large land areas. For a wide variety of applications, nearest neighbor (NN) imputation methods have been developed to fill in...
Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett
2009-01-01
Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Harold S.J. Zald; Janet L. Ohmann; Heather M. Roberts; Matthew J. Gregory; Emilie B. Henderson; Robert J. McGaughey; Justin Braaten
2014-01-01
This study investigated how lidar-derived vegetation indices, disturbance history from Landsat time series (LTS) imagery, plot location accuracy, and plot size influenced accuracy of statistical spatial models (nearest-neighbor imputation maps) of forest vegetation composition and structure. Nearest-neighbor (NN) imputation maps were developed for 539,000 ha in the...
Evolutionary game theory and criticality
NASA Astrophysics Data System (ADS)
Mahmoodi, Korosh; Grigolini, Paolo
2017-01-01
We study a regular two-dimensional network of individuals playing the Prisonner’s Dilemma game with their neighbors, assigning to each individual the adoption of two different criteria to make a choice between cooperation and defection. For a fraction q < 1 of her time the individual makes her choice by imitating those done by the nearest neighbors, with no payoff consideration. For a fraction ε =1-q the choice between cooperation and defection of an individual depends on the payoff difference between the most successful neighbor and her payoff. When q = 1 for a special value of the imitation strength K, denoted as K c, the model of social pressure generates criticality. When q = 0 a large incentive to cheat yields the extinction of cooperation and a modest one leads to the survival of cooperation. We show that for K={{K}\\text{c}} the adoption of a very small value of ɛ exerts a bias in favor of either cooperation or defection, as a form of criticality-induced intelligence, which leads the system to select either the cooperation or the defection branch, when K>{{K}\\text{c}} . Intermediate values of ɛ annihilated criticality-induced cognition and, as consequence, may favor defection choice even in the case when a wise payoff consideration is expected to yield the emergence of cooperation.
Method of Menu Selection by Gaze Movement Using AC EOG Signals
NASA Astrophysics Data System (ADS)
Kanoh, Shin'ichiro; Futami, Ryoko; Yoshinobu, Tatsuo; Hoshimiya, Nozomu
A method to detect the direction and the distance of voluntary eye gaze movement from EOG (electrooculogram) signals was proposed and tested. In this method, AC-amplified vertical and horizontal transient EOG signals were classified into 8-class directions and 2-class distances of voluntary eye gaze movements. A horizontal and a vertical EOGs during eye gaze movement at each sampling time were treated as a two-dimensional vector, and the center of gravity of the sample vectors whose norms were more than 80% of the maximum norm was used as a feature vector to be classified. By the classification using the k-nearest neighbor algorithm, it was shown that the averaged correct detection rates on each subject were 98.9%, 98.7%, 94.4%, respectively. This method can avoid strict EOG-based eye tracking which requires DC amplification of very small signal. It would be useful to develop robust human interfacing systems based on menu selection for severely paralyzed patients.
An ultra low power feature extraction and classification system for wearable seizure detection.
Page, Adam; Pramod Tim Oates, Siddharth; Mohsenin, Tinoosh
2015-01-01
In this paper we explore the use of a variety of machine learning algorithms for designing a reliable and low-power, multi-channel EEG feature extractor and classifier for predicting seizures from electroencephalographic data (scalp EEG). Different machine learning classifiers including k-nearest neighbor, support vector machines, naïve Bayes, logistic regression, and neural networks are explored with the goal of maximizing detection accuracy while minimizing power, area, and latency. The input to each machine learning classifier is a 198 feature vector containing 9 features for each of the 22 EEG channels obtained over 1-second windows. All classifiers were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on 10 patients. Among five different classifiers that were explored, logistic regression (LR) proved to have minimum hardware complexity while providing average F-1 score of 91%. Both ASIC and FPGA implementations of logistic regression are presented and show the smallest area, power consumption, and the lowest latency when compared to the previous work.
Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.
Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong
2017-06-01
In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression for preserving the local structures of data. To do this, we first extract the bases of training data by previous dictionary learning methods and, then, map original data into the basis space to generate their new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate its objective function by simultaneously taking subspace learning and joint sparse regression into account, then, design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) via replacing the least square loss function with a robust loss function, for achieving the same goals and also avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed the state-of-the-art algorithms in terms of k -nearest neighbor classification performance.
Predictive analysis and data mining among the employment of fresh graduate students in HEI
NASA Astrophysics Data System (ADS)
Rahman, Nor Azziaty Abdul; Tan, Kian Lam; Lim, Chen Kim
2017-10-01
Management of higher education have a problem in producing 100% of graduates who can meet the needs of industry while industry is also facing the problem of finding skilled graduates who suit their needs partly due to the lack of an effective method in assessing problem solving skills as well as weaknesses in the assessment of problem-solving skills. The purpose of this paper is to propose a suitable classification model that can be used in making prediction and assessment of the attributes of the student's dataset to meet the selection criteria of work demanded by the industry of the graduates in the academic field. Supervised and unsupervised Machine Learning Algorithms were used in this research where; K-Nearest Neighbor, Naïve Bayes, Decision Tree, Neural Network, Logistic Regression and Support Vector Machine. The proposed model will help the university management to make a better long-term plans for producing graduates who are skilled, knowledgeable and fulfill the industry needs as well.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Hiebeler, David E; Millett, Nicholas E
2011-06-21
We investigate a spatial lattice model of a population employing dispersal to nearest and second-nearest neighbors, as well as long-distance dispersal across the landscape. The model is studied via stochastic spatial simulations, ordinary pair approximation, and triplet approximation. The latter method, which uses the probabilities of state configurations of contiguous blocks of three sites as its state variables, is demonstrated to be greatly superior to pair approximations for estimating spatial correlation information at various scales. Correlations between pairs of sites separated by arbitrary distances are estimated by constructing spatial Markov processes using the information from both approximations. These correlations demonstrate why pair approximation misses basic qualitative features of the model, such as decreasing population density as a large proportion of offspring are dropped on second-nearest neighbors, and why triplet approximation is able to include them. Analytical and numerical results show that, excluding long-distance dispersal, the initial growth rate of an invading population is maximized and the equilibrium population density is also roughly maximized when the population spreads its offspring evenly over nearest and second-nearest neighboring sites. Copyright © 2011 Elsevier Ltd. All rights reserved.
Ground-state ordering of the J1-J2 model on the simple cubic and body-centered cubic lattices
NASA Astrophysics Data System (ADS)
Farnell, D. J. J.; Götze, O.; Richter, J.
2016-06-01
The J1-J2 Heisenberg model is a "canonical" model in the field of quantum magnetism in order to study the interplay between frustration and quantum fluctuations as well as quantum phase transitions driven by frustration. Here we apply the coupled cluster method (CCM) to study the spin-half J1-J2 model with antiferromagnetic nearest-neighbor bonds J1>0 and next-nearest-neighbor bonds J2>0 for the simple cubic (sc) and body-centered cubic (bcc) lattices. In particular, we wish to study the ground-state ordering of these systems as a function of the frustration parameter p =z2J2/z1J1 , where z1 (z2) is the number of nearest (next-nearest) neighbors. We wish to determine the positions of the phase transitions using the CCM and we aim to resolve the nature of the phase transition points. We consider the ground-state energy, order parameters, spin-spin correlation functions, as well as the spin stiffness in order to determine the ground-state phase diagrams of these models. We find a direct first-order phase transition at a value of p =0.528 from a state of nearest-neighbor Néel order to next-nearest-neighbor Néel order for the bcc lattice. For the sc lattice the situation is more subtle. CCM results for the energy, the order parameter, the spin-spin correlation functions, and the spin stiffness indicate that there is no direct first-order transition between ground-state phases with magnetic long-range order, rather it is more likely that two phases with antiferromagnetic long range are separated by a narrow region of a spin-liquid-like quantum phase around p =0.55 . Thus the strong frustration present in the J1-J2 Heisenberg model on the sc lattice may open a window for an unconventional quantum ground state in this three-dimensional spin model.
Identifying the most influential spreaders in complex networks by an Extended Local K-Shell Sum
NASA Astrophysics Data System (ADS)
Yang, Fan; Zhang, Ruisheng; Yang, Zhao; Hu, Rongjing; Li, Mengtian; Yuan, Yongna; Li, Keqin
Identifying influential spreaders is crucial for developing strategies to control the spreading process on complex networks. Following the well-known K-Shell (KS) decomposition, several improved measures are proposed. However, these measures cannot identify the most influential spreaders accurately. In this paper, we define a Local K-Shell Sum (LKSS) by calculating the sum of the K-Shell indices of the neighbors within 2-hops of a given node. Based on the LKSS, we propose an Extended Local K-Shell Sum (ELKSS) centrality to rank spreaders. The ELKSS is defined as the sum of the LKSS of the nearest neighbors of a given node. By assuming that the spreading process on networks follows the Susceptible-Infectious-Recovered (SIR) model, we perform extensive simulations on a series of real networks to compare the performance between the ELKSS centrality and other six measures. The results show that the ELKSS centrality has a better performance than the six measures to distinguish the spreading ability of nodes and to identify the most influential spreaders accurately.
Fidelity study of superconductivity in extended Hubbard models
Plonka, N.; Jia, C. J.; Wang, Y.; ...
2015-07-08
The Hubbard model with local on-site repulsion is generally thought to possess a superconducting ground state for appropriate parameters, but the effects of more realistic long-range Coulomb interactions have not been studied extensively. We study the influence of these interactions on superconductivity by including nearest- and next-nearest-neighbor extended Hubbard interactions in addition to the usual on-site terms. Utilizing numerical exact diagonalization, we analyze the signatures of superconductivity in the ground states through the fidelity metric of quantum information theory. Finally, we find that nearest and next-nearest neighbor interactions have thresholds above which they destabilize superconductivity regardless of whether they aremore » attractive or repulsive, seemingly due to competing charge fluctuations.« less
Como, F; Carnesecchi, E; Volani, S; Dorne, J L; Richardson, J; Bassan, A; Pavan, M; Benfenati, E
2017-01-01
Ecological risk assessment of plant protection products (PPPs) requires an understanding of both the toxicity and the extent of exposure to assess risks for a range of taxa of ecological importance including target and non-target species. Non-target species such as honey bees (Apis mellifera), solitary bees and bumble bees are of utmost importance because of their vital ecological services as pollinators of wild plants and crops. To improve risk assessment of PPPs in bee species, computational models predicting the acute and chronic toxicity of a range of PPPs and contaminants can play a major role in providing structural and physico-chemical properties for the prioritisation of compounds of concern and future risk assessments. Over the last three decades, scientific advisory bodies and the research community have developed toxicological databases and quantitative structure-activity relationship (QSAR) models that are proving invaluable to predict toxicity using historical data and reduce animal testing. This paper describes the development and validation of a k-Nearest Neighbor (k-NN) model using in-house software for the prediction of acute contact toxicity of pesticides on honey bees. Acute contact toxicity data were collected from different sources for 256 pesticides, which were divided into training and test sets. The k-NN models were validated with good prediction, with an accuracy of 70% for all compounds and of 65% for highly toxic compounds, suggesting that they might reliably predict the toxicity of structurally diverse pesticides and could be used to screen and prioritise new pesticides. Copyright © 2016 Elsevier Ltd. All rights reserved.
A novel image retrieval algorithm based on PHOG and LSH
NASA Astrophysics Data System (ADS)
Wu, Hongliang; Wu, Weimin; Peng, Jiajin; Zhang, Junyuan
2017-08-01
PHOG can describe the local shape of the image and its relationship between the spaces. The using of PHOG algorithm to extract image features in image recognition and retrieval and other aspects have achieved good results. In recent years, locality sensitive hashing (LSH) algorithm has been superior to large-scale data in solving near-nearest neighbor problems compared with traditional algorithms. This paper presents a novel image retrieval algorithm based on PHOG and LSH. First, we use PHOG to extract the feature vector of the image, then use L different LSH hash table to reduce the dimension of PHOG texture to index values and map to different bucket, and finally extract the corresponding value of the image in the bucket for second image retrieval using Manhattan distance. This algorithm can adapt to the massive image retrieval, which ensures the high accuracy of the image retrieval and reduces the time complexity of the retrieval. This algorithm is of great significance.
Homopolyrotaxanes and Homopolyrotaxane Networks of PEO
NASA Technical Reports Server (NTRS)
Pugh, Coleen; Mattice, Wayne
2005-01-01
In order to identify the optimum size of macrocrown ether for threading, we first investigated the size and shape of simple crown ethers in the melt at 373 K, and their extent of threading with PEO in the melt using coarse-grained Monte Carlo simulations on the 2nnd (second nearest neighbor diamond) lattice, which is a high coordination lattice whose coarse-grained chains can be reverse mapped into fully atomistic models in continuous space.
NASA Astrophysics Data System (ADS)
Nicolaou, N.; Nasuto, S. J.
2005-12-01
We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data, but we show that the improvement seen in the performance of MI on extracting dependence trends from EEG is more dependent on the type of MI estimator rather than any embedding technique used. In an independent study we conducted in search for an optimal MI estimator, and in particular for EEG applications, we examined the performance of a number of MI estimators on the data set used by Quian Quiroga in their original study, where the performance of different dependence measures on real data was investigated [Phys. Rev. E 65, 041903 (2002)]. We show that for EEG applications the best performance among the investigated estimators is achieved by k -nearest neighbors, which supports the conjecture by Quian Quiroga in Phys. Rev. E 67, 063902 (2003) that the nearest neighbor estimator is the most precise method for estimating MI.
Control of Synchronization Regimes in Networks of Mobile Interacting Agents
NASA Astrophysics Data System (ADS)
Perez-Diaz, Fernando; Zillmer, Ruediger; Groß, Roderich
2017-05-01
We investigate synchronization in a population of mobile pulse-coupled agents with a view towards implementations in swarm-robotics systems and mobile sensor networks. Previous theoretical approaches dealt with range and nearest-neighbor interactions. In the latter case, a synchronization-hindering regime for intermediate agent mobility is found. We investigate the robustness of this intermediate regime under practical scenarios. We show that synchronization in the intermediate regime can be predicted by means of a suitable metric of the phase response curve. Furthermore, we study more-realistic K -nearest-neighbor and cone-of-vision interactions, showing that it is possible to control the extent of the synchronization-hindering region by appropriately tuning the size of the neighborhood. To assess the effect of noise, we analyze the propagation of perturbations over the network and draw an analogy between the response in the hindering regime and stable chaos. Our findings reveal the conditions for the control of clock or activity synchronization of agents with intermediate mobility. In addition, the emergence of the intermediate regime is validated experimentally using a swarm of physical robots interacting with cone-of-vision interactions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gevorkyan, A. S., E-mail: g-ashot@sci.am; Sahakyan, V. V.
We study the classical 1D Heisenberg spin glasses in the framework of nearest-neighboring model. Based on the Hamilton equations we obtained the system of recurrence equations which allows to perform node-by-node calculations of a spin-chain. It is shown that calculations from the first principles of classical mechanics lead to ℕℙ hard problem, that however in the limit of the statistical equilibrium can be calculated by ℙ algorithm. For the partition function of the ensemble a new representation is offered in the form of one-dimensional integral of spin-chains’ energy distribution.
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Structure and Bonding in Noncrystalline Solids Abstracts
1983-06-02
displacement cascades are unlikely. Related damage studies as diffuse X- ray scattering, magnetic susceptibility and positron - annihilation lifetime...the positron annihilation lifetime data; diffuse X-ray scattering studies give evidence for "amorphized" clusters in neutron but not in elec-ron...feldspar glasses and glasses in the system CaO- MgO -SiO 2 . These results indicate that the nearest-neighbor and next- nearest-neighbor environments are very
Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David; Bishop, Thomas C.; Case, David A.; Cheatham, Thomas; Dixit, Surjit; Jayaram, B.; Lankas, Filip; Laughton, Charles; Maddocks, John H.; Michon, Alexis; Osman, Roman; Orozco, Modesto; Perez, Alberto; Singh, Tanya; Spackova, Nada; Sponer, Jiri
2010-01-01
It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA. PMID:19850719
Jiang, Nanfeng; Song, Weiran; Wang, Hui; Guo, Gongde; Liu, Yuanyuan
2018-05-23
As the expectation for higher quality of life increases, consumers have higher demands for quality food. Food authentication is the technical means of ensuring food is what it says it is. A popular approach to food authentication is based on spectroscopy, which has been widely used for identifying and quantifying the chemical components of an object. This approach is non-destructive and effective but expensive. This paper presents a computer vision-based sensor system for food authentication, i.e., differentiating organic from non-organic apples. This sensor system consists of low-cost hardware and pattern recognition software. We use a flashlight to illuminate apples and capture their images through a diffraction grating. These diffraction images are then converted into a data matrix for classification by pattern recognition algorithms, including k -nearest neighbors ( k -NN), support vector machine (SVM) and three partial least squares discriminant analysis (PLS-DA)- based methods. We carry out experiments on a reasonable collection of apple samples and employ a proper pre-processing, resulting in a highest classification accuracy of 94%. Our studies conclude that this sensor system has the potential to provide a viable solution to empower consumers in food authentication.
A Novel Artificial Intelligence System for Endotracheal Intubation.
Carlson, Jestin N; Das, Samarjit; De la Torre, Fernando; Frisch, Adam; Guyette, Francis X; Hodgins, Jessica K; Yealy, Donald M
2016-01-01
Adequate visualization of the glottic opening is a key factor to successful endotracheal intubation (ETI); however, few objective tools exist to help guide providers' ETI attempts toward the glottic opening in real-time. Machine learning/artificial intelligence has helped to automate the detection of other visual structures but its utility with ETI is unknown. We sought to test the accuracy of various computer algorithms in identifying the glottic opening, creating a tool that could aid successful intubation. We collected a convenience sample of providers who each performed ETI 10 times on a mannequin using a video laryngoscope (C-MAC, Karl Storz Corp, Tuttlingen, Germany). We recorded each attempt and reviewed one-second time intervals for the presence or absence of the glottic opening. Four different machine learning/artificial intelligence algorithms analyzed each attempt and time point: k-nearest neighbor (KNN), support vector machine (SVM), decision trees, and neural networks (NN). We used half of the videos to train the algorithms and the second half to test the accuracy, sensitivity, and specificity of each algorithm. We enrolled seven providers, three Emergency Medicine attendings, and four paramedic students. From the 70 total recorded laryngoscopic video attempts, we created 2,465 time intervals. The algorithms had the following sensitivity and specificity for detecting the glottic opening: KNN (70%, 90%), SVM (70%, 90%), decision trees (68%, 80%), and NN (72%, 78%). Initial efforts at computer algorithms using artificial intelligence are able to identify the glottic opening with over 80% accuracy. With further refinements, video laryngoscopy has the potential to provide real-time, direction feedback to the provider to help guide successful ETI.
Computations on the massively parallel processor at the Goddard Space Flight Center
NASA Technical Reports Server (NTRS)
Strong, James P.
1991-01-01
Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
NASA Astrophysics Data System (ADS)
Wu, Wei; Zhao, Dewei; Zhang, Huan
2015-12-01
Super-resolution image reconstruction is an effective method to improve the image quality. It has important research significance in the field of image processing. However, the choice of the dictionary directly affects the efficiency of image reconstruction. A sparse representation theory is introduced into the problem of the nearest neighbor selection. Based on the sparse representation of super-resolution image reconstruction method, a super-resolution image reconstruction algorithm based on multi-class dictionary is analyzed. This method avoids the redundancy problem of only training a hyper complete dictionary, and makes the sub-dictionary more representatives, and then replaces the traditional Euclidean distance computing method to improve the quality of the whole image reconstruction. In addition, the ill-posed problem is introduced into non-local self-similarity regularization. Experimental results show that the algorithm is much better results than state-of-the-art algorithm in terms of both PSNR and visual perception.
NASA Astrophysics Data System (ADS)
Chitchian, Shahab; Vincent, Kathleen L.; Vargas, Gracie; Motamedi, Massoud
2012-11-01
We have explored the use of optical coherence tomography (OCT) as a noninvasive tool for assessing the toxicity of topical microbicides, products used to prevent HIV, by monitoring the integrity of the vaginal epithelium. A novel feature-based segmentation algorithm using a nearest-neighbor classifier was developed to monitor changes in the morphology of vaginal epithelium. The two-step automated algorithm yielded OCT images with a clearly defined epithelial layer, enabling differentiation of normal and damaged tissue. The algorithm was robust in that it was able to discriminate the epithelial layer from underlying stroma as well as residual microbicide product on the surface. This segmentation technique for OCT images has the potential to be readily adaptable to the clinical setting for noninvasively defining the boundaries of the epithelium, enabling quantifiable assessment of microbicide-induced damage in vaginal tissue.
NASA Astrophysics Data System (ADS)
Tibi, R.; Young, C. J.; Gonzales, A.; Ballard, S.; Encarnacao, A. V.
2016-12-01
The matched filtering technique involving the cross-correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive, and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this study, we introduce an Approximate Nearest Neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation without requiring a complex distributed computing system. Our method begins with a projection into a reduced dimensionality space based on correlation with a randomized subset of the full template archive. Searching for a specified number of nearest neighbors is accomplished by using randomized K-dimensional trees. We used the approach to search for matches to each of 2700 analyst-reviewed signal detections reported for May 2010 for the IMS station MKAR. The template library in this case consists of a dataset of more than 200,000 analyst-reviewed signal detections for the same station from 2002-2014 (excluding May 2010). Of these signal detections, 60% are teleseismic first P, and 15% regional phases (Pn, Pg, Sn, and Lg). The analyses performed on a standard desktop computer shows that the proposed approach performs the search of the large template libraries about 20 times faster than the standard full linear search, while achieving recall rates greater than 80%, with the recall rate increasing for higher correlation values. To decide whether to confirm a match, we use a hybrid method involving a cluster approach for queries with two or more matches, and correlation score for single matches. Of the signal detections that passed our confirmation process, 52% were teleseismic first P, and 30% were regional phases.
Molecular dynamics analysis of transitions between rotational isomers in polymethylene
NASA Astrophysics Data System (ADS)
Zúñiga, Ignacio; Bahar, Ivet; Dodge, Robert; Mattice, Wayne L.
1991-10-01
Molecular dynamics trajectories have been computed and analyzed for linear chains, with sizes ranging from C10H22 to C100H202, and for cyclic C100H200. All hydrogen atoms are included discretely. All bond lengths, bond angles, and torsion angles are variable. Hazard plots show a tendency, at very short times, for correlations between rotational isomeric transitions at bond i and i±2, in much the same manner as in the Brownian dynamics simulations reported by Helfand and co-workers. This correlation of next nearest neighbor bonds in isolated polyethylene chains is much weaker than the correlation found for next nearest neighbor CH-CH2 bonds in poly(1,4-trans-butadiene) confined to the channel formed by crystalline perhydrotriphenylene [Dodge and Mattice, Macromolecules 24, 2709 (1991)]. Less than half of the rotational isomeric transitions observed in the entire trajectory for C50H102 can be described as strongly coupled next nearest neighbor transitions. If correlated motions are identified with successive transitions, which occur within a time interval of Δt≤1 ps, only 18% of the transitions occur through cooperative motion of bonds i and i±2. An analysis of the entire data set of 2482 rotational isomeric state transitions, observed in a 3.7 ns trajectory for C50H102 at 400 K, was performed using a formalism that treats the transitions at different bonds as being independent. On time scales of 0.1 ns or longer, the analysis based on independent bonds accounts reasonably well for the results from the molecular dynamics simulations. At shorter times the molecular dynamics simulation reveals a higher mobility than implied by the analysis assuming independent bonds, presumably due to the influence of correlations that are important at shorter times.
Polymers with nearest- and next nearest-neighbor interactions on the Husimi lattice
NASA Astrophysics Data System (ADS)
Oliveira, Tiago J.
2016-04-01
The exact grand-canonical solution of a generalized interacting self-avoid walk (ISAW) model, placed on a Husimi lattice built with squares, is presented. In this model, beyond the traditional interaction {ω }1={{{e}}}{ɛ 1/{k}BT} between (nonconsecutive) monomers on nearest-neighbor (NN) sites, an additional energy {ɛ }2 is associated to next-NN (NNN) monomers. Three definitions of NNN sites/interactions are considered, where each monomer can have, effectively, at most two, four, or six NNN monomers on the Husimi lattice. The phase diagrams found in all cases have (qualitatively) the same thermodynamic properties: a non-polymerized (NP) and a polymerized (P) phase separated by a critical and a coexistence surface that meet at a tricritical (θ-) line. This θ-line is found even when one of the interactions is repulsive, existing for {ω }1 in the range [0,∞ ), i.e., for {ɛ }1/{k}BT in the range [-∞ ,∞ ). Thus, counterintuitively, a θ-point exists even for an infinite repulsion between NN monomers ({ω }1=0), being associated to a coil-‘soft globule’ transition. In the limit of an infinite repulsive force between NNN monomers, however, the coil-globule transition disappears, and only NP-P continuous transition is observed. This particular case, with {ω }2=0, is also solved exactly on the square lattice, using a transfer matrix calculation where a discontinuous NP-P transition is found. For attractive and repulsive forces between NN and NNN monomers, respectively, the model becomes quite similar to the semiflexible-ISAW one, whose crystalline phase is not observed here, as a consequence of the frustration due to competing NN and NNN forces. The mapping of the phase diagrams in canonical ones is discussed and compared with recent results from Monte Carlo simulations on the square lattice.
Real time target allocation in cooperative unmanned aerial vehicles
NASA Astrophysics Data System (ADS)
Kudleppanavar, Ganesh
The prolific development of Unmanned Aerial Vehicles (UAV's) in recent years has the potential to provide tremendous advantages in military, commercial and law enforcement applications. While safety and performance take precedence in the development lifecycle, autonomous operations and, in particular, cooperative missions have the ability to significantly enhance the usability of these vehicles. The success of cooperative missions relies on the optimal allocation of targets while taking into consideration the resource limitation of each vehicle. The task allocation process can be centralized or decentralized. This effort presents the development of a real time target allocation algorithm that considers available stored energy in each vehicle while minimizing the communication between each UAV. The algorithm utilizes a nearest neighbor search algorithm to locate new targets with respect to existing targets. Simulations show that this novel algorithm compares favorably to the mixed integer linear programming method, which is computationally more expensive. The implementation of this algorithm on Arduino and Xbee wireless modules shows the capability of the algorithm to execute efficiently on hardware with minimum computation complexity.
van Diest, Mike; Stegenga, Jan; Wörtche, Heinrich J.; Roerdink, Jos B. T. M; Verkerke, Gijsbertus J.; Lamoth, Claudine J. C.
2015-01-01
Background Exergames are becoming an increasingly popular tool for training balance ability, thereby preventing falls in older adults. Automatic, real time, assessment of the user’s balance control offers opportunities in terms of providing targeted feedback and dynamically adjusting the gameplay to the individual user, yet algorithms for quantification of balance control remain to be developed. The aim of the present study was to identify movement patterns, and variability therein, of young and older adults playing a custom-made weight-shifting (ice-skating) exergame. Methods Twenty older adults and twenty young adults played a weight-shifting exergame under five conditions of varying complexity, while multi-segmental whole-body movement data were captured using Kinect. Movement coordination patterns expressed during gameplay were identified using Self Organizing Maps (SOM), an artificial neural network, and variability in these patterns was quantified by computing Total Trajectory Variability (TTvar). Additionally a k Nearest Neighbor (kNN) classifier was trained to discriminate between young and older adults based on the SOM features. Results Results showed that TTvar was significantly higher in older adults than in young adults, when playing the exergame under complex task conditions. The kNN classifier showed a classification accuracy of 65.8%. Conclusions Older adults display more variable sway behavior than young adults, when playing the exergame under complex task conditions. The SOM features characterizing movement patterns expressed during exergaming allow for discriminating between young and older adults with limited accuracy. Our findings contribute to the development of algorithms for quantification of balance ability during home-based exergaming for balance training. PMID:26230655
van Diest, Mike; Stegenga, Jan; Wörtche, Heinrich J; Roerdink, Jos B T M; Verkerke, Gijsbertus J; Lamoth, Claudine J C
2015-01-01
Exergames are becoming an increasingly popular tool for training balance ability, thereby preventing falls in older adults. Automatic, real time, assessment of the user's balance control offers opportunities in terms of providing targeted feedback and dynamically adjusting the gameplay to the individual user, yet algorithms for quantification of balance control remain to be developed. The aim of the present study was to identify movement patterns, and variability therein, of young and older adults playing a custom-made weight-shifting (ice-skating) exergame. Twenty older adults and twenty young adults played a weight-shifting exergame under five conditions of varying complexity, while multi-segmental whole-body movement data were captured using Kinect. Movement coordination patterns expressed during gameplay were identified using Self Organizing Maps (SOM), an artificial neural network, and variability in these patterns was quantified by computing Total Trajectory Variability (TTvar). Additionally a k Nearest Neighbor (kNN) classifier was trained to discriminate between young and older adults based on the SOM features. Results showed that TTvar was significantly higher in older adults than in young adults, when playing the exergame under complex task conditions. The kNN classifier showed a classification accuracy of 65.8%. Older adults display more variable sway behavior than young adults, when playing the exergame under complex task conditions. The SOM features characterizing movement patterns expressed during exergaming allow for discriminating between young and older adults with limited accuracy. Our findings contribute to the development of algorithms for quantification of balance ability during home-based exergaming for balance training.
NASA Astrophysics Data System (ADS)
Kumada, Takayuki
2006-03-01
Tunneling chemical reactions D +H2→DH+H and D +DH→D2+H in solid HD -H2 and D2-H2 mixtures were studied in the temperature range between 4 and 8K. These reactions were initiated by UV photolysis of DI molecules doped in these solids for 30s and followed by measuring the time course of electron-spin-resonance (ESR) intensities of D and H atoms. ESR intensity of D atoms produced by the photolysis decreases but that of H atoms increases with time. Time course of the D and H intensities has the fast and slow processes. The fast process, which finishes within ˜300s after the photolysis, is assigned to the reaction of D atom with one of its nearest-neighboring H2 molecules, D(H2)n(HD)12-n→H(H2)n-1(HD)13-n or D(H2)n(D2)12-n→H(HD )(H2)n-1(D2)12-n for 12⩾n⩾1. Rate constant for the D +H2 reaction between neighboring D atom-H2 molecule pair is determined to be (7.5±0.7)×10-3s-1 in solid HD -H2 and (1.3±0.3)×10-2s-1 in D2-H2 at 4.1K, which is very close to that calculated based on the theory of chemical reaction in gas phase by Hancock et al. [J. Chem. Phys. 91, 3492 (1989)] and Takayanagi and Sato [J. Chem. Phys. 92, 2862 (1990)]. This rate constant was found to be independent of temperature up to 7K within experimental error of ±30%. The slow process is assigned to the reaction of D atom produced in a cage fully surrounded by HD or D2 molecules, D(HD)12 or D(D2)12. This D atom undergoes the D +DH reaction with one of its nearest-neighboring HD molecules in solid HD -H2 or diffuses to the neighbor of H2 molecules to allow the D +H2 reaction in solid HD -H2 and D2-H2. The former is the main channel in solid HD -H2 below 6K where D atoms diffuse very slowly, whereas the latter dominates over the former above 6K. Rate for the reactions in the slow process is independent of temperature below 6K but increases with the increase in temperature above 6K. We found that the increase is due to the increase in hopping rate of D atoms to the neighbor of H2 molecules. Rate constant for the D +DH reaction was found to be independent of temperature up to 7K as well.
OCR enhancement through neighbor embedding and fast approximate nearest neighbors
NASA Astrophysics Data System (ADS)
Smith, D. C.
2012-10-01
Generic optical character recognition (OCR) engines often perform very poorly in transcribing scanned low resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speed up of our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (pixels per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more that 6 times lower on average than after BI preprocessing.
Minimizers with Bounded Action for the High-Dimensional Frenkel-Kontorova Model
NASA Astrophysics Data System (ADS)
Miao, Xue-Qing; Wang, Ya-Nan; Qin, Wen-Xin
In Aubry-Mather theory for monotone twist maps or for one-dimensional Frenkel-Kontorova (FK) model with nearest neighbor interactions, each global minimizer (minimal energy configuration) is naturally Birkhoff. However, this is not true for the one-dimensional FK model with non-nearest neighbor interactions or for the high-dimensional FK model. In this paper, we study the Birkhoff property of minimizers with bounded action for the high-dimensional FK model.
NASA Astrophysics Data System (ADS)
Tarasenko, Alexander
2018-01-01
Diffusion of particles adsorbed on a homogeneous one-dimensional lattice is investigated using a theoretical approach and MC simulations. The analytical dependencies calculated in the framework of approach are tested using the numerical data. The perfect coincidence of the data obtained by these different methods demonstrates that the correctness of the approach based on the theory of the non-equilibrium statistical operator.
NASA Technical Reports Server (NTRS)
Salzmann, D.; Stein, J.; Goldberg, I. B.; Pratt, R. H.
1991-01-01
The effect of the cylindrical symmetry imposed by the nearest-neighbor ions on the ionic levels and the emission spectra of a Li-like Kr ion immersed in hot and dense plasmas is investigated using the Stein et al. (1989) two-centered model extended to include computations of the line profiles, shifts, and widths, as well as the energy-level mixing and the forbidden transition probabilities. It is shown that the cylindrical symmetry mixes states with different orbital quantum numbers l, particularly for highly excited states, and, thereby, gives rise to forbidden transitions in the emission spectrum. Results are obtained for the variation of the ionic level shifts and mixing coefficients with the distance to the nearest neighbor. Also obtained are representative computed spectra that show the density effects on the spectral line profiles, shifts, and widths, and the forbidden components in the spectrum.
Acoustic localization of triggered lightning
NASA Astrophysics Data System (ADS)
Arechiga, Rene O.; Johnson, Jeffrey B.; Edens, Harald E.; Thomas, Ronald J.; Rison, William
2011-05-01
We use acoustic (3.3-500 Hz) arrays to locate local (<20 km) thunder produced by triggered lightning in the Magdalena Mountains of central New Mexico. The locations of the thunder sources are determined by the array back azimuth and the elapsed time since discharge of the lightning flash. We compare the acoustic source locations with those obtained by the Lightning Mapping Array (LMA) from Langmuir Laboratory, which is capable of accurately locating the lightning channels. To estimate the location accuracy of the acoustic array we performed Monte Carlo simulations and measured the distance (nearest neighbors) between acoustic and LMA sources. For close sources (<5 km) the mean nearest-neighbors distance was 185 m compared to 100 m predicted by the Monte Carlo analysis. For far distances (>6 km) the error increases to 800 m for the nearest neighbors and 650 m for the Monte Carlo analysis. This work shows that thunder sources can be accurately located using acoustic signals.
Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin
2013-08-01
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
ZHANG, H; Huang, J; Ma, J
2014-06-15
Purpose: To study the noise correlation properties of cone-beam CT (CBCT) projection data and to incorporate the noise correlation information to a statistics-based projection restoration algorithm for noise reduction in low-dose CBCT. Methods: In this study, we systematically investigated the noise correlation properties among detector bins of CBCT projection data by analyzing repeated projection measurements. The measurements were performed on a TrueBeam on-board CBCT imaging system with a 4030CB flat panel detector. An anthropomorphic male pelvis phantom was used to acquire 500 repeated projection data at six different dose levels from 0.1 mAs to 1.6 mAs per projection at threemore » fixed angles. To minimize the influence of the lag effect, lag correction was performed on the consecutively acquired projection data. The noise correlation coefficient between detector bin pairs was calculated from the corrected projection data. The noise correlation among CBCT projection data was then incorporated into the covariance matrix of the penalized weighted least-squares (PWLS) criterion for noise reduction of low-dose CBCT. Results: The analyses of the repeated measurements show that noise correlation coefficients are non-zero between the nearest neighboring bins of CBCT projection data. The average noise correlation coefficients for the first- and second- order neighbors are about 0.20 and 0.06, respectively. The noise correlation coefficients are independent of the dose level. Reconstruction of the pelvis phantom shows that the PWLS criterion with consideration of noise correlation (PWLS-Cor) results in a lower noise level as compared to the PWLS criterion without considering the noise correlation (PWLS-Dia) at the matched resolution. Conclusion: Noise is correlated among nearest neighboring detector bins of CBCT projection data. An accurate noise model of CBCT projection data can improve the performance of the statistics-based projection restoration algorithm for low-dose CBCT.« less
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using a manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically to spiral, elliptical and edge-on galaxies with accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
NASA Astrophysics Data System (ADS)
Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.
2010-03-01
Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To solve the issue of inconsistency and user-dependency in manual lesion measurement of MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized algorithm of imaging processing that is used in CAD development, MS CAD integration and evaluation in clinical workflow is technically challenging due to the requirement of high computation rates and memory bandwidth in the recursive nature of the algorithm. In this paper, we present the development and evaluation of using a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to rapidly integrate in an electronic patient record or any disease-centric health care system.
Sampling algorithms for validation of supervised learning models for Ising-like systems
NASA Astrophysics Data System (ADS)
Portman, Nataliya; Tamblyn, Isaac
2017-12-01
In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called ;ID-MH; that uses the Metropolis-Hastings algorithm creating Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validation of a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new ;block-ID; sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-10-30
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, that is cluster formation, cluster selection and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms.
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-01-01
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, that is cluster formation, cluster selection and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms. PMID:26528984
Automatic Recognition of Breathing Route During Sleep Using Snoring Sounds
NASA Astrophysics Data System (ADS)
Mikami, Tsuyoshi; Kojima, Yohichiro
This letter classifies snoring sounds into three breathing routes (oral, nasal, and oronasal) with discriminant analysis of the power spectra and k-nearest neighbor method. It is necessary to recognize breathing route during snoring, because oral snoring is a typical symptom of sleep apnea but we cannot know our own breathing and snoring condition during sleep. As a result, about 98.8% classification rate is obtained by using leave-one-out test for performance evaluation.
Ab initio calculation of atomic interactions on Al(110): implications for epitaxial growth
NASA Astrophysics Data System (ADS)
Fichthorn, Kristen; Tiwary, Yogesh
2007-03-01
Using first-principles calculations based on density-functional theory, we resolved atomic interactions between adsorbed Al atoms on Al(110). Relevant pair and trio interactions were quantified. We find that pair interactions extend to the third in-channel and second cross-channel neighbor on the anisotropic (110) surface. Beyond these distances, pair interactions are negligible. The nearest-neighbor interaction in the in-channel direction is attractive, but nearest-neighbor cross-channel interaction is repulsive. While nearest-neighbor, cross-channel repulsion does not support the experimental observation of 3D hut formation in Al/Al(110) homoepitaxial growth [1], we find that trio interactions can be significant and attractive and they support cross-channel bonding. The pair and trio interactions have direct and indirect components. We have quantified the electronic and elastic components of the indirect, substrate-mediated interactions. We also probe the influence of these interactions on the energy barriers for adatom hopping. [1] F. Buatier de Mongeot, W. Zhu, A. Molle, R. Buzio, C. Boragno, U. Valbusa, E. Wang, and Z. Zhang, Phys. Rev. Lett. 91, 016102 (2003).
Mapping from multiple-control Toffoli circuits to linear nearest neighbor quantum circuits
NASA Astrophysics Data System (ADS)
Cheng, Xueyun; Guan, Zhijin; Ding, Weiping
2018-07-01
In recent years, quantum computing research has been attracting more and more attention, but few studies on the limited interaction distance between quantum bits (qubit) are deeply carried out. This paper presents a mapping method for transforming multiple-control Toffoli (MCT) circuits into linear nearest neighbor (LNN) quantum circuits instead of traditional decomposition-based methods. In order to reduce the number of inserted SWAP gates, a novel type of gate with the optimal LNN quantum realization was constructed, namely NNTS gate. The MCT gate with multiple control bits could be better cascaded by the NNTS gates, in which the arrangement of the input lines was LNN arrangement of the MCT gate. Then, the communication overhead measurement model on inserted SWAP gate count from the original arrangement to the new arrangement was put forward, and we selected one of the LNN arrangements with the minimum SWAP gate count. Moreover, the LNN arrangement-based mapping algorithm was given, and it dealt with the MCT gates in turn and mapped each MCT gate into its LNN form by inserting the minimum number of SWAP gates. Finally, some simplification rules were used, which can further reduce the final quantum cost of the LNN quantum circuit. Experiments on some benchmark MCT circuits indicate that the direct mapping algorithm results in fewer additional SWAP gates in about 50%, while the average improvement rate in quantum cost is 16.95% compared to the decomposition-based method. In addition, it has been verified that the proposed method has greater superiority for reversible circuits cascaded by MCT gates with more control bits.
Isolated galaxies, pairs, and groups of galaxies
NASA Technical Reports Server (NTRS)
Kuneva, I.; Kalinkov, M.
1990-01-01
The authors searched for isolated galaxies, pairs and groups of galaxies in the CfA survey (Huchra et al. 1983). It was assumed that the distances to galaxies are given by R = V/H sub o, where H sub o = 100 km s(exp -1) Mpc(exp -1) and R greater than 6 Mpc. The searching procedure is close to those, applied to find superclusters of galaxies (Kalinkov and Kuneva 1985, 1986). A sphere with fixed radius r (asterisk) is described around each galaxy. The mean spatial density in the sphere is m. Let G (sup 1) be any galaxy and G (sup 2) be its nearest neighbor at a distance R sub 2. If R sub 2 exceeds the 95 percent quintile in the distribution of the distances of the second neighbors, then G (sup 1) is an isolated galaxy. Let the midpoint of G (sup 1) and G (sup 2) be O sub 2 and r sub 2=R sub 2/2. For the volume V sub 2, defined with the radius r sub 2, the density D sub 2 less than k mu, the galaxy G (sup 2) is a single one and the procedure for searching for pairs and groups, beginning with this object is over and we have to pass to another object. Here the authors present the groups - isolated and nonisolated - with n greater than 3, found in the CfA survey in the Northern galactic hemisphere. The parameters used are k = 10 and r (asterisk) = 5 Mpc. Table 1 contains: (1) the group number, (2) the galaxy, nearest to the multiplet center, (3) multiplicity n, (4) the brightest galaxy if it is not listed in (2); (5) and (6) are R.A. and Dec. (1950), (7) - mean distance D in Mpc. Further there are the mean density rho (8) of the multiplet (galaxies Mpc (exp -3), (9) the density rho (asterisk) for r (asterisk) = 5 Mpc and (10) the density rho sub g for the group with its nearest neighbor. The parenthesized digits for densities in the last three columns are powers of ten.
The Monotonic Lagrangian Grid for Fast Air-Traffic Evaluation
NASA Technical Reports Server (NTRS)
Alexandrov, Natalia; Kaplan, Carolyn; Oran, Elaine; Boris, Jay
2010-01-01
This paper describes the continued development of a dynamic air-traffic model, ATMLG, intended for rapid evaluation of rules and methods to control and optimize transport systems. The underlying data structure is based on the Monotonic Lagrangian Grid (MLG), which is used for sorting and ordering positions and other data needed to describe N moving bodies, and their interactions. In ATMLG, the MLG is combined with algorithms for collision avoidance and updating aircraft trajectories. Aircraft that are close to each other in physical space are always near neighbors in the MLG data arrays, resulting in a fast nearest-neighbor interaction algorithm that scales as N. In this paper, we use ATMLG to examine how the ability to maintain a required separation between aircraft decreases as the number of aircraft in the volume increases. This requires keeping track of the primary and subsequent collision avoidance maneuvers necessary to maintain a five mile separation distance between all aircraft. Simulation results show that the number of collision avoidance moves increases exponentially with the number of aircraft in the volume.
Rectangular Array Of Digital Processors For Planning Paths
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.
1993-01-01
Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Rule groupings in expert systems using nearest neighbour decision rules, and convex hulls
NASA Technical Reports Server (NTRS)
Anastasiadis, Stergios
1991-01-01
Expert System shells are lacking in many areas of software engineering. Large rule based systems are not semantically comprehensible, difficult to debug, and impossible to modify or validate. Partitioning a set of rules found in CLIPS (C Language Integrated Production System) into groups of rules which reflect the underlying semantic subdomains of the problem, will address adequately the concerns stated above. Techniques are introduced to structure a CLIPS rule base into groups of rules that inherently have common semantic information. The concepts involved are imported from the field of A.I., Pattern Recognition, and Statistical Inference. Techniques focus on the areas of feature selection, classification, and a criteria of how 'good' the classification technique is, based on Bayesian Decision Theory. A variety of distance metrics are discussed for measuring the 'closeness' of CLIPS rules and various Nearest Neighbor classification algorithms are described based on the above metric.
NASA Astrophysics Data System (ADS)
Martynov, S. N.; Tugarinov, V. I.; Martynov, A. S.
2017-10-01
The algorithm of approximate solution was developed for the differential equation describing the anharmonical change of the spin orientation angle in the model of ferromagnet with the exchange competition between nearest and next nearest magnetic neighbors and the easy axis exchange anisotropy. The equation was obtained from the collinearity constraint on the discrete lattice. In the low anharmonicity approximation the equation is resulted to an autonomous form and is integrated in quadratures. The obvious dependence of the angle velocity and second derivative of angle from angle and initial condition was derived by expanding the first integral of the equation in the Taylor series in vicinity of initial condition. The ground state of the soliton solutions was calculated by a numerical minimization of the energy integral. The evaluation of the used approximation was made for a triple point of the phase diagram.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
Capurso, Daniel; Bengtsson, Henrik; Segal, Mark R
2016-03-18
The spatial organization of the genome influences cellular function, notably gene regulation. Recent studies have assessed the three-dimensional (3D) co-localization of functional annotations (e.g. centromeres, long terminal repeats) using 3D genome reconstructions from Hi-C (genome-wide chromosome conformation capture) data; however, corresponding assessments for continuous functional genomic data (e.g. chromatin immunoprecipitation-sequencing (ChIP-seq) peak height) are lacking. Here, we demonstrate that applying bump hunting via the patient rule induction method (PRIM) to ChIP-seq data superposed on a Saccharomyces cerevisiae 3D genome reconstruction can discover 'functional 3D hotspots', regions in 3-space for which the mean ChIP-seq peak height is significantly elevated. For the transcription factor Swi6, the top hotspot by P-value contains MSB2 and ERG11 - known Swi6 target genes on different chromosomes. We verify this finding in a number of ways. First, this top hotspot is relatively stable under PRIM across parameter settings. Second, this hotspot is among the top hotspots by mean outcome identified by an alternative algorithm, k-Nearest Neighbor (k-NN) regression. Third, the distance between MSB2 and ERG11 is smaller than expected (by resampling) in two other 3D reconstructions generated via different normalization and reconstruction algorithms. This analytic approach can discover functional 3D hotspots and potentially reveal novel regulatory interactions. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Modeling occupancy distribution in large spaces with multi-feature classification algorithm
Wang, Wei; Chen, Jiayu; Hong, Tianzhen
2018-04-07
We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolutionmore » occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.« less
Modeling occupancy distribution in large spaces with multi-feature classification algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Wei; Chen, Jiayu; Hong, Tianzhen
We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolutionmore » occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.« less
NASA Astrophysics Data System (ADS)
Poklonski, N. A.; Vyrko, S. A.; Zabrodskii, A. G.
2010-08-01
Expressions for the pre-exponential factor σ3 and the thermal activation energy ɛ3 of hopping electric conductivity of electrons via hydrogen-like donors in n-type gallium arsenide are obtained in the quasiclassical approximation. Crystals with the donor concentration N and the acceptor concentration KN at the intermediate compensation ratio K (approximately from 0.25 to 0.75) are considered. We assume that the donors in the charge states (0) and (+1) and the acceptors in the charge state (-1) form a joint nonstoichiometric simple cubic 'sublattice' within the crystalline matrix. In such sublattice the distance between nearest impurity atoms is Rh = [(1 + K)N]-1/3 which is also the length of an electron hop between donors. To take into account orientational disorder of hops we assume that the impurity sublattice randomly and smoothly changes orientation inside a macroscopic sample. Values of σ3(N) and ɛ3(N) calculated for the temperature of 2.5 K agree with known experimental data at the insulator side of the insulator-metal phase transition.
Vrooman, Henri A; Cocosco, Chris A; van der Lijn, Fedde; Stokking, Rik; Ikram, M Arfan; Vernooij, Meike W; Breteler, Monique M B; Niessen, Wiro J
2007-08-01
Conventional k-Nearest-Neighbor (kNN) classification, which has been successfully applied to classify brain tissue in MR data, requires training on manually labeled subjects. This manual labeling is a laborious and time-consuming procedure. In this work, a new fully automated brain tissue classification procedure is presented, in which kNN training is automated. This is achieved by non-rigidly registering the MR data with a tissue probability atlas to automatically select training samples, followed by a post-processing step to keep the most reliable samples. The accuracy of the new method was compared to rigid registration-based training and to conventional kNN-based segmentation using training on manually labeled subjects for segmenting gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) in 12 data sets. Furthermore, for all classification methods, the performance was assessed when varying the free parameters. Finally, the robustness of the fully automated procedure was evaluated on 59 subjects. The automated training method using non-rigid registration with a tissue probability atlas was significantly more accurate than rigid registration. For both automated training using non-rigid registration and for the manually trained kNN classifier, the difference with the manual labeling by observers was not significantly larger than inter-observer variability for all tissue types. From the robustness study, it was clear that, given an appropriate brain atlas and optimal parameters, our new fully automated, non-rigid registration-based method gives accurate and robust segmentation results. A similarity index was used for comparison with manually trained kNN. The similarity indices were 0.93, 0.92 and 0.92, for CSF, GM and WM, respectively. It can be concluded that our fully automated method using non-rigid registration may replace manual segmentation, and thus that automated brain tissue segmentation without laborious manual training is feasible.
Oya, Yoshifumi; Hata, Kenji; Ohba, Tomonori
2017-10-24
We present the structures of NaCl aqueous solution in carbon nanotubes with diameters of 1, 2, and 3 nm based on an analysis performed using X-ray diffraction and canonical ensemble Monte Carlo simulations. Anomalously longer nearest-neighbor distances were observed in the electrolyte for the 1-nm-diameter carbon nanotubes; in contrast, in the 2 and 3 nm carbon nanotubes, the nearest-neighbor distances were shorter than those in the bulk electrolyte. We also observed similar properties for water in carbon nanotubes, which was expected because the main component of the electrolyte was water. However, the nearest-neighbor distances of the electrolyte were longer than those of water in all of the carbon nanotubes; the difference was especially pronounced in the 2-nm-diameter carbon nanotubes. Thus, small numbers of ions affected the entire structure of the electrolyte in the nanopores of the carbon nanotubes. The formation of strong hydration shells between ions and water molecules considerably interrupted the hydrogen bonding between water molecules in the nanopores of the carbon nanotubes. The hydration shell had a diameter of approximately 1 nm, and hydration shells were thus adopted for the nanopores of the 2-nm-diameter carbon nanotubes, providing an explanation for the large difference in the nearest-neighbor distances between the electrolyte and water in these nanopores.
PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models tomore » curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.« less
Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow
2017-01-01
Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. The best models were created using artificial neural networks and logistic regression. -These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. Most of the predictive models produced similar results, and demonstrated the feasibility of identifying individuals with highest probability of having undiagnosed diabetes, through easily-obtained clinical data.
NASA Astrophysics Data System (ADS)
Pini, Maria Gloria; Rettori, Angelo
1993-08-01
The thermodynamical properties of an alternating spin (S,s) one-dimensional (1D) Ising model with competing nearest- and next-nearest-neighbor interactions are exactly calculated using a transfer-matrix technique. In contrast to the case S=s=1/2, previously investigated by Harada, the alternation of different spins (S≠s) along the chain is found to give rise to two-peaked static structure factors, signaling the coexistence of different short-range-order configurations. The relevance of our calculations with regard to recent experimental data by Gatteschi et al. in quasi-1D molecular magnetic materials, R (hfac)3 NITEt (R=Gd, Tb, Dy, Ho, Er, . . .), is discussed; hfac is hexafluoro-acetylacetonate and NlTEt is 2-Ethyl-4,4,5,5-tetramethyl-4,5-dihydro-1H-imidazolyl-1-oxyl-3-oxide.
Evaluation of new collision-pair selection models in DSMC
NASA Astrophysics Data System (ADS)
Akhlaghi, Hassan; Roohi, Ehsan
2017-10-01
The current paper investigates new collision-pair selection procedures in a direct simulation Monte Carlo (DSMC) method. Collision partner selection based on the random procedure from nearest neighbor particles and deterministic selection of nearest neighbor particles have already been introduced as schemes that provide accurate results in a wide range of problems. In the current research, new collision-pair selections based on the time spacing and direction of the relative movement of particles are introduced and evaluated. Comparisons between the new and existing algorithms are made considering appropriate test cases including fluctuations in homogeneous gas, 2D equilibrium flow, and Fourier flow problem. Distribution functions for number of particles and collisions in cell, velocity components, and collisional parameters (collision separation, time spacing, relative velocity, and the angle between relative movements of particles) are investigated and compared with existing analytical relations for each model. The capability of each model in the prediction of the heat flux in the Fourier problem at different cell numbers, numbers of particles, and time steps is examined. For new and existing collision-pair selection schemes, the effect of an alternative formula for the number of collision-pair selections and avoiding repetitive collisions are investigated via the prediction of the Fourier heat flux. The simulation results demonstrate the advantages and weaknesses of each model in different test cases.