Sample records for k-nearest neighbor algorithm

  1. K-Nearest Neighbor Algorithm Optimization in Text Categorization

    NASA Astrophysics Data System (ADS)

    Chen, Shufeng

    2018-01-01

    The K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings, such as the large amount of sample computation and a strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on the CURE algorithm is proposed. On this basis, a quick algorithm, QKNN (Quick k-nearest neighbor), is presented to find the k nearest neighbor samples, which greatly reduces the number of similarity calculations. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples, improving the performance of the algorithm.

  2. A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

    PubMed

    Wang, Xueyi

    2012-02-08

    The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the cluster nearest to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
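
    The two-stage idea in this abstract (cluster first, then prune with the triangle inequality) can be illustrated with a minimal sketch. This is not the authors' kMkNN code: the function names `build` and `knn_pruned` are hypothetical, and the cluster centers are assumed to be given (in kMkNN they would come from k-means). The pruning rule uses only the bound d(q, p) >= |d(q, c) - d(p, c)|, so the result stays exact.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build(points, centers):
    """Buildup stage: assign each point to its nearest center and
    remember its distance to that center."""
    clusters = [[] for _ in centers]
    for idx, p in enumerate(points):
        d_to_centers = [dist(p, c) for c in centers]
        j = min(range(len(centers)), key=d_to_centers.__getitem__)
        clusters[j].append((d_to_centers[j], idx))
    return clusters


def knn_pruned(query, points, centers, clusters, k):
    """Searching stage: exact k-NN, visiting clusters from nearest to
    farthest center. Returns (neighbor indices sorted by distance,
    number of query-to-point distances actually computed)."""
    best = []  # max-heap emulated with negated distances: (-d, index)
    computed = 0
    d_qc = [dist(query, c) for c in centers]
    for j in sorted(range(len(centers)), key=d_qc.__getitem__):
        for d_pc, idx in clusters[j]:
            # Triangle inequality lower bound on d(query, point).
            lower = abs(d_qc[j] - d_pc)
            if len(best) == k and lower >= -best[0][0]:
                continue  # cannot beat the current k-th best distance
            d = dist(query, points[idx])
            computed += 1
            if len(best) < k:
                heapq.heappush(best, (-d, idx))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, idx))
    ordered = sorted(best, key=lambda t: -t[0])
    return [idx for _, idx in ordered], computed
```

    On well-separated clusters most points are skipped without computing their distance to the query; how much is saved in practice depends on the data and on the number of clusters.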

  3. Scalable Nearest Neighbor Algorithms for High Dimensional Data.

    PubMed

    Muja, Marius; Lowe, David G

    2014-11-01

    For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.

  4. Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance

    NASA Astrophysics Data System (ADS)

    Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi

    2017-11-01

    The K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we present a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing the Hamming distance between a testing sample and each feature vector in the training set. Taking advantage of this method, we realize a good analog of the classical KNN algorithm by setting a distance threshold value t to select the k-nearest neighbors. As a result, QKNN achieves O(n^3) performance, which depends only on the dimension of the feature vectors, and high classification accuracy, outperforming Lloyd's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
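
    As a purely classical illustration of the selection rule described here (neighbors are the training vectors whose Hamming distance to the test sample is at most a threshold t), the following sketch may help. It says nothing about the quantum circuit itself; the function names and the bit-string encoding are assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_by_threshold(query, train, labels, t):
    """Majority vote among training vectors within Hamming distance t.
    Returns None when no training vector is close enough."""
    votes = Counter(
        label for vec, label in zip(train, labels) if hamming(query, vec) <= t
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

    Unlike a fixed k, a threshold selects a variable number of neighbors, so the choice of t controls how many training vectors take part in the vote.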

  5. Frog sound identification using extended k-nearest neighbor classifier

    NASA Astrophysics Data System (ADS)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization has become important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with the Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aim of improving the classification performance. It makes a prediction based on which samples are the nearest neighbors of the testing sample and which samples consider the testing sample as their nearest neighbor. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifiers, k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN), on the recorded sounds of 15 frog species obtained in Malaysian forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results show that the EKNN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, KGNN and MKNN, for all cases.

  6. Applying an efficient K-nearest neighbor search to forest attribute imputation

    Treesearch

    Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

    2006-01-01

    This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby, decreasing the time needed to discover the NN subset. Results of five trials show gains...

  7. Privacy Preserving Nearest Neighbor Search

    NASA Astrophysics Data System (ADS)

    Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

    Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation

  8. Nearest Neighbor Algorithms for Pattern Classification

    NASA Technical Reports Server (NTRS)

    Barrios, J. O.

    1972-01-01

    A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x_i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required to categorize. The Bayes risk serves merely as a reference, the limit of excellence beyond which it is not possible to go. The risk of the NN rule is bounded below by the Bayes risk and above by twice the Bayes risk.
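
    The k-NN rule stated in this abstract (assign z to the category holding most of its k nearest observations) is short enough to sketch directly. The function name `knn_classify` and the use of Euclidean distance are assumptions; the abstract does not fix a metric.

```python
import math
from collections import Counter


def knn_classify(z, observations, categories, k):
    """Assign z to the category that is most frequent among its
    k nearest observations (Euclidean distance assumed)."""
    order = sorted(
        range(len(observations)),
        key=lambda i: math.dist(z, observations[i]),
    )
    votes = Counter(categories[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

    With k = 1 this reduces to the plain NN rule. Ties in the vote are broken here by the order in which categories are first encountered, which is one arbitrary choice among several.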

  9. Finger vein identification using fuzzy-based k-nearest centroid neighbor classifier

    NASA Astrophysics Data System (ADS)

    Rosdi, Bakhtiar Affendi; Jaafar, Haryati; Ramli, Dzati Athiar

    2015-02-01

    In this paper, a new approach for personal identification using finger vein images is presented. Finger vein is an emerging type of biometrics that attracts the attention of researchers in the biometrics area. Compared to other biometric traits such as face, fingerprint and iris, finger vein is more secure and harder to counterfeit since the features are inside the human body. So far, most researchers have focused on how to extract robust features from the captured vein images. Not much research has been conducted on the classification of the extracted features. In this paper, a new classifier called fuzzy-based k-nearest centroid neighbor (FkNCN) is applied to classify the finger vein image. The proposed FkNCN employs a surrounding rule to obtain the k-nearest centroid neighbors based on the spatial distributions of the training images and their distance to the test image. Then, the fuzzy membership function is utilized to assign the test image to the class which is most frequently represented by the k-nearest centroid neighbors. Experimental evaluation using our own database, which was collected from 492 fingers, shows that the proposed FkNCN has better performance than the k-nearest neighbor, k-nearest-centroid neighbor and fuzzy-based-k-nearest neighbor classifiers. This shows that the proposed classifier is able to identify the finger vein image effectively.

  10. An Improvement To The k-Nearest Neighbor Classifier For ECG Database

    NASA Astrophysics Data System (ADS)

    Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

    2018-03-01

    The k-nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often degrades due to the lack of information on how the samples are distributed. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN concerns the weighting used in assigning the class label before classification. Thus, to address these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, the Mahalanobis distance is applied to avoid the imbalance of the sample distribution. Then, a surrounding rule is employed to obtain the nearest centroid neighbors based on the distribution of the training samples and their distance to the query point. Subsequently, the fuzzy membership function is employed to assign the query point to the class label which is most frequently represented by the nearest centroid neighbors. Electrocardiogram (ECG) signals are used in the experimental studies. The classification performance is evaluated in two experimental steps, i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of the kNN, kNCN, FkNN and MFkNCN classifiers is conducted to evaluate the performance of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds that of kNN, kNCN and FkNN, with a best classification rate of 96.5%.

  11. An RFID Indoor Positioning Algorithm Based on Bayesian Probability and K-Nearest Neighbor.

    PubMed

    Xu, He; Ding, Ye; Li, Peng; Wang, Ruchuan; Li, Yizhu

    2017-08-05

    The Global Positioning System (GPS) is widely used in outdoor environmental positioning. However, GPS cannot support indoor positioning because there is no signal for positioning in an indoor environment. Nowadays, there are many situations which require indoor positioning, such as searching for a book in a library, looking for luggage in an airport, emergency navigation for fire alarms, robot location, etc. Many technologies, such as ultrasonic, sensors, Bluetooth, WiFi, magnetic field, Radio Frequency Identification (RFID), etc., are used to perform indoor positioning. Compared with other technologies, RFID used in indoor positioning is more cost and energy efficient. The traditional RFID indoor positioning algorithm LANDMARC utilizes a Received Signal Strength (RSS) indicator to track objects. However, the RSS value is easily affected by environmental noise and other interference. In this paper, our purpose is to reduce the location fluctuation and error caused by multipath and environmental interference in LANDMARC. We propose a novel indoor positioning algorithm based on Bayesian probability and K-Nearest Neighbor (BKNN). The experimental results show that the Gaussian filter can filter some abnormal RSS values. The proposed BKNN algorithm has the smallest location error compared with the Gaussian-based algorithm, LANDMARC and an improved KNN algorithm. The average error in location estimation is about 15 cm using our method.

  12. The Application of Determining Students’ Graduation Status of STMIK Palangkaraya Using K-Nearest Neighbors Method

    NASA Astrophysics Data System (ADS)

    Rusdiana, Lili; Marfuah

    2017-12-01

    The K-Nearest Neighbors method is a classification method that assigns a class by finding the records that are closest in distance. It is used here to group data such as students’ graduation status, derived from the number of course credits taken, the grade point average (AVG), and the mini-thesis grade. The study examines the results of applying the K-Nearest Neighbors method in an application for determining students’ graduation status, so that the method, the data, and the application can be analyzed. The aim is to determine students’ graduation status with the K-Nearest Neighbors concept, using data on STMIK Palangkaraya students. The software was developed using Extreme Programming, which suited the goal of finishing the project quickly. The application used Microsoft Office Excel 2007 for the training data and Matlab 7 for the implementation. The K-Nearest Neighbors method achieved a result of 92.5% in determining students’ graduation status. It could determine the graduation predicate for the 94 records used, drawn from an initial 136 records before processing, with a maximum of 50 training records. The K-Nearest Neighbors method groups a set of data based on the closest value, so it was suitable for this study.

  13. Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting

    NASA Astrophysics Data System (ADS)

    Zhang, Ningning; Lin, Aijing; Shang, Pengjian

    2017-07-01

    In this paper, we propose a new two-stage methodology that combines ensemble empirical mode decomposition (EEMD) with a multidimensional k-nearest neighbor model (MKNN) in order to forecast the closing price and high price of stocks simultaneously. Modified versions of the k-nearest neighbors (KNN) algorithm are increasingly widely applied to prediction in many fields. Empirical mode decomposition (EMD) decomposes a nonlinear and non-stationary signal into a series of intrinsic mode functions (IMFs). However, it cannot reveal characteristic information of the signal with much accuracy as a result of mode mixing. Ensemble empirical mode decomposition (EEMD), an improved method of EMD, is therefore presented to resolve the weaknesses of EMD by adding white noise to the original data. With EEMD, the components with true physical meaning can be extracted from the time series. Utilizing the advantages of EEMD and MKNN, the proposed ensemble empirical mode decomposition combined with multidimensional k-nearest neighbor model (EEMD-MKNN) has high predictive precision for short-term forecasting. Moreover, we extend this methodology to the two-dimensional case to forecast the closing price and high price of four stock indices (NAS, S&P500, DJI and STI) at the same time. The results indicate that the proposed EEMD-MKNN model has a higher forecast precision than EMD-KNN, the KNN method and ARIMA.

  14. A Novel Hybrid Classification Model of Genetic Algorithms, Modified k-Nearest Neighbor and Developed Backpropagation Neural Network

    PubMed Central

    Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy

    2014-01-01

    Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered the most common and effective methods for classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above-mentioned methods are presented. The purpose is to benefit from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to draw on the strength of each algorithm, and is an approach to make up for their deficiencies. To develop the proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results, which included arrays of the top-ranked features, were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on the optimum arrays of features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model were benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the...

  15. A Novel Graph Constructor for Semisupervised Discriminant Analysis: Combined Low-Rank and k-Nearest Neighbor Graph

    PubMed Central

    Pan, Yongke; Niu, Wenjia

    2017-01-01

    Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Related works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Different from these related works, regularized graph construction is studied here, which is important in graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called the combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616

  16. Using genetic algorithms to optimize k-Nearest Neighbors configurations for use with airborne laser scanning data

    Treesearch

    Ronald E. McRoberts; Grant M. Domke; Qi Chen; Erik Næsset; Terje Gobakken

    2016-01-01

    The relatively small sampling intensities used by national forest inventories are often insufficient to produce the desired precision for estimates of population parameters unless the estimation process is augmented with auxiliary information, usually in the form of remotely sensed data. The k-Nearest Neighbors (k-NN) technique is a non-parametric, multivariate approach...

  17. Diagnosis of diabetes diseases using an Artificial Immune Recognition System2 (AIRS2) with fuzzy K-nearest neighbor.

    PubMed

    Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma

    2012-10-01

    The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. The Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2, where we replace the K-nearest neighbors algorithm with the fuzzy K-nearest neighbors to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work is retrieved from the UCI machine learning repository. The performances of AIRS2 and MAIRS2 are evaluated regarding classification accuracy, sensitivity and specificity values. The highest classification accuracies obtained when applying AIRS2 and MAIRS2 using 10-fold cross-validation were 82.69% and 89.10%, respectively.
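
    To make the difference between crisp and fuzzy K-nearest neighbors concrete, here is a minimal sketch of the commonly described fuzzy k-NN membership rule, in which each of the k neighbors votes with weight 1 / d^(2/(m-1)). It is not taken from the MAIRS2 paper: the function name, the fuzzifier default m = 2, and the assumption that training labels are crisp are all choices made for this example.

```python
import math


def fuzzy_knn_memberships(query, samples, labels, k, m=2.0):
    """Return {label: membership} for the query. Memberships sum to 1.
    Training labels are assumed crisp; m > 1 is the fuzzifier."""
    order = sorted(
        range(len(samples)),
        key=lambda i: math.dist(query, samples[i]),
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weighted = {}
    total = 0.0
    for i in order:
        # Guard against a zero distance when the query equals a sample.
        d = max(math.dist(query, samples[i]), 1e-12)
        w = 1.0 / d ** exponent
        weighted[labels[i]] = weighted.get(labels[i], 0.0) + w
        total += w
    return {label: w / total for label, w in weighted.items()}
```

    A single very close neighbor can outweigh two distant ones here, which a plain majority vote would not allow; the predicted class is the label with the largest membership.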

  18. Nearest neighbors by neighborhood counting.

    PubMed

    Wang, Hui

    2006-06-01

    Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.

  19. K-nearest neighbor imputation of forest inventory variables in New Hampshire

    Treesearch

    Andrew Lister; Michael Hoppus; Raymond L. Czaplewski

    2005-01-01

    The k-nearest neighbor (kNN) method was used to map stand volume for a mosaic of 4 Landsat scenes covering the state of New Hampshire. Data for gross cubic foot volume and trees per acre were summarized from USDA Forest Service Forest Inventory and Analysis (FIA) plots and used as training for kNN. Six bands of...

  20. K-Nearest Neighbor Estimation of Forest Attributes: Improving Mapping Efficiency

    Treesearch

    Andrew O. Finley; Alan R. Ek; Yun Bai; Marvin E. Bauer

    2005-01-01

    This paper describes our efforts in refining k-nearest neighbor forest attributes classification using U.S. Department of Agriculture Forest Service Forest Inventory and Analysis plot data and Landsat 7 Enhanced Thematic Mapper Plus imagery. The analysis focuses on FIA-defined forest type classification across St. Louis County in northeastern Minnesota. We outline...

  1. Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio

    NASA Astrophysics Data System (ADS)

    Nababan, A. A.; Sitompul, O. S.; Tulus

    2018-04-01

    K-Nearest Neighbor (KNN) is a good classifier, but several studies report that its accuracy is still lower than that of other methods. One cause of the low accuracy is that each attribute has the same effect on the classification process, while some less relevant attributes lead to misclassification of the class assignment for new data. In this research, we propose Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio, where the Gain Ratio serves as a parameter to see the correlation between each attribute in the data and is also used as the basis for weighting each attribute of the dataset. The accuracy of the results is compared to the accuracy acquired from the original KNN method using 10-fold Cross-Validation with several datasets from the UCI Machine Learning repository and KEEL-Dataset Repository, such as abalone, glass identification, haberman, hayes-roth and water quality status. Based on the test results, the proposed method was able to increase the classification accuracy of KNN, where the highest difference in accuracy, obtained on the hayes-roth dataset, is 12.73%, and the lowest difference in accuracy, obtained on the abalone dataset, is 0.07%. Averaged over all datasets, the accuracy increases by 5.33%.
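
    A small sketch can show how a gain ratio might be computed per attribute and then used as a weight inside the distance. The abstract does not say how the weights enter the distance or how continuous attributes are handled, so two assumptions are made here: attributes take discrete values, and each squared coordinate difference is multiplied by its weight. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of a discrete attribute divided by its split
    information; 0.0 when the attribute has a single value."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:
        return 0.0
    return (entropy(labels) - conditional) / split_info


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per attribute
    (an assumed way of applying the weights)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

    An attribute that separates the classes perfectly gets weight 1, while an attribute that carries no class information gets weight 0 and so stops influencing which neighbors are nearest.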

  2. Improving the accuracy of k-nearest neighbor using local mean based and distance weight

    NASA Astrophysics Data System (ADS)

    Syaliman, K. U.; Nababan, E. B.; Sitompul, O. S.

    2018-03-01

    In k-nearest neighbor (kNN), the determination of classes for new data is normally performed by a simple majority vote system, which may ignore the similarities among data, as well as allowing the occurrence of a double majority class that can lead to misclassification. In this research, we propose an approach to resolve the majority vote issues by calculating the distance weight using a combination of local mean based k-nearest neighbor (LMKNN) and distance weight k-nearest neighbor (DWKNN). The accuracy of the results is compared to the accuracy acquired from the original k-NN method using several datasets from the UCI Machine Learning repository, Kaggle and Keel, such as ionosphere, iris, voice genre, lower back pain, and thyroid. In addition, the proposed method is also tested using real data from a public senior high school in the city of Tualang, Indonesia. Results show that the combination of LMKNN and DWKNN was able to increase the classification accuracy of kNN, whereby the average increase in accuracy on test data is 2.45%, with the highest increase in accuracy of 3.71% occurring on the lower back pain symptoms dataset. For the real data, the increase in accuracy is as high as 5.16%.
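
    The local-mean half of this combination can be sketched as follows. This is only an illustration of the general local mean based k-NN idea (for each class, average its k nearest members and pick the class whose local mean is closest to the query); it does not reproduce the paper's combination with distance weights, and the function name is hypothetical.

```python
import math


def local_mean_classify(query, samples, labels, k):
    """For each class, average its k nearest samples to the query and
    return the class whose local mean vector is closest."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == label]
        nearest = sorted(members, key=lambda s: math.dist(query, s))[:k]
        mean = [sum(col) / len(nearest) for col in zip(*nearest)]
        d = math.dist(query, mean)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

    Because every class contributes exactly one local mean, there is no vote to tie, which is one way to sidestep the double majority problem mentioned above.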

  3. False-nearest-neighbors algorithm and noise-corrupted time series

    NASA Astrophysics Data System (ADS)

    Rhodes, Carl; Morari, Manfred

    1997-05-01

    The false-nearest-neighbors (FNN) algorithm was originally developed to determine the embedding dimension for autonomous time series. For noise-free computer-generated time series, the algorithm does a good job in predicting the embedding dimension. However, the problem of predicting the embedding dimension when the time-series data are corrupted by noise was not fully examined in the original studies of the FNN algorithm. Here it is shown that with large data sets, even small amounts of noise can lead to incorrect prediction of the embedding dimension. Surprisingly, as the length of the time series analyzed by FNN grows larger, the cause of incorrect prediction becomes more pronounced. An analysis of the effect of noise on the FNN algorithm and a solution for dealing with the effects of noise are given here. Some results on the theoretically correct choice of the FNN threshold are also presented.
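
    For readers unfamiliar with the FNN test, the sketch below shows the commonly described distance-ratio criterion: embed the series in `dim` delay coordinates, find each point's nearest neighbor, and call it false if adding the next delay coordinate increases their separation by more than a threshold factor. The function name, the default threshold of 10, and the brute-force neighbor search are assumptions for illustration; the noise correction proposed in the paper is not included.

```python
import math


def false_neighbor_fraction(series, dim, tau=1, threshold=10.0):
    """Fraction of nearest neighbors in a dim-dimensional delay
    embedding that separate strongly in dimension dim + 1."""
    n = len(series) - dim * tau  # points that have a (dim+1)-th coordinate
    vectors = [[series[i + j * tau] for j in range(dim)] for i in range(n)]
    false = counted = 0
    for i in range(n):
        best_j, best_r = None, math.inf
        for j in range(n):
            if j != i:
                r = math.dist(vectors[i], vectors[j])
                if r < best_r:
                    best_j, best_r = j, r
        if best_j is None or best_r == 0.0:
            continue  # no usable neighbor for the ratio test
        extra = abs(series[i + dim * tau] - series[best_j + dim * tau])
        counted += 1
        if extra / best_r > threshold:
            false += 1
    return false / counted if counted else 0.0
```

    In practice the fraction is computed for increasing `dim`, and the embedding dimension is taken where it drops to near zero. The brute-force search is quadratic in the series length, so longer series would call for a spatial index.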

  4. The nearest neighbor and next nearest neighbor effects on the thermodynamic and kinetic properties of RNA base pair

    NASA Astrophysics Data System (ADS)

    Wang, Yujie; Wang, Zhen; Wang, Yanli; Liu, Taigang; Zhang, Wenbing

    2018-01-01

    The thermodynamic and kinetic parameters of an RNA base pair with different nearest and next nearest neighbors were obtained through long-time molecular dynamics simulation of the opening-closing switch process of the base pair near its melting temperature. The results indicate that the thermodynamic parameters of the GC base pair depend on the nearest neighbor base pair, while the next nearest neighbor base pair has little effect, which validates the nearest-neighbor model. The closing and opening rates of the GC base pair also show nearest neighbor dependence. At a certain temperature, the closing and opening rates of the GC pair with nearest neighbor AU are larger than those with nearest neighbor GC, and the next nearest neighbor plays little role. The free energy landscape of the GC base pair with nearest neighbor GC is rougher than that with nearest neighbor AU.

  5. Credit scoring analysis using weighted k nearest neighbor

    NASA Astrophysics Data System (ADS)

    Mukid, M. A.; Widiharih, T.; Rusgiyono, A.; Prahutama, A.

    2018-05-01

    Credit scoring is a quantitative method to evaluate the credit risk of loan applications. Both statistical methods and artificial intelligence are often used by credit analysts to help them decide whether the applicants are worthy of credit. These methods aim to predict future behavior in terms of credit risk based on past experience of customers with similar characteristics. This paper reviews the weighted k nearest neighbor (WKNN) method for credit assessment by considering the use of several kernels. We use credit data from a private bank in Indonesia. The results show that the Gaussian kernel and the rectangular kernel perform best in terms of the percentage of correctly classified cases, which is 82.4% for both.
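
    The role of the kernel in weighted k nearest neighbor can be sketched as follows. The scheme assumed here, which the abstract does not spell out, scales each neighbor distance by the distance to the (k+1)-th neighbor and turns the scaled distance into a vote weight through a kernel. The function names and the sample labels are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Gaussian-shaped weight for a scaled distance u."""
    return math.exp(-0.5 * u * u)


def rectangular_kernel(u):
    """Equal weight for every neighbor with scaled distance <= 1."""
    return 1.0 if abs(u) <= 1.0 else 0.0


def wknn_classify(query, samples, labels, k, kernel):
    """Kernel-weighted vote among the k nearest samples.
    Requires len(samples) > k, because the (k+1)-th neighbor
    distance is used to scale distances into [0, 1]."""
    order = sorted(
        range(len(samples)),
        key=lambda i: math.dist(query, samples[i]),
    )
    scale = math.dist(query, samples[order[k]])
    scores = {}
    for i in order[:k]:
        u = math.dist(query, samples[i]) / scale if scale > 0 else 0.0
        scores[labels[i]] = scores.get(labels[i], 0.0) + kernel(u)
    return max(scores, key=scores.get)
```

    With the rectangular kernel this is ordinary majority voting. Under this scaling the Gaussian weights stay between about 0.61 and 1, so the two kernels often agree.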

  6. Study of parameters of the nearest neighbour shared algorithm on clustering documents

    NASA Astrophysics Data System (ADS)

    Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

    2018-03-01

    Document clustering is one way of automatically managing documents, extracting document topics, and quickly filtering information. Preprocessing for document clustering by text mining consists of keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and representing each document as a concept vector using Latent Semantic Analysis (LSA). Clustering is then performed on the preprocessed documents so that documents with similar topics fall in the same cluster. The Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters of the SNN algorithm are k, the number of nearest neighbor documents; ε, the number of shared nearest neighbor documents; and MinT, the minimum number of similar documents that can form a cluster. The SNN algorithm is characterized by shared 'neighbor' properties. Each cluster is formed by keywords that are shared by the documents. The SNN algorithm allows a cluster to be built on more than one keyword if the frequency of those keywords in the documents is also high. The choice of parameter values affects the document clustering results. A higher value of k increases the number of neighbor documents of each document, so the similarity among neighboring documents is lower, and the accuracy of each cluster is also low. A higher value of ε causes each document to retain only neighbor documents with high similarity when building a cluster; it also produces more unclassified documents (noise). A higher value of MinT decreases the number of clusters, since groups of similar documents smaller than MinT cannot form clusters. The parameters of the SNN algorithm determine the performance of the clustering result and the amount of noise (unclustered documents). The Silhouette coefficient is almost the same across many experiments, above 0.9, which means that the SNN algorithm works well.
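
    To make the roles of k, ε and MinT concrete, here is a simplified sketch in the spirit of shared-nearest-neighbor clustering. It is an assumption-laden illustration, not the procedure used in the paper: documents are plain coordinate tuples, Euclidean distance stands in for document similarity, two documents are linked when each is in the other's k-nearest list and they share at least `eps` neighbors, and linked groups smaller than `min_t` are labeled as noise (-1). The names `knn_lists` and `snn_clusters` are hypothetical.

```python
import math


def knn_lists(points, k):
    # For each point, the set of indices of its k nearest other points.
    lists = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        lists.append(set(others[:k]))
    return lists


def snn_clusters(points, k, eps, min_t):
    nn = knn_lists(points, k)
    n = len(points)
    # Link i and j if they are mutual neighbors sharing at least eps neighbors.
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= eps:
                adj[i].add(j)
                adj[j].add(i)
    # Connected components with at least min_t members become clusters.
    labels = [-1] * n
    seen = set()
    cluster_id = 0
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(component) >= min_t:
            for v in component:
                labels[v] = cluster_id
            cluster_id += 1
    return labels


# Two tight groups plus one isolated point (toy "concept vectors").
docs = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
print(snn_clusters(docs, k=3, eps=2, min_t=3))
```

    In this toy setting, raising `eps` or `min_t` turns more points into noise, which mirrors the qualitative behavior the abstract reports for ε and MinT.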

  7. Secure Nearest Neighbor Query on Crowd-Sensing Data

    PubMed Central

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-01-01

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owners are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed from protocols of secure two-party computation and a secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between security and query performance compared to other schemes. PMID:27669253

  8. Secure Nearest Neighbor Query on Crowd-Sensing Data.

    PubMed

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-09-22

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owners are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed from protocols of secure two-party computation and a secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between security and query performance compared to other schemes.

  9. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertise to explain gear fault signatures, which ordinary users usually cannot easily acquire. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. Previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical

  10. Quantum realization of the nearest neighbor value interpolation method for INEQR

    NASA Astrophysics Data System (ADS)

    Zhou, RiGui; Hu, WenWen; Luo, GaoFeng; Liu, XingAo; Fan, Ping

    2018-07-01

    This paper presents the nearest neighbor value (NNV) interpolation algorithm for the improved novel enhanced quantum representation of digital images (INEQR). It is necessary to use interpolation in image scaling because there is an increase or a decrease in the number of pixels. The difference between the proposed scheme and nearest neighbor interpolation is that the concept applied, to estimate the missing pixel value, is guided by the nearest value rather than the distance. Firstly, a sequence of quantum operations is predefined, such as cyclic shift transformations and the basic arithmetic operations. Then, the feasibility of the nearest neighbor value interpolation method for quantum image of INEQR is proven using the previously designed quantum operations. Furthermore, quantum image scaling algorithm in the form of circuits of the NNV interpolation for INEQR is constructed for the first time. The merit of the proposed INEQR circuit lies in their low complexity, which is achieved by utilizing the unique properties of quantum superposition and entanglement. Finally, simulation-based experimental results involving different classical images and ratios (i.e., conventional or non-quantum) are simulated based on the classical computer's MATLAB 2014b software, which demonstrates that the proposed interpolation method has higher performances in terms of high resolution compared to the nearest neighbor and bilinear interpolation.

  11. A Comparison of the Spatial Linear Model to Nearest Neighbor (k-NN) Methods for Forestry Applications

    Treesearch

    Jay M. Ver Hoef; Hailemariam Temesgen; Sergio Gómez

    2013-01-01

    Forest surveys provide critical information for many diverse interests. Data are often collected from samples, and from these samples, maps of resources and estimates of aerial totals or averages are required. In this paper, two approaches for mapping and estimating totals; the spatial linear model (SLM) and k-NN (k-Nearest Neighbor) are compared, theoretically,...

  12. Ising lattices with +/-J second-nearest-neighbor interactions

    NASA Astrophysics Data System (ADS)

    Ramírez-Pastor, A. J.; Nieto, F.; Vogel, E. E.

    1997-06-01

    Second-nearest-neighbor interactions are added to the usual nearest-neighbor Ising Hamiltonian for square lattices in different ways. The starting point is a square lattice where half the nearest-neighbor interactions are ferromagnetic and the other half of the bonds are antiferromagnetic. Then, second-nearest-neighbor interactions can also be assigned randomly or in a variety of causal manners determined by the nearest-neighbor interactions. In the present paper we consider three causal and three random ways of assigning second-nearest-neighbor exchange interactions. Several ground-state properties are then calculated for each of these lattices: energy per bond ɛg, site correlation parameter pg, maximal magnetization μg, and fraction of unfrustrated bonds hg. A set of 500 samples is considered for each size N (number of spins) and array (way of distributing the N spins). The properties of the original lattices with only nearest-neighbor interactions are already known, which makes it possible to assess the effect of the additional interactions. We also include cubic lattices to discuss the distinction between coordination number and dimensionality. Comparison with results for triangular and honeycomb lattices is done at specific points.

  13. Using K-Nearest Neighbor Classification to Diagnose Abnormal Lung Sounds

    PubMed Central

    Chen, Chin-Hsing; Huang, Wen-Tzeng; Tan, Tan-Hsu; Chang, Cheng-Chun; Chang, Yuan-Jen

    2015-01-01

    A reported 30% of people worldwide have abnormal lung sounds, including crackles, rhonchi, and wheezes. To date, the traditional stethoscope remains the most popular tool used by physicians to diagnose such abnormal lung sounds; however, many problems arise with the use of a stethoscope, including the effects of environmental noise, the inability to record and store lung sounds for follow-up or tracking, and the physician’s subjective diagnostic experience. This study has developed a digital stethoscope to help physicians overcome these problems when diagnosing abnormal lung sounds. In this digital system, mel-frequency cepstral coefficients (MFCCs) were used to extract the features of lung sounds, and then the K-means algorithm was used for feature clustering, to reduce the amount of data for computation. Finally, the K-nearest neighbor method was used to classify the lung sounds. The proposed system can also be used for home care: if the percentage of abnormal lung sound frames is > 30% of the whole test signal, the system can automatically warn the user to visit a physician for diagnosis. We also used bend sensors together with an amplification circuit, Bluetooth, and a microcontroller to implement a respiration detector. The respiratory signal extracted by the bend sensors can be transmitted to the computer via Bluetooth to calculate the respiratory cycle, for real-time assessment. If an abnormal status is detected, the device will warn the user automatically. Experimental results indicated that the error in respiratory cycles between measured and actual values was only 6.8%, illustrating the potential of our detector for home care applications. PMID:26053756

  14. Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery

    Treesearch

    Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha

    2007-01-01

    The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...

  15. [Galaxy/quasar classification based on nearest neighbor method].

    PubMed

    Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun

    2011-09-01

    With the wide application of high-quality CCD in celestial spectrum imagery and the implementation of many large sky survey programs (e.g., Sloan Digital Sky Survey (SDSS), Two-degree-Field Galaxy Redshift Survey (2dF), Spectroscopic Survey Telescope (SST), Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and Large Synoptic Survey Telescope (LSST) program, etc.), celestial observational data are coming into the world like torrential rain. Therefore, to utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognize galaxies and quasars from spectra based on the nearest neighbor method. Galaxies and quasars are extragalactic objects; they are far away from Earth, and their spectra are usually contaminated by various kinds of noise. Therefore, recognizing these two types of spectra is a typical problem in automatic spectra classification. Furthermore, the method used, nearest neighbor, is one of the most typical, classic, and mature algorithms in pattern recognition and data mining, and is often used as a benchmark in developing novel algorithms. For applicability in practice, it is shown that the recognition ratio of the nearest neighbor method (NN) is comparable to the best results reported in the literature based on more complicated methods, and the advantage of NN is that it does not need to be trained, which is useful in incremental learning and parallel computation in mass spectral data processing. In conclusion, the results in this work are helpful for studying galaxy and quasar spectra classification.

  16. Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.

    PubMed

    Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott

    2009-03-01

    Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
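
    The resampling step described above can be sketched as follows. This is an illustrative sketch only: the abstract does not state which weight function is used, so the 1/rank weighting below is an assumption, and the function name `knn_bootstrap`, the single-feature records, and the numeric values are hypothetical.

```python
import math
import random


def knn_bootstrap(features, values, query, k, n_draws, rng):
    # Rank historical records by distance between their feature vector and the query.
    ranked = sorted(range(len(features)),
                    key=lambda i: math.dist(features[i], query))
    neighbors = ranked[:k]
    # Assumed weight function: 1/rank, so the closest neighbor is drawn most often.
    weights = [1.0 / rank for rank in range(1, len(neighbors) + 1)]
    # Draw with replacement from the k neighbors to build an ensemble.
    picks = rng.choices(neighbors, weights=weights, k=n_draws)
    return [values[i] for i in picks]


# Hypothetical records: one explanatory variable and one water quality value each.
features = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
values = [2.1, 2.4, 2.2, 5.0, 5.3]
ensemble = knn_bootstrap(features, values, (2.2,), k=3, n_draws=200,
                         rng=random.Random(0))
print(sorted(set(ensemble)))
```

    Repeating the draw many times yields an empirical distribution for the target location, from which spread or quantiles can be read off.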

  17. Large margin nearest neighbor classifiers.

    PubMed

    Domeniconi, Carlotta; Gunopulos, Dimitrios; Peng, Jing

    2005-07-01

    The nearest neighbor technique is a simple and appealing approach to addressing classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. The employment of a locally adaptive metric becomes crucial in order to keep class conditional probabilities close to uniform, thereby minimizing the bias of estimates. We propose a technique that computes a locally flexible metric by means of support vector machines (SVMs). The decision function constructed by SVMs is used to determine the most discriminant direction in a neighborhood around the query. Such a direction provides a local feature weighting scheme. We formally show that our method increases the margin in the weighted space where classification takes place. Moreover, our method has the important advantage of online computational efficiency over competing locally adaptive techniques for nearest neighbor classification. We demonstrate the efficacy of our method using both real and simulated data.

  18. Detection of acute lymphocyte leukemia using k-nearest neighbor algorithm based on shape and histogram features

    NASA Astrophysics Data System (ADS)

    Purwanti, Endah; Calista, Evelyn

    2017-05-01

    Leukemia is a type of cancer which is caused by malignant neoplasms in leukocyte cells. Acute lymphocyte leukemia (ALL) is a type of leukemia that can cause death quickly. In this study, we propose automatic detection of lymphocyte leukemia through classification of lymphocyte cell images obtained from single-cell peripheral blood smears. There are two main objectives in this study. The first is to extract cell features. The second is to classify the lymphocyte cells into two classes, namely normal and abnormal lymphocytes. In conducting this study, we use a combination of shape features and histogram features, and the classification algorithm is k-nearest neighbour with k varied over 1, 3, 5, 7, 9, 11, 13, and 15. The best accuracy, sensitivity, and specificity in this study are 90%, 90%, and 90%, and they were obtained from the combined features of area-perimeter-mean-standard deviation with k=7.

  19. Neural Network and Nearest Neighbor Algorithms for Enhancing Sampling of Molecular Dynamics.

    PubMed

    Galvelis, Raimondas; Sugita, Yuji

    2017-06-13

    The free energy calculations of complex chemical and biological systems with molecular dynamics (MD) are inefficient due to multiple local minima separated by high-energy barriers. The minima can be escaped using an enhanced sampling method such as metadynamics, which apply bias (i.e., importance sampling) along a set of collective variables (CV), but the maximum number of CVs (or dimensions) is severely limited. We propose a high-dimensional bias potential method (NN2B) based on two machine learning algorithms: the nearest neighbor density estimator (NNDE) and the artificial neural network (ANN) for the bias potential approximation. The bias potential is constructed iteratively from short biased MD simulations accounting for correlation among CVs. Our method is capable of achieving ergodic sampling and calculating free energy of polypeptides with up to 8-dimensional bias potential.

  20. Automated analysis of long-term grooming behavior in Drosophila using a k-nearest neighbors classifier

    PubMed Central

    Allen, Victoria W; Shirasu-Hiza, Mimi

    2018-01-01

    Despite being pervasive, the control of programmed grooming is poorly understood. We addressed this gap by developing a high-throughput platform that allows long-term detection of grooming in Drosophila melanogaster. In our method, a k-nearest neighbors algorithm automatically classifies fly behavior and finds grooming events with over 90% accuracy in diverse genotypes. Our data show that flies spend ~13% of their waking time grooming, driven largely by two major internal programs. One of these programs regulates the timing of grooming and involves the core circadian clock components cycle, clock, and period. The second program regulates the duration of grooming and, while dependent on cycle and clock, appears to be independent of period. This emerging dual control model in which one program controls timing and another controls duration, resembles the two-process regulatory model of sleep. Together, our quantitative approach presents the opportunity for further dissection of mechanisms controlling long-term grooming in Drosophila. PMID:29485401

  1. Missing value imputation for gene expression data by tailored nearest neighbors.

    PubMed

    Faisal, Shahla; Tutz, Gerhard

    2017-04-25

    High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.

  2. Categorizing document by fuzzy C-Means and K-nearest neighbors approach

    NASA Astrophysics Data System (ADS)

    Priandini, Novita; Zaman, Badrus; Purwanti, Endah

    2017-08-01

    Advances in technology have made document categorization important, owing to the growing number of documents. Managing documents by categorizing them is an Information Retrieval application, because it involves text mining. Categorization can be done with both the Fuzzy C-Means (FCM) and K-Nearest Neighbors (KNN) methods. This experiment combines both methods, with the aim of improving document categorization performance. First, FCM is used to cluster the training documents. Second, KNN is used to categorize the testing documents until the categorization output is shown. In the experiment, 14 of 20 testing documents were retrieved relevantly to their category, while 6 were retrieved irrelevantly. System evaluation shows that both precision and recall are 0.7.

  3. A two-step nearest neighbors algorithm using satellite imagery for predicting forest structure within species composition classes

    Treesearch

    Ronald E. McRoberts

    2009-01-01

    Nearest neighbors techniques have been shown to be useful for predicting multiple forest attributes from forest inventory and Landsat satellite image data. However, in regions lacking good digital land cover information, nearest neighbors selected to predict continuous variables such as tree volume must be selected without regard to relevant categorical variables such...

  4. Conformal Prediction Based on K-Nearest Neighbors for Discrimination of Ginsengs by a Home-Made Electronic Nose

    PubMed Central

    Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang

    2017-01-01

    An estimate on the reliability of prediction in the applications of electronic nose is essential, which has not been paid enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. Nonconformity measure based on k-nearest neighbors (KNN) is implemented separately as underlying algorithm of conformal prediction. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, which is better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by a user. The potential of this framework for detecting borderline examples and outliers in the application of E-nose is also investigated. The result shows that conformal prediction is a promising framework for the application of electronic nose to make predictions with reliability and validity. PMID:28805721
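
    As a rough illustration of how a k-NN based nonconformity measure can drive conformal prediction, the sketch below uses a commonly cited score: the sum of distances to the k nearest same-class examples divided by the sum of distances to the k nearest other-class examples. Whether the paper uses exactly this score is an assumption, and the function names and toy two-class data are hypothetical. A label is kept in the prediction region when its p-value exceeds the chosen significance level.

```python
import math


def nonconformity(i, X, y, k):
    # Distances from example i to its k nearest same-class and other-class examples.
    same = sorted(math.dist(X[i], X[j])
                  for j in range(len(X)) if j != i and y[j] == y[i])[:k]
    other = sorted(math.dist(X[i], X[j])
                   for j in range(len(X)) if y[j] != y[i])[:k]
    denom = sum(other)
    # A large score means the example looks strange for its assumed label.
    return sum(same) / denom if denom > 0 else math.inf


def conformal_region(train_X, train_y, query, k, significance):
    region = {}
    for label in sorted(set(train_y)):
        # Tentatively assign the candidate label to the query.
        X = list(train_X) + [query]
        y = list(train_y) + [label]
        scores = [nonconformity(i, X, y, k) for i in range(len(X))]
        # p-value: share of examples at least as nonconforming as the query.
        p_value = sum(s >= scores[-1] for s in scores) / len(scores)
        if p_value > significance:
            region[label] = p_value
    return region


# Toy data: two well separated classes "a" and "b".
train_X = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(conformal_region(train_X, train_y, (0.5, 0.5), k=1, significance=0.2))
```

    Lowering the significance level widens the region: for the same query, a significance of 0.05 keeps both labels, which is how the framework signals a less confident prediction.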

  5. Diagnostic tools for nearest neighbors techniques when used with satellite imagery

    Treesearch

    Ronald E. McRoberts

    2009-01-01

    Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....

  6. Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization

    NASA Astrophysics Data System (ADS)

    Jamaluddin; Siringoringo, Rimbun

    2017-12-01

    Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawback of FkNN is that its parameters are difficult to determine. These parameters are the number of neighbors (k) and the fuzzy strength (m). Both parameters are very sensitive, and no theory or guide indicates how ‘m’ and ‘k’ should be chosen, which makes FkNN difficult to control. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best values of ‘k’ and ‘m’. MPSO is based on the Constriction Factor Method, an improvement of PSO intended to avoid local optima. The model proposed in this study was tested on the German Credit Dataset. The test data have been standardized by the UCI Machine Learning Repository and are widely applied to classification problems. Applying MPSO to the determination of FkNN parameters is expected to increase classification performance. The experiments indicate that the proposed model yields better classification performance than the FkNN model alone: the proposed model has an accuracy of 81%, while the FkNN model has an accuracy of 70%. Finally, the proposed model is compared with two other classification models, Naive Bayes and Decision Tree. The proposed model performs better; Naive Bayes has an accuracy of 75%, and the decision tree model has 70%.

  7. A novel method for the detection of R-peaks in ECG based on K-Nearest Neighbors and Particle Swarm Optimization

    NASA Astrophysics Data System (ADS)

    He, Runnan; Wang, Kuanquan; Li, Qince; Yuan, Yongfeng; Zhao, Na; Liu, Yang; Zhang, Henggui

    2017-12-01

    Cardiovascular diseases are associated with high morbidity and mortality. However, it is still a challenge to diagnose them accurately and efficiently. Electrocardiogram (ECG), a bioelectrical signal of the heart, provides crucial information about the dynamical functions of the heart, playing an important role in cardiac diagnosis. As the QRS complex in ECG is associated with ventricular depolarization, accurate QRS detection is vital for interpreting ECG features. In this paper, we proposed a real-time, accurate, and effective algorithm for QRS detection. In the algorithm, a proposed preprocessor with a band-pass filter was first applied to remove baseline wander and power-line interference from the signal. After denoising, a method combining K-Nearest Neighbor (KNN) and Particle Swarm Optimization (PSO) was used for accurate QRS detection in ECGs with different morphologies. The proposed algorithm was tested and validated using 48 ECG records from the MIT-BIH arrhythmia database (MITDB), achieving a high average detection accuracy, sensitivity and positive predictivity of 99.43, 99.69, and 99.72%, respectively, indicating a notable improvement over extant algorithms reported in the literature.

  8. The nearest neighbor and the bayes error rates.

    PubMed

    Loizou, G; Maybank, S J

    1987-02-01

    The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities $E_{k,l+1} \le E^*(\lambda) \le E_{k,l} \le dE^*(\lambda)$, where d is a function of k, l, and the number of pattern classes, and $\lambda$ is the reject threshold for the Bayes method. An explicit expression for d is given which is optimal in the sense that for some probability distributions $E_{k,l}$ and $dE^*(\lambda)$ are equal.

  9. The distance function effect on k-nearest neighbor classification for medical datasets.

    PubMed

    Hu, Li-Yu; Huang, Min-Wei; Ke, Shih-Wen; Tsai, Chih-Fong

    2016-01-01

    K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Although the Euclidean distance function is the most widely used distance metric in k-NN, no study has examined the classification performance of k-NN with different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect the k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data, and four different distance functions, Euclidean, cosine, Chi square, and Minkowski, are each used individually during k-NN classification. The experimental results show that the Chi square distance function is the best choice for the three different types of datasets. However, the cosine and Euclidean (and Minkowski) distance functions perform the worst over the mixed type of datasets. In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, k-NN based on the Chi square distance function performs the best.
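
    The effect of the distance function can be illustrated with a small k-NN classifier that takes the metric as a parameter. This is a hedged sketch, not the experimental setup of the paper: the chi-square form used here is one common variant for non-negative features and may differ from the authors' definition, the Minkowski order p=3 is an arbitrary default, and the toy data and the "healthy"/"sick" labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minkowski(a, b, p=3):
    # Generalizes Manhattan (p=1) and Euclidean (p=2) distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    # One minus cosine similarity; compares direction, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def chi_square(a, b):
    # Assumed form for non-negative features; terms with x + y == 0 are skipped.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)


def knn_classify(train_X, train_y, query, k, distance):
    # Rank training points with the supplied metric and take a majority vote.
    ranked = sorted(range(len(train_X)),
                    key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


patients = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
status = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]
for metric in (euclidean, minkowski, cosine_distance, chi_square):
    print(metric.__name__, knn_classify(patients, status, (2, 2), 3, metric))
```

    On this toy data the cosine metric disagrees with the other three, because it ignores magnitude and treats (8, 8) as identical in direction to the query (2, 2); this is the kind of metric sensitivity the abstract investigates.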

  10. Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

    PubMed

    Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

    2016-02-01

    Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, few works study a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies the normalized dominant set to find the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such a scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both

  11. Nearest neighbor-density-based clustering methods for large hyperspectral images

    NASA Astrophysics Data System (ADS)

    Cariou, Claude; Chehdi, Kacem

    2017-10-01

    We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and shown their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extend the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral

  12. Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.

    ERIC Educational Resources Information Center

    Stewart, Mark; Willett, Peter

    1987-01-01

    Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)

  13. Landscape-scale parameterization of a tree-level forest growth model: a k-nearest neighbor imputation approach incorporating LiDAR data

    Treesearch

    Michael J. Falkowski; Andrew T. Hudak; Nicholas L. Crookston; Paul E. Gessler; Edward H. Uebler; Alistair M. S. Smith

    2010-01-01

    Sustainable forest management requires timely, detailed forest inventory data across large areas, which is difficult to obtain via traditional forest inventory techniques. This study evaluated k-nearest neighbor imputation models incorporating LiDAR data to predict tree-level inventory data (individual tree height, diameter at breast height, and...

  14. Phase transitions in the antiferromagnetic Ising model on a body-centered cubic lattice with interactions between next-to-nearest neighbors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murtazaev, A. K.; Ramazanov, M. K.; Kassan-Ogly, F. A.

    2015-01-15

    Phase transitions in the antiferromagnetic Ising model on a body-centered cubic lattice are studied on the basis of the replica algorithm by the Monte Carlo method and histogram analysis taking into account the interaction of next-to-nearest neighbors. The phase diagram of the dependence of the critical temperature on the intensity of interaction of the next-to-nearest neighbors is constructed. It is found that a second-order phase transition is realized in this model in the investigated interval of the intensities of interaction of next-to-nearest neighbors.

  15. Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.

    PubMed

    Mei, Gang; Xu, Nengxiong; Xu, Liangliang

    2016-01-01

    This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on a modern Graphics Processing Unit (GPU). The presented algorithm is an improvement of our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest neighbors (kNN) search. AIDW needs to find several nearest neighboring data points for each interpolated point to adaptively determine the power parameter; the desired prediction value of the interpolated point is then obtained by weighted interpolation using the power parameter. In this work, we develop a fast kNN search approach based on a space-partitioning data structure, the even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of two stages: kNN search and weighted interpolation. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
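
    The two stages named above, kNN search followed by weighted interpolation, can be illustrated with a serial sketch. It makes two simplifying assumptions that differ from the paper: the neighbor search is brute force rather than grid-based, and the power parameter is a fixed argument rather than adaptively determined. The function name `idw_knn` and the sample data are hypothetical.

```python
import math

def idw_knn(points, values, query, k=3, power=2.0):
    """Inverse distance weighting over the k nearest data points.

    Brute-force neighbor search and a fixed `power` are simplifications;
    AIDW as described in the abstract adapts the power per query point.
    """
    # Stage 1: kNN search (here: sort all distances, keep the k smallest).
    nearest = sorted((math.dist(p, query), v)
                     for p, v in zip(points, values))[:k]
    # A query that coincides with a data point returns that point's value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    # Stage 2: weighted interpolation with weights 1 / d**power.
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Hypothetical scattered data: two nearby points and one far away.
pts = [(0.0, 0.0), (2.0, 0.0), (100.0, 100.0)]
vals = [0.0, 10.0, 1000.0]
print(idw_knn(pts, vals, (1.0, 0.0), k=2))
```

    Restricting the sum to k neighbors is what makes the search stage worth accelerating: the interpolation itself is cheap once the neighbors are known.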

  16. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. These datasets often contain a large amount of expression data, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data.

  17. Remaining Useful Life Estimation of Insulated Gate Bipolar Transistors (IGBTs) Based on a Novel Volterra k-Nearest Neighbor Optimally Pruned Extreme Learning Machine (VKOPP) Model Using Degradation Data

    PubMed Central

    Mei, Wenjuan; Zeng, Xianping; Yang, Chenglin; Zhou, Xiuyun

    2017-01-01

    The insulated gate bipolar transistor (IGBT) is a high-performance switching device used widely in power electronic systems. How to estimate the remaining useful life (RUL) of an IGBT to ensure the safety and reliability of the power electronics system is currently a challenging issue in the field of IGBT reliability. The aim of this paper is to develop a prognostic technique for estimating IGBTs’ RUL. There is a need for an efficient prognostic algorithm that is able to support in-situ decision-making. In this paper, a novel prediction model with a complete structure based on optimally pruned extreme learning machine (OPELM) and Volterra series is proposed to track the IGBT’s degradation trace and estimate its RUL; we refer to this model as the Volterra k-nearest neighbor OPELM prediction (VKOPP) model. This model uses the minimum entropy rate method and Volterra series to reconstruct phase space for IGBTs’ ageing samples, and a new weight update algorithm, which can effectively reduce the influence of outliers and noise, is utilized to establish the VKOPP network; then a combination of the k-nearest neighbor method (KNN) and least squares estimation (LSE) method is used to calculate the output weights of OPELM and predict the RUL of the IGBT. The prognostic results show that the proposed approach can predict the RUL of IGBT modules with small error and achieve higher prediction precision and lower time cost than some classic prediction approaches. PMID:29099811

  18. Classification Features of US Images Liver Extracted with Co-occurrence Matrix Using the Nearest Neighbor Algorithm

    NASA Astrophysics Data System (ADS)

    Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen

    2011-12-01

    The co-occurrence matrix has been applied successfully to the characterization of echographic images because it contains information about the spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of an US image of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extracted with the co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (or fatty) liver. These two sets of echographic images of the liver build a database that includes only histologically confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects are used to compute four textural indices and also serve as the control dataset. We chose to study this disease because steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and they can be used to characterize irregularity of tissues. The goal is to extract the information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool to classify texture features by grouping them in a training set using healthy liver, on the one hand, and in a holdout set using the texture features of steatosis liver, on the other hand. The results could be used to quantify the texture information and will allow a clear distinction between healthy and steatosis liver.

  19. Emotion recognition from multichannel EEG signals using K-nearest neighbor classification.

    PubMed

    Li, Mi; Xu, Hongpei; Liu, Xingwang; Lu, Shengfu

    2018-04-27

    Many studies have been done on emotion recognition based on multi-channel electroencephalogram (EEG) signals. This paper explores how different frequency bands and different numbers of channels influence the emotion recognition accuracy of EEG signals. We classified the emotional states in the valence and arousal dimensions using different combinations of EEG channels. Firstly, DEAP default preprocessed data were normalized. Next, EEG signals were divided into four frequency bands using discrete wavelet transform, and entropy and energy were calculated as features for the K-nearest neighbor classifier. The classification accuracies of the 10, 14, 18 and 32 EEG channels based on the gamma frequency band were 89.54%, 92.28%, 93.72% and 95.70% in the valence dimension and 89.81%, 92.24%, 93.69% and 95.69% in the arousal dimension. As the number of channels increases, the classification accuracy of emotional states also increases; the classification accuracy of the gamma frequency band is greater than that of the beta frequency band, followed by the alpha and theta frequency bands. This paper provides a reference for choosing frequency bands and channels for emotion recognition based on EEG.

  20. Finite element computation on nearest neighbor connected machines

    NASA Technical Reports Server (NTRS)

    Mcaulay, A. D.

    1984-01-01

    Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.

  1. Earthquake Declustering via a Nearest-Neighbor Approach in Space-Time-Magnitude Domain

    NASA Astrophysics Data System (ADS)

    Zaliapin, I. V.; Ben-Zion, Y.

    2016-12-01

    We propose a new method for earthquake declustering based on nearest-neighbor analysis of earthquakes in the space-time-magnitude domain. The nearest-neighbor approach was recently applied to a variety of seismological problems that validate the general utility of the technique and reveal the existence of several different robust types of earthquake clusters. Notably, it was demonstrated that clustering associated with the largest earthquakes is statistically different from that of small-to-medium events. In particular, the characteristic bimodality of the nearest-neighbor distances that helps separate clustered and background events is often violated after the largest earthquakes in their vicinity, which is dominated by triggered events. This prevents using a simple threshold between the two modes of the nearest-neighbor distance distribution for declustering. The current study resolves this problem, hence extending the nearest-neighbor approach to the problem of earthquake declustering. The proposed technique is applied to seismicity of different areas in California (San Jacinto, Coso, Salton Sea, Parkfield, Ventura, Mojave, etc.), as well as to the global seismicity, to demonstrate its stability and efficiency in treating various clustering types. The results are compared with those of alternative declustering methods.

  2. Nearest Neighbor Interactions Affect the Conformational Distribution in the Unfolded State of Peptides

    NASA Astrophysics Data System (ADS)

    Toal, Siobhan; Schweitzer-Stenner, Reinhard; Rybka, Karin; Schwalbe, Harald

    2013-03-01

    In order to enable structural predictions of intrinsically disordered proteins (IDPs) the intrinsic conformational propensities of amino acids must be complemented by information on nearest-neighbor interactions. To explore the influence of nearest neighbors on conformational distributions, we performed a joint vibrational (Infrared, Vibrational Circular Dichroism (VCD), polarized Raman) and 2D-NMR study of selected GxyG host-guest peptides: GDyG, GSyG, GxLG, GxVG, where x/y = {A, K, L, V}. D and S (L and V) were chosen at the x (y) position because they have been observed to drastically change the distribution of alanine in xAy tripeptide sequences in truncated coil libraries. The conformationally sensitive amide I' profiles of the respective spectra were analyzed in terms of a statistical ensemble described as a superposition of 2D-Gaussian functions in Ramachandran space representing sub-ensembles of pPII-, β-strand-, helical-, and turn-like conformations. Our analysis and simulation of the amide I' band profiles exploits excitonic coupling between the local amide I' vibrational modes in the tetra-peptides. The resulting distributions reveal that D and S, which themselves have high propensities for turn-structures, strongly affect the conformational distribution of their downstream neighbor. Taken together, our results indicate that Dx and Sx motifs might act as conformational randomizers in proteins, attenuating intrinsic propensities of neighboring residues. Overall, our results show that nearest neighbor interactions contribute significantly to the Gibbs energy landscape of disordered peptides and proteins.

  3. RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

    NASA Astrophysics Data System (ADS)

    Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

    2017-03-01

    Resistive switching memory (RRAM) is considered one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested on the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.

  4. Integrating instance selection, instance weighting, and feature weighting for nearest neighbor classifiers by coevolutionary algorithms.

    PubMed

    Derrac, Joaquín; Triguero, Isaac; Garcia, Salvador; Herrera, Francisco

    2012-10-01

    Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.

  5. Quantum realization of the nearest-neighbor interpolation method for FRQI and NEQR

    NASA Astrophysics Data System (ADS)

    Sang, Jianzhi; Wang, Shen; Niu, Xiamu

    2016-01-01

    This paper is concerned with the feasibility of the classical nearest-neighbor interpolation based on flexible representation of quantum images (FRQI) and novel enhanced quantum representation (NEQR). Firstly, the feasibility of the classical image nearest-neighbor interpolation for quantum images of FRQI and NEQR is proven. Then, by defining the halving operation and by making use of quantum rotation gates, the concrete quantum circuit of the nearest-neighbor interpolation for FRQI is designed for the first time. Furthermore, a quantum circuit of the nearest-neighbor interpolation for NEQR is given. The merit of the proposed NEQR circuit lies in its low complexity, which is achieved by utilizing the halving operation and the quantum oracle operator. Finally, in order to further improve the performance of the former circuits, new interpolation circuits for FRQI and NEQR are presented by using Control-NOT gates instead of a halving operation. Simulation results show the effectiveness of the proposed circuits.
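
    For orientation, the classical nearest-neighbor interpolation that the paper maps onto quantum circuits can be sketched as below. This is only the classical counterpart, not the FRQI or NEQR circuit; the function name and the floor-based index mapping are assumptions chosen for the example (other rounding conventions exist).

```python
def nearest_neighbor_resize(image, new_h, new_w):
    """Resize a 2D grid of pixel values by nearest-neighbor interpolation.

    Each output pixel copies the source pixel whose index is the floor of
    the scaled output index (one common convention among several).
    """
    old_h, old_w = len(image), len(image[0])
    out = []
    for r in range(new_h):
        src_r = r * old_h // new_h          # source row for output row r
        out.append([image[src_r][c * old_w // new_w]  # source column
                    for c in range(new_w)])
    return out

# Hypothetical 2x2 image scaled up by a factor of 2 in each direction.
small = [[1, 2],
         [3, 4]]
for row in nearest_neighbor_resize(small, 4, 4):
    print(row)
```

    With a scale factor of 2, the index mapping is an integer halving of the output coordinate, which is one way to read the role a halving operation could play in such a scheme.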

  6. Enhanced Approximate Nearest Neighbor via Local Area Focused Search.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gonzales, Antonio; Blazier, Nicholas Paul

    Approximate Nearest Neighbor (ANN) algorithms are increasingly important in machine learning, data mining, and image processing applications. There is a large family of space-partitioning ANN algorithms, such as randomized KD-Trees, that work well in practice but are limited by an exponential increase in similarity comparisons required to optimize recall. Additionally, they only support a small set of similarity metrics. We present Local Area Focused Search (LAFS), a method that enhances the way queries are performed using an existing ANN index. Instead of a single query, LAFS performs a number of smaller (fewer similarity comparisons) queries and focuses on a local neighborhood which is refined as candidates are identified. We show that our technique improves performance on several well known datasets and is easily extended to general similarity metrics using kernel projection techniques.

  7. Geometric k-nearest neighbor estimation of entropy and mutual information

    NASA Astrophysics Data System (ADS)

    Lord, Warren M.; Sun, Jie; Bollt, Erik M.

    2018-03-01

    Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for a large sample size. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators that we call geometric knn estimators (g-knn), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical system attractors. A series of numerical examples in which the thickness of the underlying distribution and the sample sizes are varied suggest that local geometry is a source of problems for knn methods such as the Kraskov-Stögbauer-Grassberger estimator when local geometric effects cannot be removed by global preprocessing of the data. The g-knn method performs well despite the manipulation of the local geometry. In addition, the examples suggest that the g-knn estimators can be of particular relevance to applications in which the system is large, but the data size is limited.
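
    As a point of reference for the "geometrically regular local volume elements" mentioned above, here is a sketch of a classical kNN entropy estimator that uses spherical volumes (the Kozachenko-Leonenko form with Euclidean distances). It is the regular-volume baseline, not the g-knn estimator of the paper; the function name is hypothetical, the search is brute force, and the samples are assumed to contain no duplicate points.

```python
import math
import random

def knn_entropy(samples, k=1):
    """Kozachenko-Leonenko entropy estimate (in nats) with Euclidean balls.

    H = psi(N) - psi(k) + log(V_d) + (d / N) * sum_i log(r_i),
    where r_i is the distance from sample i to its k-th nearest neighbor
    and V_d is the volume of the d-dimensional unit ball.
    Assumes all samples are distinct (otherwise log(0) occurs).
    """
    n = len(samples)
    d = len(samples[0])
    # For integers, psi(n) - psi(k) equals sum_{j=k}^{n-1} 1/j.
    psi_diff = sum(1.0 / j for j in range(k, n))
    # log V_d = (d/2) log(pi) - log Gamma(d/2 + 1)
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_radius_sum = 0.0
    for i, x in enumerate(samples):
        dists = sorted(math.dist(x, y)
                       for j, y in enumerate(samples) if j != i)
        log_radius_sum += math.log(dists[k - 1])
    return psi_diff + log_unit_ball + d * log_radius_sum / n

# Hypothetical data: 50 points from a 2D standard normal distribution.
rng = random.Random(0)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
print(knn_entropy(pts, k=3))
```

    Because every ball is a sphere, stretching the data along one axis changes all radii in a way the estimator cannot distinguish from an isotropic change, which is the kind of local-geometry bias the abstract describes.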

  8. Multi-spectral brain tissue segmentation using automatically trained k-Nearest-Neighbor classification.

    PubMed

    Vrooman, Henri A; Cocosco, Chris A; van der Lijn, Fedde; Stokking, Rik; Ikram, M Arfan; Vernooij, Meike W; Breteler, Monique M B; Niessen, Wiro J

    2007-08-01

    Conventional k-Nearest-Neighbor (kNN) classification, which has been successfully applied to classify brain tissue in MR data, requires training on manually labeled subjects. This manual labeling is a laborious and time-consuming procedure. In this work, a new fully automated brain tissue classification procedure is presented, in which kNN training is automated. This is achieved by non-rigidly registering the MR data with a tissue probability atlas to automatically select training samples, followed by a post-processing step to keep the most reliable samples. The accuracy of the new method was compared to rigid registration-based training and to conventional kNN-based segmentation using training on manually labeled subjects for segmenting gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) in 12 data sets. Furthermore, for all classification methods, the performance was assessed when varying the free parameters. Finally, the robustness of the fully automated procedure was evaluated on 59 subjects. The automated training method using non-rigid registration with a tissue probability atlas was significantly more accurate than rigid registration. For both automated training using non-rigid registration and for the manually trained kNN classifier, the difference with the manual labeling by observers was not significantly larger than inter-observer variability for all tissue types. From the robustness study, it was clear that, given an appropriate brain atlas and optimal parameters, our new fully automated, non-rigid registration-based method gives accurate and robust segmentation results. A similarity index was used for comparison with manually trained kNN. The similarity indices were 0.93, 0.92 and 0.92, for CSF, GM and WM, respectively. It can be concluded that our fully automated method using non-rigid registration may replace manual segmentation, and thus that automated brain tissue segmentation without laborious manual training is feasible.

  9. Self-Organizing Map Neural Network-Based Nearest Neighbor Position Estimation Scheme for Continuous Crystal PET Detectors

    NASA Astrophysics Data System (ADS)

    Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei

    2014-10-01

    Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative for current high-resolution pixelated PET detectors if the issues of high performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate array (FPGA). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only at providing high performance, but also at being realistic for FPGA implementation. Benefitting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest neighbor searching for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The spatial resolutions of full-width-at-half-maximum (FWHM) of both methods averaged over the center axis of the detector were obtained as 1.87 ±0.17 mm and 1.92 ±0.09 mm, respectively. The test results show that the SOM-NN scheme has an equivalent positioning performance with the smoothed k-NN method, but the amount of computation is only about one-tenth of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on FPGA. It has the potential to realize real-time position estimation on an FPGA with a high-event processing throughput.

  10. Analysis of miRNA expression profile based on SVM algorithm

    NASA Astrophysics Data System (ADS)

    Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian

    2018-05-01

    Based on a miRNA expression spectrum data set, a new data mining algorithm, tSVM-KNN (t statistic with support vector machine - k nearest neighbor), is proposed. The idea of the algorithm is as follows: firstly, feature selection on the data set is carried out by the unified measurement method; secondly, the SVM-KNN algorithm, which combines support vector machine (SVM) and k-nearest neighbor (KNN), is used as the classifier. Simulation results show that the SVM-KNN algorithm has better classification ability than SVM and KNN alone. In terms of the number of miRNA "tags" and recognition accuracy, the tSVM-KNN algorithm needs only 5 miRNAs to obtain 96.08% classification accuracy. Compared with similar algorithms, the tSVM-KNN algorithm has obvious advantages.

  11. Reverse Nearest Neighbor Search on a Protein-Protein Interaction Network to Infer Protein-Disease Associations.

    PubMed

    Suratanee, Apichat; Plaimas, Kitiporn

    2017-01-01

    The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network integrating an effective network search, namely, the reverse k-nearest neighbor (RkNN) search. The RkNN search was used to identify an impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the RkNN search yielded a much higher precision than a random selection, standard nearest neighbor search, or when applying the method to a random protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
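
    A reverse k-nearest neighbor query asks which nodes count the query among their own k nearest neighbors. The sketch below illustrates this on a small unweighted graph. It rests on assumptions not stated in the abstract: distance is measured in hops via breadth-first search, ties are handled inclusively (the query qualifies if fewer than k nodes are strictly closer), and the graph and node names are hypothetical.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def reverse_knn(graph, query, k):
    """Nodes that have `query` among their k nearest neighbors.

    Tie handling is an assumption: `query` counts as one of a node's
    k nearest if fewer than k other nodes are strictly closer to it.
    """
    result = []
    for node in graph:
        if node == query:
            continue
        dist = hop_distances(graph, node)
        if query not in dist:
            continue  # query unreachable from this node
        closer = sum(1 for other, d in dist.items()
                     if other != node and d < dist[query])
        if closer < k:
            result.append(node)
    return sorted(result)

# Hypothetical path-shaped network A - B - C - D (adjacency lists).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(reverse_knn(graph, "A", k=1))
print(reverse_knn(graph, "B", k=1))
```

    Unlike a standard kNN query, the result size is not fixed by k: a well-connected node can appear in many neighbor lists, which is why the size of the reverse set can serve as a measure of influence.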

  12. Modeling Gas and Gas Hydrate Accumulation in Marine Sediments Using a K-Nearest Neighbor Machine-Learning Technique

    NASA Astrophysics Data System (ADS)

    Wood, W. T.; Runyan, T. E.; Palmsten, M.; Dale, J.; Crawford, C.

    2016-12-01

    Natural gas (primarily methane) and gas hydrate accumulations require certain bio-geochemical, as well as physical conditions, some of which are poorly sampled and/or poorly understood. We exploit recent advances in the prediction of seafloor porosity and heat flux via machine learning techniques (e.g., random forests and Bayesian networks) to predict the occurrence of gas and subsequently gas hydrate in marine sediments. The technique we use in this study for the prediction (actually guided interpolation) of key parameters is K-nearest neighbor (KNN). KNN requires only minimal pre-processing of the data and predictors, and requires minimal run-time input, so the results are almost entirely data-driven. Specifically, we use new estimates of sedimentation rate and sediment type, along with recently derived compaction modeling, to estimate profiles of porosity and age. We combined the compaction with seafloor heat flux to estimate temperature with depth and geologic age, which, with estimates of organic carbon and models of methanogenesis, yield limits on the production of methane. Results include geospatial predictions of gas (and gas hydrate) accumulations, with quantitative estimates of uncertainty. The Generic Earth Modeling System (GEMS) we have developed to derive the machine learning estimates is modular and easily updated with new algorithms or data.

  13. Nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes

    PubMed Central

    Watkins, Norman E.; SantaLucia, John

    2005-01-01

    Nearest-neighbor thermodynamic parameters of the ‘universal pairing base’ deoxyinosine were determined for the pairs I·C, I·A, I·T, I·G and I·I adjacent to G·C and A·T pairs. Ultraviolet absorbance melting curves were measured and non-linear regression performed on 84 oligonucleotide duplexes with 9 or 12 bp lengths. These data were combined with data for 13 inosine containing duplexes from the literature. Multiple linear regression was used to solve for the 32 nearest-neighbor unknowns. The parameters predict the Tm for all sequences within 1.2°C on average. The general trend in decreasing stability is I·C > I·A > I·T ≈ I·G > I·I. The stability trend for the base pair 5′ of the I·X pair is G·C > C·G > A·T > T·A. The stability trend for the base pair 3′ of I·X is the same. These trends indicate a complex interplay between H-bonding, nearest-neighbor stacking, and mismatch geometry. A survey of 14 tandem inosine pairs and 8 tandem self-complementary inosine pairs is also provided. These results may be used in the design of degenerate PCR primers and for degenerate microarray probes. PMID:16264087

  14. Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

    PubMed

    Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

    2016-11-16

    The use of speech-based data in the classification of Parkinson's disease (PD) has been shown in recent years to provide an effective, non-invasive mode of classification. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public dataset and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.

  15. Hash Bit Selection for Nearest Neighbor Search.

    PubMed

    Xianglong Liu; Junfeng He; Shih-Fu Chang

    2017-11-01

    To overcome the barrier of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate the nearest neighbor search. Despite the recent advances, critical design issues remain open in how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these by posing an optimal hash bit selection problem, in which an optimal subset of hash bits is selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt the bit reliability and their complementarity as the selection criteria that can be carefully tailored for hashing performance in different tasks. Then, the bit selection solution is discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationship among hash bits to approximate the high-order independence property, and formulate it as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hash techniques, where our bit selection framework can achieve superior performance over both the naive selection methods and the state-of-the-art hashing algorithms, with significant relative accuracy gains ranging from 10% to 50%.
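
    To make the setting of this abstract concrete, the sketch below shows how a shorter binary code can be assembled from a chosen subset of bit positions and how a nearest neighbor is then found by Hamming distance. It is only an illustration: the names `hamming`, `select_bits`, and `nearest_by_hamming` are hypothetical, the chosen bit positions are arbitrary, and the reliability and complementarity criteria that the paper uses to pick bits are not implemented here.

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integer codes."""
    return bin(a ^ b).count("1")


def select_bits(code, bit_indices):
    """Build a shorter code from the given bit positions of `code`.

    The bit at bit_indices[0] becomes bit 0 of the result,
    the bit at bit_indices[1] becomes bit 1, and so on.
    """
    out = 0
    for position, index in enumerate(bit_indices):
        out |= ((code >> index) & 1) << position
    return out


def nearest_by_hamming(query, database):
    """Index of the database code closest to `query` in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 4-bit codes from a candidate pool; keep only bits 1 and 3.
pool_codes = [0b0000, 0b1110, 0b0011]
kept = [1, 3]
short_codes = [select_bits(c, kept) for c in pool_codes]
best = nearest_by_hamming(select_bits(0b1111, kept), short_codes)
```

    A linear scan is used here for clarity; with compact codes, practical systems typically use table lookups instead.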

  16. The Effective Resistance of the -Cycle Graph with Four Nearest Neighbors

    NASA Astrophysics Data System (ADS)

    Chair, Noureddine

    2014-02-01

    The exact expression for the effective resistance between any two vertices of the -cycle graph with four nearest neighbors , is given. It turns out that this expression is written in terms of the effective resistance of the -cycle graph , the square of the Fibonacci numbers, and the bisected Fibonacci numbers. As a consequence, closed form formulas for the total effective resistance, the first passage time, and the mean first passage time for the simple random walk on the -cycle graph with four nearest neighbors are obtained. Finally, a closed form formula for the effective resistance of with all first neighbors removed is obtained.

  17. Nearest unlike neighbor (NUN): an aid to decision confidence estimation

    NASA Astrophysics Data System (ADS)

    Dasarathy, Belur V.

    1995-09-01

    The concept of nearest unlike neighbor (NUN), proposed and explored previously in the design of nearest neighbor (NN) based decision systems, is further exploited in this study to develop a measure of confidence in the decisions made by NN-based decision systems. This measure of confidence, on the basis of comparison with a user-defined threshold, may be used to determine the acceptability of the decision provided by the NN-based decision system. The concepts, associated methodology, and some illustrative numerical examples using the now classical Iris data to bring out the ease of implementation and effectiveness of the proposed innovations are presented.

  18. Streamflow variability and classification using false nearest neighbor method

    NASA Astrophysics Data System (ADS)

    Vignesh, R.; Jothiprakash, V.; Sivakumar, B.

    2015-12-01

    Understanding regional streamflow dynamics and patterns continues to be a challenging problem. The present study introduces the false nearest neighbor (FNN) algorithm, a nonlinear dynamic-based method, to examine the spatial variability of streamflow over a region. The FNN method is a dimensionality-based approach, where the dimension of the time series represents its variability. The method uses phase space reconstruction and nearest neighbor concepts, and identifies false neighbors in the reconstructed phase space. The FNN method is applied to monthly streamflow data monitored over a period of 53 years (1950-2002) in an extensive network of 639 stations in the contiguous United States (US). Since selection of delay time in phase space reconstruction may influence the FNN outcomes, analysis is carried out for five different delay time values: monthly, seasonal, and annual separation of data as well as delay time values obtained using autocorrelation function (ACF) and average mutual information (AMI) methods. The FNN dimensions for the 639 streamflow series are generally identified to range from 4 to 12 (with very few exceptional cases), indicating a wide range of variability in the dynamics of streamflow across the contiguous US. However, the FNN dimensions for a majority of the streamflow series are found to be low (less than or equal to 6), suggesting low level of complexity in streamflow dynamics in most of the individual stations and over many sub-regions. The FNN dimension estimates also reveal that streamflow dynamics in the western parts of the US (including far west, northwestern, and southwestern parts) generally exhibit much greater variability compared to that in the eastern parts of the US (including far east, northeastern, and southeastern parts), although there are also differences among 'pockets' within these regions. These results are useful for identification of appropriate model complexity at individual stations, patterns across regions and sub
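
    The phase space reconstruction and false-neighbor test described in this abstract can be sketched as follows. This is a minimal pure-Python sketch that assumes the commonly used distance-ratio criterion for deciding whether a neighbor is false; the function names, the threshold of 10.0, and the toy sine series are illustrative assumptions and are not taken from the study.

```python
import math


def delay_embed(series, dim, tau):
    """Time-delay embedding: vector i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim)) for i in range(n)]


def false_neighbor_fraction(series, dim, tau, threshold=10.0):
    """Fraction of nearest neighbors in `dim` dimensions that are 'false'.

    A neighbor is counted as false when adding the (dim+1)-th delay
    coordinate separates the pair by more than `threshold` times their
    distance in `dim` dimensions (an assumed, commonly used criterion).
    """
    n = len(series) - dim * tau  # points that still exist in dim + 1 dimensions
    vectors = delay_embed(series, dim, tau)[:n]
    false_count = 0
    counted = 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):  # brute-force nearest neighbor search
            if j == i:
                continue
            d = math.dist(vectors[i], vectors[j])
            if d < best_d:
                best_d, best_j = d, j
        if best_j is None or best_d == 0.0:
            continue  # skip exact duplicates to avoid dividing by zero
        extra = abs(series[i + dim * tau] - series[best_j + dim * tau])
        counted += 1
        if extra / best_d > threshold:
            false_count += 1
    return false_count / counted if counted else 0.0


# Hypothetical toy series; a real study would use observed streamflow data.
toy_series = [math.sin(0.3 * i) for i in range(200)]
fraction = false_neighbor_fraction(toy_series, dim=2, tau=5)
```

    In practice the fraction is computed for increasing values of `dim`, and the smallest dimension at which it drops close to zero is taken as the FNN dimension.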

  19. On the consistency between nearest-neighbor peridynamic discretizations and discretized classical elasticity models

    DOE PAGES

    Seleson, Pablo; Du, Qiang; Parks, Michael L.

    2016-08-16

    The peridynamic theory of solid mechanics is a nonlocal reformulation of the classical continuum mechanics theory. At the continuum level, it has been demonstrated that classical (local) elasticity is a special case of peridynamics. Such a connection between these theories has not been extensively explored at the discrete level. This paper investigates the consistency between nearest-neighbor discretizations of linear elastic peridynamic models and finite difference discretizations of the Navier–Cauchy equation of classical elasticity. While nearest-neighbor discretizations in peridynamics have been numerically observed to present grid-dependent crack paths or spurious microcracks, this paper focuses on a different, analytical aspect of such discretizations. We demonstrate that, even in the absence of cracks, such discretizations may be problematic unless a proper selection of weights is used. Specifically, we demonstrate that using the standard meshfree approach in peridynamics, nearest-neighbor discretizations do not reduce, in general, to discretizations of corresponding classical models. We study nodal-based quadratures for the discretization of peridynamic models, and we derive quadrature weights that result in consistency between nearest-neighbor discretizations of peridynamic models and discretized classical models. The quadrature weights that lead to such consistency are, however, model-/discretization-dependent. We motivate the choice of those quadrature weights through a quadratic approximation of displacement fields. The stability of nearest-neighbor peridynamic schemes is demonstrated through a Fourier mode analysis. Finally, an approach based on a normalization of peridynamic constitutive constants at the discrete level is explored. This approach results in the desired consistency for one-dimensional models, but does not work in higher dimensions. The results of the work presented in this paper suggest that even though nearest-neighbor

  20. Error minimizing algorithms for nearest neighbor classifiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Porter, Reid B; Hush, Don; Zimmer, G. Beate

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.

  1. Improving RNA nearest neighbor parameters for helices by going beyond the two-state model.

    PubMed

    Spasic, Aleksandar; Berger, Kyle D; Chen, Jonathan L; Seetin, Matthew G; Turner, Douglas H; Mathews, David H

    2018-06-01

    RNA folding free energy change nearest neighbor parameters are widely used to predict folding stabilities of secondary structures. They were determined by linear regression to datasets of optical melting experiments on small model systems. Traditionally, the optical melting experiments are analyzed assuming a two-state model, i.e. a structure is either complete or denatured. Experimental evidence, however, shows that structures exist in an ensemble of conformations. Partition functions calculated with existing nearest neighbor parameters predict that secondary structures can be partially denatured, which also directly conflicts with the two-state model. Here, a new approach for determining RNA nearest neighbor parameters is presented. Available optical melting data for 34 Watson-Crick helices were fit directly to a partition function model that allows an ensemble of conformations. Fitting parameters were the enthalpy and entropy changes for helix initiation, terminal AU pairs, stacks of Watson-Crick pairs and disordered internal loops. The resulting set of nearest neighbor parameters shows a 38.5% improvement in the sum of residuals in fitting the experimental melting curves compared to the current literature set.

  2. Estimating forest attribute parameters for small areas using nearest neighbors techniques

    Treesearch

    Ronald E. McRoberts

    2012-01-01

    Nearest neighbors techniques have become extremely popular, particularly for use with forest inventory data. With these techniques, a population unit prediction is calculated as a linear combination of observations for a selected number of population units in a sample that are most similar, or nearest, in a space of ancillary variables to the population unit requiring...

  3. Colorectal Cancer and Colitis Diagnosis Using Fourier Transform Infrared Spectroscopy and an Improved K-Nearest-Neighbour Classifier.

    PubMed

    Li, Qingbo; Hao, Can; Kang, Xue; Zhang, Jialin; Sun, Xuejun; Wang, Wenbo; Zeng, Haishan

    2017-11-27

    Combining Fourier transform infrared spectroscopy (FTIR) with endoscopy, it is expected that noninvasive, rapid detection of colorectal cancer can be performed in vivo in the future. In this study, Fourier transform infrared spectra were collected from 88 endoscopic biopsy colorectal tissue samples (41 colitis and 47 cancers). A new method, viz., entropy weight local-hyperplane k-nearest-neighbor (EWHK), which is an improved version of K-local hyperplane distance nearest-neighbor (HKNN), is proposed for tissue classification. In order to avoid limiting high dimensions and small values of the nearest neighbor, the new EWHK method calculates feature weights based on information entropy. The average results of the random classification showed that the EWHK classifier for differentiating cancer from colitis samples produced a sensitivity of 81.38% and a specificity of 92.69%.

  4. OCR enhancement through neighbor embedding and fast approximate nearest neighbors

    NASA Astrophysics Data System (ADS)

    Smith, D. C.

    2012-10-01

    Generic optical character recognition (OCR) engines often perform very poorly in transcribing scanned low resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speedup of our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (pixels per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more than 6 times lower on average than after BI preprocessing.

  5. Nearest-neighbor Kitaev exchange blocked by charge order in electron-doped α-RuCl$_3$

    NASA Astrophysics Data System (ADS)

    Koitzsch, A.; Habenicht, C.; Müller, E.; Knupfer, M.; Büchner, B.; Kretschmer, S.; Richter, M.; van den Brink, J.; Börrnert, F.; Nowak, D.; Isaeva, A.; Doert, Th.

    2017-10-01

    A quantum spin liquid might be realized in α-RuCl$_3$, a honeycomb-lattice magnetic material with substantial spin-orbit coupling. Moreover, α-RuCl$_3$ is a Mott insulator, which implies the possibility that novel exotic phases occur upon doping. Here, we study the electronic structure of this material when intercalated with potassium by photoemission spectroscopy, electron energy loss spectroscopy, and density functional theory calculations. We obtain a stable stoichiometry at K$_{0.5}$RuCl$_3$. This gives rise to a peculiar charge disproportionation into formally Ru$^{2+}$ ($4d^6$) and Ru$^{3+}$ ($4d^5$). Every Ru $4d^5$ site with one hole in the $t_{2g}$ shell is surrounded by nearest neighbors of $4d^6$ character, where the $t_{2g}$ level is full and magnetically inert. Thus, each type of Ru site forms a triangular lattice, and nearest-neighbor interactions of the original honeycomb are blocked.

  6. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.

    PubMed

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E

    2016-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.

  7. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

    PubMed Central

    Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

    2018-01-01

    Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777

  8. Nearest neighbor 3D segmentation with context features

    NASA Astrophysics Data System (ADS)

    Hristova, Evelin; Schulz, Heinrich; Brosch, Tom; Heinrich, Mattias P.; Nickisch, Hannes

    2018-03-01

    Automated and fast multi-label segmentation of medical images is challenging and clinically important. This paper builds upon a supervised machine learning framework that uses training data sets with dense organ annotations and vantage point trees to classify voxels in unseen images based on similarity of binary feature vectors extracted from the data. Without explicit model knowledge, the algorithm is applicable to different modalities and organs, and achieves high accuracy. The method is successfully tested on 70 abdominal CT and 42 pelvic MR images. With respect to ground truth, an average Dice overlap score of 0.76 for the CT segmentation of liver, spleen and kidneys is achieved. The mean score for the MR delineation of bladder, bones, prostate and rectum is 0.65. Additionally, we benchmark several variations of the main components of the method and reduce the computation time by up to 47% without significant loss of accuracy. The segmentation results are - for a nearest neighbor method - surprisingly accurate, robust as well as data and time efficient.

  9. Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies

    EPA Science Inventory

    Previously a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity for large, noncongeneric data sets. In this study these approaches applie...

  10. GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm.

    PubMed

    Tae Joon Jun; Hyun Ji Park; Hyuk Yoo; Young-Hak Kim; Daeyoung Kim

    2016-08-01

    In this paper, we propose a GPU-based cloud system for high-performance arrhythmia detection. The Pan-Tompkins algorithm is used for QRS detection, and we optimized the beat classification algorithm with K-Nearest Neighbor (K-NN). To support high-performance beat classification on the system, we parallelized the beat classification algorithm with CUDA to execute it on virtualized GPU devices on the cloud system. The MIT-BIH Arrhythmia database is used for validation of the algorithm. The system achieved a detection rate of about 93.5%, which is comparable to previous research, while our algorithm executes 2.5 times faster than a CPU-only detection algorithm.
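
    For readers unfamiliar with the classifier named in this abstract, a plain k-NN majority vote can be sketched as below. This is a sequential, CPU-only sketch and does not reproduce the CUDA parallelization described by the authors; the function name `knn_classify`, the two-dimensional feature vectors, and the class labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, samples, k=3):
    """Label `query` by majority vote among its k nearest labeled samples.

    `samples` is a list of (feature_vector, label) pairs. Distances are
    Euclidean. Ties in the vote go to the label seen first among the
    nearest neighbors.
    """
    neighbors = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical beat features (for example, two normalized interval measures).
beats = [
    ((0.80, 0.10), "normal"),
    ((0.82, 0.12), "normal"),
    ((0.78, 0.11), "normal"),
    ((0.40, 0.30), "abnormal"),
    ((0.42, 0.33), "abnormal"),
]
predicted = knn_classify((0.79, 0.10), beats, k=3)
```

    Each query is independent of the others, so the distance computations can be spread across many threads, which is the property a GPU implementation would exploit.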

  11. Collective Behaviors of Mobile Robots Beyond the Nearest Neighbor Rules With Switching Topology.

    PubMed

    Ning, Boda; Han, Qing-Long; Zuo, Zongyu; Jin, Jiong; Zheng, Jinchuan

    2018-05-01

    This paper is concerned with the collective behaviors of robots beyond the nearest neighbor rules, i.e., dispersion and flocking, when robots interact with others by applying an acute angle test (AAT)-based interaction rule. Different from a conventional nearest neighbor rule or its variations, the AAT-based interaction rule allows interactions with some far-neighbors and excludes unnecessary nearest neighbors. The resulting dispersion and flocking hold the advantages of scalability, connectivity, robustness, and effective area coverage. For the dispersion, a spring-like controller is proposed to achieve collision-free coordination. With switching topology, a new fixed-time consensus-based energy function is developed to guarantee the system stability. An upper bound of settling time for energy consensus is obtained, and a uniform time interval is accordingly set so that energy distribution is conducted in a fair manner. For the flocking, based on a class of generalized potential functions taking nonsmooth switching into account, a new controller is proposed to ensure that the same velocity for all robots is eventually reached. A co-optimizing problem is further investigated to accomplish additional tasks, such as enhancing communication performance, while maintaining the collective behaviors of mobile robots. Simulation results are presented to show the effectiveness of the theoretical results.

  12. Competing growth processes induced by next-nearest-neighbor interactions: Effects on meandering wavelength and stiffness

    NASA Astrophysics Data System (ADS)

    Blel, Sonia; Hamouda, Ajmi BH.; Mahjoub, B.; Einstein, T. L.

    2017-02-01

    In this paper we explore the meandering instability of vicinal steps with a kinetic Monte Carlo (kMC) simulation model including the attractive next-nearest-neighbor (NNN) interactions. kMC simulations show that an increase of the NNN interaction strength leads to a considerable reduction of the meandering wavelength and to a weaker dependence of the wavelength on the deposition rate F. The dependences of the meandering wavelength on the temperature and the deposition rate obtained with simulations are in good quantitative agreement with the experimental result on the meandering instability of Cu(0 2 24) [T. Maroutian et al., Phys. Rev. B 64, 165401 (2001), 10.1103/PhysRevB.64.165401]. The effective step stiffness is found to depend not only on the strength of NNN interactions and the Ehrlich-Schwoebel barrier, but also on F. We argue that attractive NNN interactions intensify the incorporation of adatoms at step edges and enhance step roughening. Competition between NNN and nearest-neighbor interactions results in an alternative form of meandering instability which we call "roughening-limited" growth, rather than the attachment-detachment-limited growth that governs the Bales-Zangwill instability. The computed effective wavelength and the effective stiffness behave as $\lambda_{\mathrm{eff}} \sim F^{-q}$ and $\tilde{\beta}_{\mathrm{eff}} \sim F^{-p}$, respectively, with $q \approx p/2$.

  13. Multi-strategy based quantum cost reduction of linear nearest-neighbor quantum circuit

    NASA Astrophysics Data System (ADS)

    Tan, Ying-ying; Cheng, Xue-yun; Guan, Zhi-jin; Liu, Yang; Ma, Haiying

    2018-03-01

    With the development of reversible and quantum computing, the study of reversible and quantum circuits has also developed rapidly. Due to physical constraints, most quantum circuits require quantum gates to interact on adjacent quantum bits. However, many existing nearest-neighbor quantum circuits have a large quantum cost. Therefore, how to effectively reduce quantum cost is becoming a popular research topic. In this paper, we propose multiple optimization strategies to reduce the quantum cost of the circuit; that is, we reduce quantum cost through MCT gate decomposition, nearest-neighbor arrangement, and circuit simplification, respectively. The experimental results show that the proposed strategies can effectively reduce the quantum cost, and the maximum optimization rate is 30.61% compared to the corresponding results.

  14. Implementation of Nearest Neighbor using HSV to Identify Skin Disease

    NASA Astrophysics Data System (ADS)

    Gerhana, Y. A.; Zulfikar, W. B.; Ramdani, A. H.; Ramdhani, M. A.

    2018-01-01

    Today, Android is one of the most widely used operating systems in the world. Most Android devices have a camera that can capture images, and this feature can be used to identify skin disease. Skin disease is a health problem caused by bacteria, fungi, and viruses, and its symptoms are usually visible. In this work, the symptoms are captured as an image, and every pixel of the image carries HSV values. The HSV values are extracted and then used to calculate a Euclidean distance. The nearest neighbor algorithm compares these distances to find the training image closest to the test image, which determines the class label, that is, the type of skin disease. The test results show that 166 of 200 images (83%) were classified correctly. Several factors influence the result of the classification model, such as the number of training images and the quality of the Android device's camera.
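
    The pipeline outlined in this abstract (extract HSV values, compute Euclidean distances, pick the nearest training image) might look roughly like the sketch below. It assumes each image is summarized by its mean HSV vector, which the abstract does not state; the function names, the tiny pixel lists, and the labels "class_a" and "class_b" are hypothetical. Hue is averaged naively even though it is a circular quantity, which is a simplification.

```python
import colorsys
import math


def mean_hsv(pixels):
    """Mean (h, s, v) of a list of (r, g, b) pixels with channels in 0..255."""
    h_sum = s_sum = v_sum = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        h_sum += h
        s_sum += s
        v_sum += v
    n = len(pixels)
    return (h_sum / n, s_sum / n, v_sum / n)


def nearest_label(query_feature, training):
    """Label of the training feature closest to the query (1-nearest neighbor).

    `training` is a list of (feature, label) pairs; distance is Euclidean.
    """
    _, label = min(training, key=lambda item: math.dist(query_feature, item[0]))
    return label


# Hypothetical training images reduced to a few pixels each.
training_set = [
    (mean_hsv([(255, 0, 0), (250, 5, 5)]), "class_a"),
    (mean_hsv([(0, 0, 255), (5, 5, 250)]), "class_b"),
]
test_feature = mean_hsv([(250, 10, 10), (245, 12, 12)])
result = nearest_label(test_feature, training_set)
```

    On a real device the pixel lists would come from the camera image, and more training images per class would generally be needed for a usable classifier.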

  15. Efficiency of encounter-controlled reaction between diffusing reactants in a finite lattice: Non-nearest-neighbor effects

    NASA Astrophysics Data System (ADS)

    Bentz, Jonathan L.; Kozak, John J.; Nicolis, Gregoire

    2005-08-01

    The influence of non-nearest-neighbor displacements on the efficiency of diffusion-reaction processes involving one and two mobile diffusing reactants is studied. An exact analytic result is given for dimension d=1 from which, for large lattices, one can recover the asymptotic estimate reported 30 years ago by Lakatos-Lindenberg and Shuler. For dimensions d=2,3 we present numerically exact values for the mean time to reaction, as gauged by the mean walklength before reactive encounter, obtained via the theory of finite Markov processes and supported by Monte Carlo simulations. Qualitatively different results are found between processes occurring on d=1 versus d>1 lattices, and between results obtained assuming nearest-neighbor (only) versus non-nearest-neighbor displacements.

  16. Spectral properties near the Mott transition in the two-dimensional t-J model with next-nearest-neighbor hopping

    NASA Astrophysics Data System (ADS)

    Kohno, Masanori

    2018-05-01

    The single-particle spectral properties of the two-dimensional t-J model with next-nearest-neighbor hopping are investigated near the Mott transition by using cluster perturbation theory. The spectral features are interpreted by considering the effects of the next-nearest-neighbor hopping on the shift of the spectral-weight distribution of the two-dimensional t-J model. Various anomalous features observed in hole-doped and electron-doped high-temperature cuprate superconductors are collectively explained in the two-dimensional t-J model with next-nearest-neighbor hopping near the Mott transition.

  17. Thermal rectification in mass-graded next-nearest-neighbor Fermi-Pasta-Ulam lattices

    NASA Astrophysics Data System (ADS)

    Romero-Bastida, M.; Miranda-Peña, Jorge-Orlando; López, Juan M.

    2017-03-01

    We study the thermal rectification efficiency, i.e., quantification of asymmetric heat flow, of a one-dimensional mass-graded anharmonic oscillator Fermi-Pasta-Ulam lattice both with nearest-neighbor (NN) and next-nearest-neighbor (NNN) interactions. The system presents a maximum rectification efficiency for a very precise value of the parameter that controls the coupling strength of the NNN interactions, which also optimizes the rectification figure when its dependence on mass asymmetry and temperature differences is considered. The origin of the enhanced rectification is the asymmetric local heat flow response as the heat reservoirs are swapped when a finely tuned NNN contribution is taken into account. A simple theoretical analysis gives an estimate of the optimal NNN coupling in excellent agreement with our simulation results.

  18. A Nearest Neighbor Classifier Employing Critical Boundary Vectors for Efficient On-Chip Template Reduction.

    PubMed

    Xia, Wenjun; Mita, Yoshio; Shibata, Tadashi

    2016-05-01

    Aiming at efficient data condensation and improving accuracy, this paper presents a hardware-friendly template reduction (TR) method for the nearest neighbor (NN) classifiers by introducing the concept of critical boundary vectors. A hardware system is also implemented to demonstrate the feasibility of using a field-programmable gate array (FPGA) to accelerate the proposed method. Initially, k-means centers are used as substitutes for the entire template set. Then, to enhance the classification performance, critical boundary vectors are selected by a novel learning algorithm, which is completed within a single iteration. Moreover, to remove noisy boundary vectors that can mislead the classification in a generalized manner, a global categorization scheme has been explored and applied to the algorithm. The global characterization automatically categorizes each classification problem and rapidly selects the boundary vectors according to the nature of the problem. Finally, only critical boundary vectors and k-means centers are used as the new template set for classification. Experimental results for 24 data sets show that the proposed algorithm can effectively reduce the number of template vectors for classification with a high learning speed. At the same time, it improves the accuracy by an average of 2.17% compared with the traditional NN classifiers and also shows greater accuracy than seven other TR methods. We have shown the feasibility of using a proof-of-concept FPGA system of 256 64-D vectors to accelerate the proposed method on hardware. At a 50-MHz clock frequency, the proposed system achieves a 3.86 times higher learning speed than on a 3.4-GHz PC, while consuming only 1% of the power of that used by the PC.

  19. Seismic clusters analysis in Northeastern Italy by the nearest-neighbor approach

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2018-01-01

    The main features of earthquake clusters in Northeastern Italy are explored, with the aim of gaining new insights into local-scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters, which are identified by a statistical method based on nearest-neighbor distances of events in the space-time-energy domain. The method permits us to highlight and investigate the internal structure of earthquake sequences, and to differentiate the spatial properties of seismicity according to the different topological features of the cluster structure. To analyze seismicity of Northeastern Italy, we use information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. A preliminary reappraisal of the earthquake bulletins is carried out and the area of sufficient completeness is outlined. Various techniques are considered to estimate the scaling parameters that characterize earthquake occurrence in the region, namely the b-value and the fractal dimension of the epicenter distribution, required for the application of the nearest-neighbor technique. Specifically, average robust estimates of the parameters of the Unified Scaling Law for Earthquakes, USLE, are assessed for the whole outlined region and are used to compute the nearest-neighbor distances. Cluster identification by the nearest-neighbor method turns out to be quite reliable and robust with respect to the minimum magnitude cutoff of the input catalog; the identified clusters are consistent with those obtained from manual aftershock identification of selected sequences. We demonstrate that the earthquake clusters have distinct preferred geographic locations, and we identify two areas that differ substantially in the examined clustering properties. Specifically, burst-like sequences are associated with the north-western part and swarm-like sequences with the south-eastern part of the study

  20. A Sensor Data Fusion System Based on k-Nearest Neighbor Pattern Classification for Structural Health Monitoring Applications

    PubMed Central

    Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A.; Anaya, Maribel

    2017-01-01

    Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally, (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed. PMID:28230796

  1. A Sensor Data Fusion System Based on k-Nearest Neighbor Pattern Classification for Structural Health Monitoring Applications.

    PubMed

    Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A; Anaya, Maribel

    2017-02-21

    Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally, (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed.

  2. ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

    PubMed Central

    McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

    2013-01-01

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main

  3. A Fast Implementation of the ISOCLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2003-01-01

    Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge or split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O
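
The per-iteration step this abstract refers to, computing the nearest center for each data point and then moving the centers, might look like the brute-force sketch below. It is only an illustration of the naive scheme: it is not the kd-tree acceleration the authors adapt, it omits the ISOCLUS merge and split heuristics, and the function names are hypothetical.

```python
def nearest_center(point, centers):
    """Index of the center with the smallest squared distance to `point`."""
    best_index, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def lloyd_step(points, centers):
    """One naive iteration: assign every point, then recompute each center.

    A center that receives no points is left where it was (an assumption
    made here for simplicity).
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in points:
        i = nearest_center(point, centers)
        counts[i] += 1
        for j in range(dim):
            sums[i][j] += point[j]
    return [
        tuple(s / counts[i] for s in sums[i]) if counts[i] else tuple(centers[i])
        for i in range(len(centers))
    ]


# Toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
updated = lloyd_step(points, [(1.0, 1.0), (9.0, 9.0)])
```

Every point is compared against every center on every iteration, which is the cost the kd-tree approach mentioned in the abstract is meant to reduce.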

  4. Predicting acute contact toxicity of pesticides in honeybees (Apis mellifera) through a k-nearest neighbor model.

    PubMed

    Como, F; Carnesecchi, E; Volani, S; Dorne, J L; Richardson, J; Bassan, A; Pavan, M; Benfenati, E

    2017-01-01

    Ecological risk assessment of plant protection products (PPPs) requires an understanding of both the toxicity and the extent of exposure to assess risks for a range of taxa of ecological importance including target and non-target species. Non-target species such as honey bees (Apis mellifera), solitary bees and bumble bees are of utmost importance because of their vital ecological services as pollinators of wild plants and crops. To improve risk assessment of PPPs in bee species, computational models predicting the acute and chronic toxicity of a range of PPPs and contaminants can play a major role in providing structural and physico-chemical properties for the prioritisation of compounds of concern and future risk assessments. Over the last three decades, scientific advisory bodies and the research community have developed toxicological databases and quantitative structure-activity relationship (QSAR) models that are proving invaluable to predict toxicity using historical data and reduce animal testing. This paper describes the development and validation of a k-Nearest Neighbor (k-NN) model using in-house software for the prediction of acute contact toxicity of pesticides on honey bees. Acute contact toxicity data were collected from different sources for 256 pesticides, which were divided into training and test sets. The k-NN models were validated with good prediction, with an accuracy of 70% for all compounds and of 65% for highly toxic compounds, suggesting that they might reliably predict the toxicity of structurally diverse pesticides and could be used to screen and prioritise new pesticides. Copyright © 2016 Elsevier Ltd. All rights reserved.
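
The abstract reports a k-NN model evaluated by accuracy on a held-out test set. A generic majority-vote k-NN with that kind of evaluation might be sketched as follows. The paper's in-house software and molecular descriptors are not described here, so the two-dimensional feature vectors, the labels, and the function names below are purely hypothetical.

```python
import math
from collections import Counter


def knn_predict(query, train, k=3):
    """Majority vote among the k training items closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(test, train, k=3):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(1 for features, label in test if knn_predict(features, train, k) == label)
    return correct / len(test)


# Hypothetical descriptors and toxicity classes, for illustration only.
train_set = [
    ((0.0, 0.0), "low"), ((0.0, 1.0), "low"), ((1.0, 0.0), "low"),
    ((5.0, 5.0), "high"), ((5.0, 6.0), "high"), ((6.0, 5.0), "high"),
]
test_set = [((0.5, 0.5), "low"), ((5.5, 5.5), "high"), ((4.0, 4.0), "low")]
score = accuracy(test_set, train_set, k=3)
```

Here the third test item sits nearer the "high" group than its true label suggests, so it is misclassified and the score falls below 1.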

  5. Collective coherence in nearest neighbor coupled metamaterials: A metasurface ruler equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Ningning; Zhang, Weili, E-mail: weili.zhang@okstate.edu; Singh, Ranjan, E-mail: ranjans@ntu.edu.sg

    The collective coherent interactions in a meta-atom lattice are the key to myriad applications and functionalities offered by metasurfaces. We demonstrate a collective coherent response of the nearest neighbor coupled split-ring resonators whose resonance shift decays exponentially in the strong near-field coupled regime. This occurs due to the dominant magnetic coupling between the nearest neighbors which leads to the decay of the electromagnetic near fields. Based on the size scaling behavior of the different periodicity metasurfaces, we identified a collective coherent metasurface ruler equation. From the coherent behavior, we also show that the near-field coupling in a metasurface lattice exists even when the periodicity exceeds the resonator size. The identification of a universal coherence in metasurfaces and their scaling behavior would enable the design of novel metadevices whose spectral tuning response based on near-field effects could be calibrated across microwave, terahertz, infrared, and the optical parts of the electromagnetic spectrum.

  6. Thermodynamics of alternating spin chains with competing nearest- and next-nearest-neighbor interactions: Ising model

    NASA Astrophysics Data System (ADS)

    Pini, Maria Gloria; Rettori, Angelo

    1993-08-01

    The thermodynamical properties of an alternating spin (S,s) one-dimensional (1D) Ising model with competing nearest- and next-nearest-neighbor interactions are exactly calculated using a transfer-matrix technique. In contrast to the case S=s=1/2, previously investigated by Harada, the alternation of different spins (S≠s) along the chain is found to give rise to two-peaked static structure factors, signaling the coexistence of different short-range-order configurations. The relevance of our calculations with regard to recent experimental data by Gatteschi et al. in quasi-1D molecular magnetic materials, R(hfac)3NITEt (R=Gd, Tb, Dy, Ho, Er, ...), is discussed; hfac is hexafluoro-acetylacetonate and NITEt is 2-Ethyl-4,4,5,5-tetramethyl-4,5-dihydro-1H-imidazolyl-1-oxyl-3-oxide.

  7. α-K2AgF4: Ferromagnetism induced by the weak superexchange of different eg orbitals from the nearest neighbor Ag ions

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaoli; Zhang, Guoren; Jia, Ting; Zeng, Zhi; Lin, H. Q.

    2016-05-01

    We study the abnormal ferromagnetism in α-K2AgF4, which is very similar in structure to the high-TC parent material La2CuO4. We find that the electron correlation is very important in determining the insulating property of α-K2AgF4. The Ag(II) 4d^9 ion in the octahedral crystal field has the t2g^6 eg^3 electron occupation, with the eg x2-y2 orbital fully occupied and the 3z2-r2 orbital partially occupied. The two eg orbitals are very extended, indicating that both of them are active in superexchange. Using the Hubbard model combined with the Nth-order muffin-tin orbital (NMTO) downfolding technique, it is concluded that the exchange interaction between eg 3z2-r2 and x2-y2 from the first nearest neighbor Ag ions leads to the anomalous ferromagnetism in α-K2AgF4.

  8. Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm

    PubMed Central

    Wang, ShaoPeng; Zhang, Yu-Hang; Lu, Jing; Cui, Weiren; Hu, Jerry; Cai, Yu-Dong

    2016-01-01

    The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon request. PMID:26955638

  9. Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm.

    PubMed

    Wang, ShaoPeng; Zhang, Yu-Hang; Lu, Jing; Cui, Weiren; Hu, Jerry; Cai, Yu-Dong

    2016-01-01

    The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon request.

  10. Elliptic Painlevé equations from next-nearest-neighbor translations on the E_8^{(1)} lattice

    NASA Astrophysics Data System (ADS)

    Joshi, Nalini; Nakazono, Nobutaka

    2017-07-01

    The well-known elliptic discrete Painlevé equation of Sakai is constructed by a standard translation on the E_8^{(1)} lattice, given by nearest neighbor vectors. In this paper, we give a new elliptic discrete Painlevé equation obtained by translations along next-nearest-neighbor vectors. This equation is a generic (8-parameter) version of a 2-parameter elliptic difference equation found by reduction from Adler’s partial difference equation, the so-called Q4 equation. We also provide a projective reduction of the well-known equation of Sakai.

  11. Nearest neighbor, bilinear interpolation and bicubic interpolation geographic correction effects on LANDSAT imagery

    NASA Technical Reports Server (NTRS)

    Jayroe, R. R., Jr.

    1976-01-01

    Geographical correction effects on LANDSAT image data are identified, using the nearest neighbor, bilinear interpolation and bicubic interpolation techniques. Potential impacts of registration on image compression and classification are explored.
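
The practical difference between two of the resampling techniques named in this abstract can be seen in a small sketch. Nearest neighbor resampling copies an existing pixel value, while bilinear interpolation blends the four surrounding pixels and so can produce values that never occurred in the input. The code below is an assumption-laden illustration (a single-band image stored as a list of rows, hypothetical function names) and leaves out bicubic interpolation for brevity.

```python
import math


def sample_nearest(img, x, y):
    """Value of the pixel closest to fractional coordinates (x=column, y=row)."""
    height, width = len(img), len(img[0])
    col = min(max(int(math.floor(x + 0.5)), 0), width - 1)
    row = min(max(int(math.floor(y + 0.5)), 0), height - 1)
    return img[row][col]


def sample_bilinear(img, x, y):
    """Distance-weighted blend of the four pixels surrounding (x, y)."""
    height, width = len(img), len(img[0])
    # Clamp the sample position to the image extent.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Toy 2 x 2 single-band image.
image = [[0, 10], [20, 30]]
nearest_value = sample_nearest(image, 0.4, 0.6)    # copies an input pixel
bilinear_value = sample_bilinear(image, 0.5, 0.5)  # blends four input pixels
```

Because nearest neighbor output is always one of the original values, it leaves the set of radiometric values unchanged, which is one reason the choice of resampling method can matter for later classification.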

  12. Parametric, bootstrap, and jackknife variance estimators for the k-Nearest Neighbors technique with illustrations using forest inventory and satellite image data

    Treesearch

    Ronald E. McRoberts; Steen Magnussen; Erkki O. Tomppo; Gherardo Chirici

    2011-01-01

    Nearest neighbors techniques have been shown to be useful for estimating forest attributes, particularly when used with forest inventory and satellite image data. Published reports of positive results have been truly international in scope. However, for these techniques to be more useful, they must be able to contribute to scientific inference which, for sample-based...

  13. Phase transition and monopole densities in a nearest neighbor two-dimensional spin ice model

    NASA Astrophysics Data System (ADS)

    Morais, C. W.; de Freitas, D. N.; Mota, A. L.; Bastone, E. C.

    2017-12-01

    In this work, we show that, due to the alternating orientation of the spins in the ground state of the artificial square spin ice, the influence of a set of spins at a certain distance from a reference spin decreases faster than the expected result for the long range dipolar interaction, justifying the use of the nearest neighbor two-dimensional square spin ice model as an effective model. Using an extension of the model presented in Y. L. Xie et al., Sci. Rep. 5, 15875 (2015), considering the influence of the eight nearest neighbors of each spin on the lattice, we analyze the thermodynamics of the model and study the dependence of monopole and string densities on the temperature.

  14. Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design

    PubMed Central

    Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco

    2016-01-01

    The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms. PMID:27886061

  15. Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design.

    PubMed

    Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco

    2016-11-23

    The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms.

  16. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  17. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

  18. Comparison of Neural Networks and Tabular Nearest Neighbor Encoding for Hyperspectral Signature Classification in Unresolved Object Detection

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Ritter, G.; Key, R.

    Accurate and computationally efficient spectral signature classification is a crucial step in the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications using linear mixing models, signature classification accuracy depends on accurate spectral endmember discrimination [1]. If the endmembers cannot be classified correctly, then the signatures cannot be classified correctly, and object recognition from hyperspectral data will be inaccurate. In practice, the number of endmembers accurately classified often depends linearly on the number of inputs. This can lead to potentially severe classification errors in the presence of noise or densely interleaved signatures. In this paper, we present a comparison of emerging technologies for nonimaging spectral signature classification based on a highly accurate, efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3,4] and a neural network technology called Morphological Neural Networks (MNNs) [5]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. The open architecture and programmability of TNE's agreement map processing allows a TNE programmer or user to determine classification accuracy, as well as characterize in detail the signatures for which TNE did not obtain classification matches, and why such mis-matches occurred. In this study, we will compare TNE and MNN based endmember classification, using performance metrics such as probability of correct classification (Pd) and rate of false

  19. Localization in one-dimensional lattices with non-nearest-neighbor hopping: Generalized Anderson and Aubry-Andre models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biddle, J.; Priour, D. J. Jr.; Wang, B.

    We study the quantum localization phenomena of noninteracting particles in one-dimensional lattices based on tight-binding models with various forms of hopping terms beyond the nearest neighbor, which are generalizations of the famous Aubry-Andre and noninteracting Anderson models. For the case with deterministic disordered potential induced by a secondary incommensurate lattice (i.e., the Aubry-Andre model), we identify a class of self-dual models, for which the boundary between localized and extended eigenstates is determined analytically by employing a generalized Aubry-Andre transformation. We also numerically investigate the localization properties of nondual models with next-nearest-neighbor hopping, Gaussian, and power-law decay hopping terms. We find that even for these nondual models, the numerically obtained mobility edges can be well approximated by the analytically obtained condition for localization transition in the self-dual models, as long as the decay of the hopping rate with respect to distance is sufficiently fast. For the disordered potential with genuinely random character, we examine scenarios with next-nearest-neighbor hopping, exponential, Gaussian, and power-law decay hopping terms numerically. We find that the higher-order hopping terms can remove the symmetry in the localization length about the energy band center compared to the Anderson model. Furthermore, our results demonstrate that for the power-law decay case, there exists a critical exponent below which mobility edges can be found. Our theoretical results could, in principle, be directly tested in shallow atomic optical lattice systems enabling non-nearest-neighbor hopping.

  20. Polymers with nearest- and next nearest-neighbor interactions on the Husimi lattice

    NASA Astrophysics Data System (ADS)

    Oliveira, Tiago J.

    2016-04-01

    The exact grand-canonical solution of a generalized interacting self-avoiding walk (ISAW) model, placed on a Husimi lattice built with squares, is presented. In this model, beyond the traditional interaction $\omega_1 = e^{\epsilon_1/k_B T}$ between (nonconsecutive) monomers on nearest-neighbor (NN) sites, an additional energy $\epsilon_2$ is associated to next-NN (NNN) monomers. Three definitions of NNN sites/interactions are considered, where each monomer can have, effectively, at most two, four, or six NNN monomers on the Husimi lattice. The phase diagrams found in all cases have (qualitatively) the same thermodynamic properties: a non-polymerized (NP) and a polymerized (P) phase separated by a critical and a coexistence surface that meet at a tricritical (θ-) line. This θ-line is found even when one of the interactions is repulsive, existing for $\omega_1$ in the range $[0,\infty)$, i.e., for $\epsilon_1/k_B T$ in the range $[-\infty,\infty)$. Thus, counterintuitively, a θ-point exists even for an infinite repulsion between NN monomers ($\omega_1 = 0$), being associated to a coil-‘soft globule’ transition. In the limit of an infinite repulsive force between NNN monomers, however, the coil-globule transition disappears, and only a continuous NP-P transition is observed. This particular case, with $\omega_2 = 0$, is also solved exactly on the square lattice, using a transfer matrix calculation where a discontinuous NP-P transition is found. For attractive and repulsive forces between NN and NNN monomers, respectively, the model becomes quite similar to the semiflexible-ISAW one, whose crystalline phase is not observed here, as a consequence of the frustration due to competing NN and NNN forces. The mapping of the phase diagrams onto canonical ones is discussed and compared with recent results from Monte Carlo simulations on the square lattice.

  1. Monte Carlo study of a ferrimagnetic mixed-spin (2, 5/2) system with the nearest and next-nearest neighbors exchange couplings

    NASA Astrophysics Data System (ADS)

    Bi, Jiang-lin; Wang, Wei; Li, Qi

    2017-07-01

    In this paper, the effects of the next-nearest neighbors exchange couplings on the magnetic and thermal properties of the ferrimagnetic mixed-spin (2, 5/2) Ising model on a 3D honeycomb lattice have been investigated by the use of Monte Carlo simulation. In particular, the influences of exchange couplings (Ja, Jb, Jan) and the single-ion anisotropy (Da) on the phase diagrams, the total magnetization, the sublattice magnetization, the total susceptibility, the internal energy and the specific heat have been discussed in detail. The results clearly show that the system can exhibit critical and compensation behavior in the presence of the next-nearest neighbors exchange coupling. Many types of M curves, such as N-, Q-, P- and L-types, have been discovered, owing to the competition between the exchange coupling and the temperature. Our results are in excellent agreement with other theoretical and experimental works.

  2. Aftershock identification problem via the nearest-neighbor analysis for marked point processes

    NASA Astrophysics Data System (ADS)

    Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.

    2007-12-01

    The centennial observations on the world seismicity have revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide the most reliable information about the earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on the hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in the space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time-oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from the ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
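
    The construction recalled in this abstract can be sketched in a few lines. The abstract does not state the form of D; the sketch below assumes the commonly cited Baiesi-Paczuski form, eta_ij = t_ij * r_ij^d * 10^(-b * m_i) for t_ij > 0, where t_ij is the time lag from the earlier event i to the later event j, r_ij is their spatial separation, and m_i is the magnitude of the earlier event. The function name `nearest_parents`, the default values of `d` and `b`, and the toy catalog are illustrative assumptions, not values from the record.

```python
import math

def nearest_parents(events, d=1.6, b=1.0):
    """Link each event to its nearest earlier neighbor under an asymmetric distance.

    events: list of (time, x, y, magnitude) tuples.
    d, b:   illustrative fractal-dimension and b-value parameters (assumed).
    Returns a list whose j-th entry is the parent index of event j,
    or None when no strictly earlier event exists.
    """
    parents = []
    for j, (tj, xj, yj, _) in enumerate(events):
        best, best_eta = None, math.inf
        for i, (ti, xi, yi, mi) in enumerate(events):
            dt = tj - ti
            if dt <= 0:
                # Only strictly earlier events may be parents; this makes
                # the distance asymmetric and the graph time-oriented.
                continue
            r = math.hypot(xj - xi, yj - yi)
            eta = dt * r ** d * 10 ** (-b * mi)
            if eta < best_eta:
                best, best_eta = i, eta
        parents.append(best)
    return parents

# Hypothetical catalog: (time, x, y, magnitude)
catalog = [
    (0.0, 0.0, 0.0, 6.0),
    (1.0, 1.0, 0.0, 5.5),
    (2.0, 1.1, 0.0, 2.0),
    (3.0, 50.0, 50.0, 2.5),
]
parents = nearest_parents(catalog)
```

    Because every parent is strictly earlier than its child, the links cannot form a cycle, which is why the nearest-neighbor graph is a time-oriented tree (or forest). Separating clustered from background events would then amount to thresholding the parent-child distances, a step this sketch leaves out.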

  3. Dynamical phases in a one-dimensional chain of heterospecies Rydberg atoms with next-nearest-neighbor interactions

    NASA Astrophysics Data System (ADS)

    Qian, Jing; Zhang, Lu; Zhai, Jingjing; Zhang, Weiping

    2015-12-01

    We theoretically investigate the dynamical phase diagram of a one-dimensional chain of laser-excited two-species Rydberg atoms. The existence of a variety of unique dynamical phases in the experimentally achievable parameter region is predicted under the mean-field approximation, and the change in those phases when the effect of the next-nearest-neighbor interaction is included is further discussed. In particular, we find that the competition of the strong Rydberg-Rydberg interactions and the optical excitation imbalance can lead to the presence of complex multiple chaotic phases, which are highly sensitive to the initial Rydberg-state population and the strength of the next-nearest-neighbor interactions.

  4. A dynamical mean-field study of orbital-selective Mott phase enhanced by next-nearest neighbor hopping

    NASA Astrophysics Data System (ADS)

    Niu, Yuekun; Sun, Jian; Ni, Yu; Song, Yun

    2018-06-01

    The dynamical mean-field theory is employed to study the orbital-selective Mott transition (OSMT) of the two-orbital Hubbard model with nearest neighbor hopping and next-nearest neighbor (NNN) hopping. The NNN hopping breaks the particle-hole symmetry at half filling and gives rise to an asymmetric density of states (DOS). Our calculations show that the broken symmetry of DOS benefits the OSMT, where the region of the orbital-selective Mott phase significantly extends with the increasing NNN hopping integral. We also find that Hund's rule coupling promotes OSMT by blocking the orbital fluctuations, but the influence of NNN hopping is more remarkable.

  5. Phase transitions and critical properties in the antiferromagnetic Ising model on a layered triangular lattice with allowance for intralayer next-nearest-neighbor interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Badiev, M. K., E-mail: m-zagir@mail.ru; Murtazaev, A. K.; Ramazanov, M. K.

    2016-10-15

    The phase transitions (PTs) and critical properties of the antiferromagnetic Ising model on a layered (stacked) triangular lattice have been studied by the Monte Carlo method using a replica algorithm with allowance for the next-nearest-neighbor interactions. The character of PTs is analyzed using the histogram technique and the method of Binder cumulants. It is established that the transition from the disordered to paramagnetic phase in the adopted model is a second-order PT. Static critical exponents of the heat capacity (α), susceptibility (γ), order parameter (β), and correlation radius (ν) and the Fisher exponent η are calculated using the finite-size scaling theory. It is shown that (i) the antiferromagnetic Ising model on a layered triangular lattice belongs to the XY universality class of critical behavior and (ii) allowance for the intralayer interactions of next-nearest neighbors in the adopted model leads to a change in the universality class of critical behavior.

  6. Next nearest neighbors sites and the reactivity of the CO NO surface reaction

    NASA Astrophysics Data System (ADS)

    Cortés, Joaquín; Valencia, Eliana

    1998-04-01

    Using Monte Carlo experiments of the reduction of NO by CO, a study is made of the effect on reactivity due to the formation of N2O and to the increased coordination of the sites considering the next nearest neighbors sites (nnn) in a square lattice of superficial sites.

  7. Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

    NASA Astrophysics Data System (ADS)

    Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

    2018-06-01

    In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
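
    The evaluation protocol described in this abstract (a kNN classifier scored by k-fold cross validation) can be illustrated with a minimal sketch. It is not the authors' pipeline: it omits PCA and the SVM, uses plain Euclidean distance with majority voting, and runs on synthetic two-class data. The names `knn_predict` and `cross_val_accuracy`, the feature vectors and the labels are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(xs, ys, k, folds):
    """Estimate accuracy by holding out each fold once and training on the rest."""
    correct = 0
    for f in range(folds):
        # Every folds-th sample, starting at offset f, forms the held-out fold.
        test_idx = set(range(f, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)

# Synthetic, well-separated two-class data (not LIBS spectra)
features = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 0.0) for i in range(10)]
labels = ["fat"] * 10 + ["muscle"] * 10
accuracy = cross_val_accuracy(features, labels, k=3, folds=10)
```

    On real spectra the folds would normally be shuffled and stratified by class. The strided split here is only the simplest way to keep the sketch deterministic.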

  8. Distributed Adaptive Binary Quantization for Fast Nearest Neighbor Search.

    PubMed

    Xianglong Liu; Zhujin Li; Cheng Deng; Dacheng Tao

    2017-11-01

    Hashing has been proved an attractive technique for fast nearest neighbor search over big data. Compared with the projection based hashing methods, prototype-based ones own stronger power to generate discriminative binary codes for the data with complex intrinsic structure. However, existing prototype-based methods, such as spherical hashing and K-means hashing, still suffer from the ineffective coding that utilizes the complete binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization (ABQ) method that learns a discriminative hash function with prototypes associated with small unique binary codes. Our alternating optimization adaptively discovers the prototype set and the code set of a varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes, and enjoys the fast training linear to the number of the training data. We further devise a distributed framework for the large-scale learning, which can significantly speed up the training of ABQ in the distributed environment that has been widely deployed in many areas nowadays. The extensive experiments on four large-scale (up to 80 million) data sets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with up to 58.84% performance gains relatively.

  9. Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Jian; Hamidouche, Khaled; Zheng, Jie

    2015-08-05

    Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. The k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm by taking advantage of scalable programming models. To improve the performance of k-NN in large-scale environments with an InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemic evaluation and analysis on typical workloads. The hybrid designs leverage the one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on the k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training the KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for a small workload with balanced communication and computation. Experiments running with varied numbers of cores show that our design can maintain good scalability.

  10. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  11. Realization of the axial next-nearest-neighbor Ising model in U3Al2Ge3

    DOE PAGES

    Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.; ...

    2017-11-09

    In this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U3Al2Ge3. Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T_N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T_c = 48 K. Within the sinusoidally modulated magnetic phase (T_c < T < T_N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U3Al2Ge3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.

  12. Realization of the axial next-nearest-neighbor Ising model in U3Al2Ge3

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.

    In this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U3Al2Ge3. Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T_N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T_c = 48 K. Within the sinusoidally modulated magnetic phase (T_c < T < T_N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U3Al2Ge3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.

  13. Mapping change of older forest with nearest-neighbor imputation and Landsat time-series

    Treesearch

    Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang Yang

    2012-01-01

    The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...

  14. Terahertz metasurfaces with a high refractive index enhanced by the strong nearest neighbor coupling.

    PubMed

    Tan, Siyu; Yan, Fengping; Singh, Leena; Cao, Wei; Xu, Ningning; Hu, Xiang; Singh, Ranjan; Wang, Mingwei; Zhang, Weili

    2015-11-02

    The realization of high refractive index is of significant interest in optical imaging with enhanced resolution. Strongly coupled subwavelength resonators were proposed and demonstrated at both optical and terahertz frequencies to enhance the refractive index due to large induced dipole moment in meta-atoms. Here, we report an alternative design for flexible free-standing terahertz metasurface in the strong coupling regime where we experimentally achieve a peak refractive index value of 14.36. We also investigate the impact of the nearest neighbor coupling in the form of frequency tuning and enhancement of the peak refractive index. We provide an analytical circuit model to explain the impact of geometrical parameters and coupling on the effective refractive index of the metasurface. The proposed meta-atom structure enables tailoring of the peak refractive index based on nearest neighbor coupling and this property offers tremendous design flexibility for transformation optics and other index-gradient devices at terahertz frequencies.

  15. Phase transitions and thermodynamic properties of antiferromagnetic Ising model with next-nearest-neighbor interactions on the Kagomé lattice

    NASA Astrophysics Data System (ADS)

    Ramazanov, M. K.; Murtazaev, A. K.; Magomedov, M. A.; Badiev, M. K.

    2018-06-01

    We study phase transitions and thermodynamic properties in the two-dimensional antiferromagnetic Ising model with next-nearest-neighbor interaction on a Kagomé lattice by Monte Carlo simulations. A histogram data analysis shows that a second-order transition occurs in the model. From the analysis of the obtained data, we can assume that next-nearest-neighbor ferromagnetic interactions in the two-dimensional antiferromagnetic Ising model on a Kagomé lattice induce a second-order transition and an unusual temperature dependence of the thermodynamic properties.

  16. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  17. Moderate-resolution data and gradient nearest neighbor imputation for regional-national risk assessment

    Treesearch

    Kenneth B. Jr. Pierce; C. Kenneth Brewer; Janet L. Ohmann

    2010-01-01

    This study was designed to test the feasibility of combining a method designed to populate pixels with inventory plot data at the 30-m scale with a new national predictor data set. The new national predictor data set was developed by the USDA Forest Service Remote Sensing Applications Center (hereafter RSAC) at the 250-m scale. Gradient Nearest Neighbor (GNN)...

  18. Evaluation of nearest-neighbor methods for detection of chimeric small-subunit rRNA sequences

    NASA Technical Reports Server (NTRS)

    Robison-Cox, J. F.; Bateson, M. M.; Ward, D. M.

    1995-01-01

    Detection of chimeric artifacts formed when PCR is used to retrieve naturally occurring small-subunit (SSU) rRNA sequences may rely on demonstrating that different sequence domains have different phylogenetic affiliations. We evaluated the CHECK_CHIMERA method of the Ribosomal Database Project and another method which we developed, both based on determining nearest neighbors of different sequence domains, for their ability to discern artificially generated SSU rRNA chimeras from authentic Ribosomal Database Project sequences. The reliability of both methods decreases when the parental sequences which contribute to chimera formation are more than 82 to 84% similar. Detection is also complicated by the occurrence of authentic SSU rRNA sequences that behave like chimeras. We developed a naive statistical test based on CHECK_CHIMERA output and used it to evaluate previously reported SSU rRNA chimeras. Application of this test also suggests that chimeras might be formed by retrieving SSU rRNAs as cDNA. The amount of uncertainty associated with nearest-neighbor analyses indicates that such tests alone are insufficient and that better methods are needed.

  19. Near-Neighbor Algorithms for Processing Bearing Data

    DTIC Science & Technology

    1989-05-10

    neighbor algorithms need not be universally more cost-effective than brute force methods. While the data access time of near-neighbor techniques scales with...the number of objects N better than brute force, the cost of setting up the data structure could scale worse than (Continues) 20...for the near neighbors NN2 1 (i). Depending on the particular NN algorithm, the cost of accessing near neighbors for each ai E S1 scales as either N

  20. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation.

    PubMed

    Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R

    2018-01-01

    Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR-derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy of TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.

  1. Spatio-temporal distribution of Oklahoma earthquakes: Exploring relationships using a nearest-neighbor approach: Nearest-neighbor analysis of Oklahoma

    DOE PAGES

    Vasylkivska, Veronika S.; Huerta, Nicolas J.

    2017-06-24

    Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog’s inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.

  2. Spatio-temporal distribution of Oklahoma earthquakes: Exploring relationships using a nearest-neighbor approach: Nearest-neighbor analysis of Oklahoma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vasylkivska, Veronika S.; Huerta, Nicolas J.

    Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog’s inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.

  3. An automated algorithm for determining photometric redshifts of quasars

    NASA Astrophysics Data System (ADS)

    Wang, Dan; Zhang, Yanxia; Zhao, Yongheng

    2010-07-01

    We employ the k-nearest neighbor (KNN) algorithm for photometric redshift measurement of quasars with the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance-based learning algorithm in which the result of a new instance query is predicted from the closest training samples. The regressor does not fit any model and is based only on memory. Given a query quasar, we find the known quasars (or training points) closest to the query point, and its redshift value is simply assigned to be the average of the values of its k nearest neighbors. Three kinds of different colors (PSF, Model or Fiber) and spectral redshifts are used as input parameters, separately. The combination of the three kinds of colors is also taken as input. The experimental results indicate that the best input pattern is PSF + Model + Fiber colors in all experiments. With this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts are obtained within Δz < 0.1, 0.2 and 0.3, respectively. If only one kind of colors is used as input, the model colors achieve the best performance. However, when using two kinds of colors, the best result is achieved by PSF + Fiber colors. In addition, the nearest neighbor method (k = 1) shows its superiority compared to KNN (k ≠ 1) for the given sample.
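
    The prediction rule described in this abstract, averaging the redshifts of the k training quasars closest to the query in color space, can be written as a minimal sketch. Euclidean distance is an assumption here, since the abstract does not name the metric. The function name `knn_regress` and the toy colors and redshifts are illustrative and are not SDSS data.

```python
import math

def knn_regress(train_x, train_y, query, k):
    """Predict a value as the mean target of the k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical two-color feature vectors with known spectroscopic redshifts
colors = [(0.1, 0.2), (0.2, 0.1), (0.9, 1.0), (1.0, 0.9)]
redshifts = [0.5, 0.7, 2.0, 2.2]

z_k2 = knn_regress(colors, redshifts, (0.15, 0.15), k=2)  # mean of the two nearest
z_k1 = knn_regress(colors, redshifts, (1.0, 0.9), k=1)    # nearest neighbor only
```

    Setting k = 1 reduces the rule to copying the redshift of the single closest training quasar, which is the variant the abstract reports as performing best on its sample.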

  4. Quantum phase transitions of the one-dimensional Peierls-Hubbard model with next-nearest-neighbor hopping integrals

    NASA Astrophysics Data System (ADS)

    Otsuka, Hiromi

    1998-06-01

    We investigate two kinds of quantum phase transitions observed in the one-dimensional half-filled Peierls-Hubbard model with the next-nearest-neighbor hopping integral in the strong-coupling region U>>t, t' [t (t'), nearest- (next-nearest-) neighbor hopping; U, on-site Coulomb repulsion]. In the uniform case, with the help of the conformal field theory prediction, we numerically determine a phase boundary t'c(U/t) between the spin-fluid and the dimer states, where a bare coupling of the marginal operator vanishes and the low-energy and long-distance behaviors of the spin part are described by a free-boson model. To exhibit the conformal invariance of the systems on the phase boundary, a multiplet structure of the excitation spectrum of finite-size systems and a value of the central charge are also examined. The critical phenomenological aspect of the spin-Peierls transitions accompanied by the lattice dimerization is then argued for the systems on the phase boundary; the existence of logarithmic corrections to the power-law behaviors of the energy gain and the spin gap (i.e., the Cross-Fisher scaling law) are discussed.

  5. Minimum Expected Risk Estimation for Near-neighbor Classification

    DTIC Science & Technology

    2006-04-01

    We consider the problems of class probability estimation and classification when using near-neighbor classifiers, such as k-nearest neighbors ( kNN ...estimate for weighted kNN classifiers with different prior information, for a broad class of risk functions. Theory and simulations show how significant...the difference is compared to the standard maximum likelihood weighted kNN estimates. Comparisons are made with uniform weights, symmetric weights

  6. Rapid and Robust Cross-Correlation-Based Seismic Signal Identification Using an Approximate Nearest Neighbor Method

    DOE PAGES

    Tibi, Rigobert; Young, Christopher; Gonzales, Antonio; ...

    2017-07-04

    The matched filtering technique that uses the cross correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this paper, we introduce an approximate nearest neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation. Our method begins with a projection into a reduced dimensionality space, based on correlation with a randomized subset of the full template archive. Searching for a specified number of nearest neighbors for a query waveform is accomplished by iteratively comparing it with the neighbors of its immediate neighbors. We used the approach to search for matches to each of ~2300 analyst-reviewed signal detections reported in May 2010 for the International Monitoring System station MKAR. The template library in this case consists of a data set of more than 200,000 analyst-reviewed signal detections for the same station from February 2002 to July 2016 (excluding May 2010). Of these signal detections, 73% are teleseismic first P and 17% regional phases (Pn, Pg, Sn, and Lg). Finally, the analyses performed on a standard desktop computer show that the proposed ANN approach performs a search of the large template libraries about 25 times faster than the standard full linear search and achieves recall rates greater than 80%, with the recall rate increasing for higher correlation thresholds.

  7. Rapid and Robust Cross-Correlation-Based Seismic Signal Identification Using an Approximate Nearest Neighbor Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tibi, Rigobert; Young, Christopher; Gonzales, Antonio

    The matched filtering technique that uses the cross correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this paper, we introduce an approximate nearest neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation. Our method begins with a projection into a reduced dimensionality space, based on correlation with a randomized subset of the full template archive. Searching for a specified number of nearest neighbors for a query waveform is accomplished by iteratively comparing it with the neighbors of its immediate neighbors. We used the approach to search for matches to each of ~2300 analyst-reviewed signal detections reported in May 2010 for the International Monitoring System station MKAR. The template library in this case consists of a data set of more than 200,000 analyst-reviewed signal detections for the same station from February 2002 to July 2016 (excluding May 2010). Of these signal detections, 73% are teleseismic first P and 17% regional phases (Pn, Pg, Sn, and Lg). Finally, the analyses performed on a standard desktop computer show that the proposed ANN approach performs a search of the large template libraries about 25 times faster than the standard full linear search and achieves recall rates greater than 80%, with the recall rate increasing for higher correlation thresholds.

  8. Nearest neighbor density ratio estimation for large-scale applications in astronomy

    NASA Astrophysics Data System (ADS)

    Kremer, J.; Gieseke, F.; Steenstrup Pedersen, K.; Igel, C.

    2015-09-01

    In astronomical applications of machine learning, the distribution of objects used for building a model is often different from the distribution of the objects the model is later applied to. This is known as sample selection bias, which is a major challenge for statistical inference as one can no longer assume that the labeled training data are representative. To address this issue, one can re-weight the labeled training patterns to match the distribution of unlabeled data that are available already in the training phase. There are many examples in practice where this strategy yielded good results, but estimating the weights reliably from a finite sample is challenging. We consider an efficient nearest neighbor density ratio estimator that can exploit large samples to increase the accuracy of the weight estimates. To solve the problem of choosing the right neighborhood size, we propose to use cross-validation on a model selection criterion that is unbiased under covariate shift. The resulting algorithm is our method of choice for density ratio estimation when the feature space dimensionality is small and sample sizes are large. The approach is simple and, because of the model selection, robust. We empirically find that it is on a par with established kernel-based methods on relatively small regression benchmark datasets. However, when applied to large-scale photometric redshift estimation, our approach outperforms the state-of-the-art.

  9. Rapid and Robust Cross-Correlation-Based Seismic Phase Identification Using an Approximate Nearest Neighbor Method

    NASA Astrophysics Data System (ADS)

    Tibi, R.; Young, C. J.; Gonzales, A.; Ballard, S.; Encarnacao, A. V.

    2016-12-01

    The matched filtering technique involving the cross-correlation of a waveform of interest with archived signals from a template library has proven to be a powerful tool for detecting events in regions with repeating seismicity. However, waveform correlation is computationally expensive, and therefore impractical for large template sets unless dedicated distributed computing hardware and software are used. In this study, we introduce an Approximate Nearest Neighbor (ANN) approach that enables the use of very large template libraries for waveform correlation without requiring a complex distributed computing system. Our method begins with a projection into a reduced dimensionality space based on correlation with a randomized subset of the full template archive. Searching for a specified number of nearest neighbors is accomplished by using randomized K-dimensional trees. We used the approach to search for matches to each of 2700 analyst-reviewed signal detections reported for May 2010 for the IMS station MKAR. The template library in this case consists of a dataset of more than 200,000 analyst-reviewed signal detections for the same station from 2002-2014 (excluding May 2010). Of these signal detections, 60% are teleseismic first P, and 15% regional phases (Pn, Pg, Sn, and Lg). The analyses performed on a standard desktop computer show that the proposed approach performs the search of the large template libraries about 20 times faster than the standard full linear search, while achieving recall rates greater than 80%, with the recall rate increasing for higher correlation values. To decide whether to confirm a match, we use a hybrid method involving a cluster approach for queries with two or more matches, and correlation score for single matches. Of the signal detections that passed our confirmation process, 52% were teleseismic first P, and 30% were regional phases.

  10. Weak doping dependence of the antiferromagnetic coupling between nearest-neighbor Mn2+ spins in (Ba1-xKx)(Zn1-yMny)2As2

    NASA Astrophysics Data System (ADS)

    Surmach, M. A.; Chen, B. J.; Deng, Z.; Jin, C. Q.; Glasbrenner, J. K.; Mazin, I. I.; Ivanov, A.; Inosov, D. S.

    2018-03-01

    Dilute magnetic semiconductors (DMS) are nonmagnetic semiconductors doped with magnetic transition metals. The recently discovered DMS material (Ba1-xKx)(Zn1-yMny)2As2 offers a unique and versatile control of the Curie temperature TC by decoupling the spin (Mn2+, S = 5/2) and charge (K+) doping in different crystallographic layers. In an attempt to describe from first-principles calculations the role of hole doping in stabilizing ferromagnetic order, it was recently suggested that the antiferromagnetic exchange coupling J between the nearest-neighbor Mn ions would experience a nearly twofold suppression upon doping 20% of holes by potassium substitution. At the same time, further-neighbor interactions become increasingly ferromagnetic upon doping, leading to a rapid increase of TC. Using inelastic neutron scattering, we have observed a localized magnetic excitation at about 13 meV associated with the destruction of the nearest-neighbor Mn-Mn singlet ground state. Hole doping results in a notable broadening of this peak, evidencing significant particle-hole damping, but with only a minor change in the peak position. We argue that this unexpected result can be explained by a combined effect of superexchange and double-exchange interactions.

  11. Mapping from multiple-control Toffoli circuits to linear nearest neighbor quantum circuits

    NASA Astrophysics Data System (ADS)

    Cheng, Xueyun; Guan, Zhijin; Ding, Weiping

    2018-07-01

    In recent years, quantum computing research has been attracting more and more attention, but few studies on the limited interaction distance between quantum bits (qubit) are deeply carried out. This paper presents a mapping method for transforming multiple-control Toffoli (MCT) circuits into linear nearest neighbor (LNN) quantum circuits instead of traditional decomposition-based methods. In order to reduce the number of inserted SWAP gates, a novel type of gate with the optimal LNN quantum realization was constructed, namely NNTS gate. The MCT gate with multiple control bits could be better cascaded by the NNTS gates, in which the arrangement of the input lines was LNN arrangement of the MCT gate. Then, the communication overhead measurement model on inserted SWAP gate count from the original arrangement to the new arrangement was put forward, and we selected one of the LNN arrangements with the minimum SWAP gate count. Moreover, the LNN arrangement-based mapping algorithm was given, and it dealt with the MCT gates in turn and mapped each MCT gate into its LNN form by inserting the minimum number of SWAP gates. Finally, some simplification rules were used, which can further reduce the final quantum cost of the LNN quantum circuit. Experiments on some benchmark MCT circuits indicate that the direct mapping algorithm results in fewer additional SWAP gates in about 50%, while the average improvement rate in quantum cost is 16.95% compared to the decomposition-based method. In addition, it has been verified that the proposed method has greater superiority for reversible circuits cascaded by MCT gates with more control bits.

  12. Control of coherence among the spins of a single electron and the three nearest neighbor ¹³C nuclei of a nitrogen-vacancy center in diamond

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shimo-Oka, T.; Miwa, S.; Suzuki, Y.

    2015-04-13

    Individual nuclear spins in diamond can be optically detected through hyperfine couplings with the electron spin of a single nitrogen-vacancy (NV) center; such nuclear spins have outstandingly long coherence times. Among the hyperfine couplings in the NV center, the nearest neighbor ¹³C nuclear spins have the largest coupling strength. Nearest neighbor ¹³C nuclear spins have the potential to perform fastest gate operations, providing highest fidelity in quantum computing. Herein, we report on the control of coherences in the NV center where all three nearest neighbor carbons are of the ¹³C isotope. Coherence among the three and four qubits are generated and analyzed at room temperature.

  13. Heterogeneity and nearest-neighbor coupling can explain small-worldness and wave properties in pancreatic islets

    NASA Astrophysics Data System (ADS)

    Cappon, Giacomo; Pedersen, Morten Gram

    2016-05-01

    Many multicellular systems consist of coupled cells that work as a syncytium. The pancreatic islet of Langerhans is a well-studied example of such a microorgan. The islets are responsible for secretion of glucose-regulating hormones, mainly glucagon and insulin, which are released in distinct pulses. In order to observe pulsatile insulin secretion from the β-cells within the islets, the cellular responses must be synchronized. It is now well established that gap junctions provide the electrical nearest-neighbor coupling that allows excitation waves to spread across islets to synchronize the β-cell population. Surprisingly, functional coupling analysis of calcium responses in β-cells shows small-world properties, i.e., a high degree of local coupling with a few long-range "short-cut" connections that reduce the average path-length greatly. Here, we investigate how such long-range functional coupling can appear as a result of heterogeneity, nearest-neighbor coupling, and wave propagation. Heterogeneity is also able to explain a set of experimentally observed synchronization and wave properties without introducing all-or-none cell coupling and percolation theory. Our theoretical results highlight how local biological coupling can give rise to functional small-world properties via heterogeneity and wave propagation.

  14. The roles of nearest neighbor methods in imputing missing data in forest inventory and monitoring databases

    Treesearch

    Bianca N. I. Eskelson; Hailemariam Temesgen; Valerie Lemay; Tara M. Barrett; Nicholas L. Crookston; Andrew T. Hudak

    2009-01-01

    Almost universally, forest inventory and monitoring databases are incomplete, ranging from missing data for only a few records and a few variables, common for small land areas, to missing data for many observations and many variables, common for large land areas. For a wide variety of applications, nearest neighbor (NN) imputation methods have been developed to fill in...

  15. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Treesearch

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  16. Kinetic Models for Topological Nearest-Neighbor Interactions

    NASA Astrophysics Data System (ADS)

    Blanchet, Adrien; Degond, Pierre

    2017-12-01

    We consider systems of agents interacting through topological interactions. These have been shown to play an important part in animal and human behavior. Precisely, the system consists of a finite number of particles characterized by their positions and velocities. At random times a randomly chosen particle, the follower, adopts the velocity of its closest neighbor, the leader. We study the limit of a system size going to infinity and, under the assumption of propagation of chaos, show that the limit kinetic equation is a non-standard spatial diffusion equation for the particle distribution function. We also study the case wherein the particles interact with their K closest neighbors and show that the corresponding kinetic equation is the same. Finally, we prove that these models can be seen as a singular limit of the smooth rank-based model previously studied in Blanchet and Degond (J Stat Phys 163:41-60, 2016). The proofs are based on a combinatorial interpretation of the rank as well as some concentration of measure arguments.
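
    The follower/leader rule in this abstract can be illustrated with a small particle simulation. This is a minimal sketch under stated assumptions, not the authors' model: positions are one-dimensional, jump times are drawn from an exponential distribution with rate 1, and the names `nearest_neighbor`, `jump` and `free_flight` are hypothetical.

```python
import random

def nearest_neighbor(i, positions):
    """Index of the particle closest to particle i (excluding i itself)."""
    return min((j for j in range(len(positions)) if j != i),
               key=lambda j: abs(positions[j] - positions[i]))

def jump(positions, velocities, rng):
    """One interaction: a randomly chosen follower copies its closest neighbor's velocity."""
    follower = rng.randrange(len(positions))
    leader = nearest_neighbor(follower, positions)
    velocities[follower] = velocities[leader]
    return follower, leader

def free_flight(positions, velocities, dt):
    """Between interactions every particle moves in a straight line at its own velocity."""
    return [x + v * dt for x, v in zip(positions, velocities)]

rng = random.Random(0)
positions = [0.0, 1.0, 5.0]
velocities = [1.0, -1.0, 3.0]
for _ in range(10):
    # Assumed exponential waiting time between interaction events.
    positions = free_flight(positions, velocities, dt=rng.expovariate(1.0))
    follower, leader = jump(positions, velocities, rng)
```

    The interaction is topological in the sense that only the rank of the neighbor (here, rank 1) matters, not the metric distance to it.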

  17. nth-Nearest-neighbor distribution functions of an interacting fluid from the pair correlation function: a hierarchical approach.

    PubMed

    Bhattacharjee, Biplab

    2003-04-01

    The paper presents a general formalism for the nth-nearest-neighbor distribution (NND) of identical interacting particles in a fluid confined in a ν-dimensional space. The nth-NND functions, W(n,r) (for n=1,2,3,…) in a fluid are obtained hierarchically in terms of the pair correlation function and W(n-1,r) alone. The radial distribution function (RDF) profiles obtained from the molecular dynamics (MD) simulation of Lennard-Jones (LJ) fluid are used to illustrate the results. It is demonstrated that the collective structural information contained in the maxima and minima of the RDF profiles being resolved in terms of individual NND functions may provide more insights about the microscopic neighborhood structure around a reference particle in a fluid. Representative comparison between the results obtained from the formalism and the MD simulation data shows good agreement. Apart from the quantities such as nth-NND functions and nth-nearest-neighbor distances, the average neighbor population number is defined. These quantities are evaluated for the LJ model system and the interesting density dependence of the microscopic neighborhood shell structures is discussed in terms of them. The relevance of the NND functions in various phenomena is also pointed out.

  18. nth-nearest-neighbor distribution functions of an interacting fluid from the pair correlation function: A hierarchical approach

    NASA Astrophysics Data System (ADS)

    Bhattacharjee, Biplab

    2003-04-01

    The paper presents a general formalism for the nth-nearest-neighbor distribution (NND) of identical interacting particles in a fluid confined in a ν-dimensional space. The nth-NND functions, W(n,r¯) (for n=1,2,3,…) in a fluid are obtained hierarchically in terms of the pair correlation function and W(n-1,r¯) alone. The radial distribution function (RDF) profiles obtained from the molecular dynamics (MD) simulation of Lennard-Jones (LJ) fluid are used to illustrate the results. It is demonstrated that the collective structural information contained in the maxima and minima of the RDF profiles being resolved in terms of individual NND functions may provide more insights about the microscopic neighborhood structure around a reference particle in a fluid. Representative comparison between the results obtained from the formalism and the MD simulation data shows good agreement. Apart from the quantities such as nth-NND functions and nth-nearest-neighbor distances, the average neighbor population number is defined. These quantities are evaluated for the LJ model system and the interesting density dependence of the microscopic neighborhood shell structures is discussed in terms of them. The relevance of the NND functions in various phenomena is also pointed out.

  19. Predicting Drug-Target Interactions for New Drug Compounds Using a Weighted Nearest Neighbor Profile.

    PubMed

    van Laarhoven, Twan; Marchiori, Elena

    2013-01-01

    In silico discovery of interactions between drug compounds and target proteins is of core importance for improving the efficiency of the laborious and costly experimental determination of drug-target interaction. Drug-target interaction data are available for many classes of pharmaceutically useful target proteins including enzymes, ion channels, GPCRs and nuclear receptors. However, current drug-target interaction databases contain a small number of drug-target pairs which are experimentally validated interactions. In particular, for some drug compounds (or targets) there is no available interaction. This motivates the need for developing methods that predict interacting pairs with high accuracy also for these 'new' drug compounds (or targets). We show that a simple weighted nearest neighbor procedure is highly effective for this task. We integrate this procedure into a recent machine learning method for drug-target interaction we developed in previous work. Results of experiments indicate that the resulting method predicts true interactions with high accuracy also for new drug compounds and achieves results comparable or better than those of recent state-of-the-art algorithms. Software is publicly available at http://cs.ru.nl/~tvanlaarhoven/drugtarget2013/.

  20. A Novel Quantum Solution to Privacy-Preserving Nearest Neighbor Query in Location-Based Services

    NASA Astrophysics Data System (ADS)

    Luo, Zhen-yu; Shi, Run-hua; Xu, Min; Zhang, Shun

    2018-04-01

    We present a cheating-sensitive quantum protocol for Privacy-Preserving Nearest Neighbor Query based on Oblivious Quantum Key Distribution and Quantum Encryption. Compared with the classical related protocols, our proposed protocol has higher security, because the security of our protocol is based on basic physical principles of quantum mechanics, instead of difficulty assumptions. Especially, our protocol takes single photons as quantum resources and only needs to perform single-photon projective measurement. Therefore, it is feasible to implement this protocol with the present technologies.

  1. Empirical Mode Decomposition and k-Nearest Embedding Vectors for Timely Analyses of Antibiotic Resistance Trends

    PubMed Central

    Teodoro, Douglas; Lovis, Christian

    2013-01-01

    Background Antibiotic resistance is a major worldwide public health concern. In clinical settings, timely antibiotic resistance information is key for care providers as it allows appropriate targeted treatment or improved empirical treatment when the specific results of the patient are not yet available. Objective To improve antibiotic resistance trend analysis algorithms by building a novel, fully data-driven forecasting method from the combination of trend extraction and machine learning models for enhanced biosurveillance systems. Methods We investigate a robust model for extraction and forecasting of antibiotic resistance trends using a decade of microbiology data. Our method consists of breaking down the resistance time series into independent oscillatory components via the empirical mode decomposition technique. The resulting waveforms describing intrinsic resistance trends serve as the input for the forecasting algorithm. The algorithm applies the delay coordinate embedding theorem together with the k-nearest neighbor framework to project mappings from past events into the future dimension and estimate the resistance levels. Results The algorithms that decompose the resistance time series and filter out high frequency components showed statistically significant performance improvements in comparison with a benchmark random walk model. We present further qualitative use-cases of antibiotic resistance trend extraction, where empirical mode decomposition was applied to highlight the specificities of the resistance trends. Conclusion The decomposition of the raw signal was found not only to yield valuable insight into the resistance evolution, but also to produce novel models of resistance forecasters with boosted prediction performance, which could be utilized as a complementary method in the analysis of antibiotic resistance trends. PMID:23637796
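
    The forecasting step mentioned in the Methods (delay coordinate embedding combined with k-nearest neighbors) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the empirical mode decomposition stage is omitted, the embedding dimension, delay and k are arbitrary, and the function names `embed` and `knn_forecast` are hypothetical.

```python
import math

def embed(series, dim, delay=1):
    """Delay vectors (x[t-(dim-1)*delay], ..., x[t]), each paired with its end index t."""
    start = (dim - 1) * delay
    return [(t, tuple(series[t - j * delay] for j in range(dim - 1, -1, -1)))
            for t in range(start, len(series))]

def knn_forecast(series, dim=3, k=2, delay=1):
    """One-step forecast: average the successors of the k past vectors closest to the latest one."""
    vectors = embed(series, dim, delay)
    _, last_vec = vectors[-1]
    # Only vectors whose successor has already been observed can act as neighbors.
    candidates = [(t, v) for t, v in vectors if t + 1 < len(series)]
    candidates.sort(key=lambda tv: math.dist(tv[1], last_vec))
    nearest = candidates[:k]
    return sum(series[t + 1] for t, _ in nearest) / len(nearest)

# A made-up periodic series; the latest pattern (2, 0, 1) was always followed by 2.
toy_series = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
next_value = knn_forecast(toy_series, dim=3, k=2)
```

    In the setting of the abstract, a forecaster of this kind would be applied to the decomposed components of the resistance series rather than to the raw signal.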

  2. Second-Nearest-Neighbor Effects upon N NMR Shieldings in Models for Solid Si3N4 and C3N4

    NASA Astrophysics Data System (ADS)

    Tossell, J. A.

    1997-07-01

    NMR shifts are generally determined mainly by the nearest-neighbor environment of an atom, with fairly small changes in the shift arising from differences in the second-nearest-neighbor environment. Previous calculations on the (SiH3)3N molecule used as a model for the local environment of N in crystalline α- and β-Si3N4 gave N NMR shieldings much larger than those measured in the solids and gave the wrong order for the shifts of the inequivalent N sites (e.g., N1 and N2 in β-Si3N4). We have now calculated the N NMR shieldings in larger molecular models for the N2 site of β-Si3N4 and have found that the N2 shielding is greatly reduced when additional N1 atoms (second-nearest-neighbors to the central N2) are included. The calculated N2 shieldings (using the GIAO method with the 6-31G* basis set and 6-31G* SCF optimized geometries) are 288.1, 244.7, and 206.0 ppm for the molecules (SiH3)3N, Si6N5H15, and Si9N9H21 (central N2), respectively, while the experimental shielding of N2 in β-Si3N4 is about 155 ppm. Second-nearest-neighbor effects of only slightly smaller magnitude are calculated for the analog C molecules. At the same time, the effects of molecule size upon Si NMR shieldings and N electric field gradients are small. The local geometries at the N2-like Ns in C6N5H15 and C9N9H21 are calculated to be planar, consistent with the planar local geometry recently calculated for N in crystalline C3N4 using density functional theory.

  3. Classification of matrix-product ground states corresponding to one-dimensional chains of two-state sites of nearest neighbor interactions

    NASA Astrophysics Data System (ADS)

    Fatollahi, Amir H.; Khorrami, Mohammad; Shariati, Ahmad; Aghamohammadi, Amir

    2011-04-01

    A complete classification is given for one-dimensional chains with nearest-neighbor interactions having two states in each site, for which a matrix product ground state exists. The Hamiltonians and their corresponding matrix product ground states are explicitly obtained.

  4. d-wave superconductivity in the presence of nearest-neighbor Coulomb repulsion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, M.; Hahner, U. R.; Schulthess, T. C.

    Dynamic cluster quantum Monte Carlo calculations for a doped two-dimensional extended Hubbard model are used to study the stability and dynamics of d-wave pairing when a nearest-neighbor Coulomb repulsion V is present in addition to the on-site Coulomb repulsion U. We find that d-wave pairing and the superconducting transition temperature Tc are only weakly suppressed as long as V does not exceed U/2. This stability is traced to the strongly retarded nature of pairing that allows the d-wave pairs to minimize the repulsive effect of V. When V approaches U/2, large momentum charge fluctuations are found to become important and to give rise to a more rapid suppression of d-wave pairing and Tc than for smaller V.

  5. Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen

    This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, k-Nearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for the cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.

  6. Effect of nearest-neighbor ions on excited ionic states, emission spectra, and line profiles in hot and dense plasmas

    NASA Technical Reports Server (NTRS)

    Salzmann, D.; Stein, J.; Goldberg, I. B.; Pratt, R. H.

    1991-01-01

    The effect of the cylindrical symmetry imposed by the nearest-neighbor ions on the ionic levels and the emission spectra of a Li-like Kr ion immersed in hot and dense plasmas is investigated using the Stein et al. (1989) two-centered model extended to include computations of the line profiles, shifts, and widths, as well as the energy-level mixing and the forbidden transition probabilities. It is shown that the cylindrical symmetry mixes states with different orbital quantum numbers l, particularly for highly excited states, and, thereby, gives rise to forbidden transitions in the emission spectrum. Results are obtained for the variation of the ionic level shifts and mixing coefficients with the distance to the nearest neighbor. Also obtained are representative computed spectra that show the density effects on the spectral line profiles, shifts, and widths, and the forbidden components in the spectrum.

  7. Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.

    PubMed

    Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M

    2013-01-01

    This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible k-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
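
    For reference, the brute-force exact k-NNG construction that this abstract parallelizes can be written serially in a few lines. This is a single-threaded sketch with an assumed Euclidean metric and a hypothetical function name `knn_graph`; it does not reproduce the GPU or cluster implementation described in the paper.

```python
import math

def knn_graph(points, k):
    """Exact k-NN graph by brute force: for each point, the indices of its k closest other points."""
    n = len(points)
    graph = []
    for i in range(n):
        # Compare point i against every other point: O(n^2) distance evaluations overall.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        graph.append(others[:k])
    return graph

toy_points = [(0, 0), (1, 0), (0, 3), (5, 5)]
toy_graph = knn_graph(toy_points, k=2)
```

    Each row of the distance computation is independent of the others, which is the property that makes the brute-force approach amenable to the multi-level parallelism the abstract describes.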

  8. Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

    DTIC Science & Technology

    1999-05-17

    Experimental Results In this section, we compare kNN-mut which uses the weight vector obtained using mutual information as the final weight vector and...WAKNN against kNN, C4.5 [Qui93], RIPPER [Coh95], PEBLS [CS93], Rainbow [McC96], VSM [Low95] on several synthetic and real data sets. VSM is another k...obtained without this option. 3 C4.5 RIPPER PEBLS Rainbow kNN WAKNN Syn-1 100.0 100.0 100.0 100.0 77.3 100.0 Syn-2 67.5 69.5 62.0 50.0 66.0 68.8 Syn

  9. An improved coupled-states approximation including the nearest neighbor Coriolis couplings for diatom-diatom inelastic collision

    NASA Astrophysics Data System (ADS)

    Yang, Dongzheng; Hu, Xixi; Zhang, Dong H.; Xie, Daiqian

    2018-02-01

    Solving the time-independent close coupling equations of a diatom-diatom inelastic collision system by using the rigorous close-coupling approach is numerically difficult because of its expensive matrix manipulation. The coupled-states approximation decouples the centrifugal matrix by neglecting the important Coriolis couplings completely. In this work, a new approximation method based on the coupled-states approximation is presented and applied to time-independent quantum dynamic calculations. This approach only considers the most important Coriolis coupling with the nearest neighbors and ignores weaker Coriolis couplings with farther K channels. As a result, it reduces the computational costs without a significant loss of accuracy. Numerical tests for para-H2+ortho-H2 and para-H2+HD inelastic collisions were carried out and the results showed that the improved method dramatically reduces the errors due to the neglect of the Coriolis couplings in the coupled-states approximation. This strategy should be useful in quantum dynamics of other systems.

  10. Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

    Treesearch

    Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

    2009-01-01

    Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...

  11. A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA

    PubMed Central

    Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David; Bishop, Thomas C.; Case, David A.; Cheatham, Thomas; Dixit, Surjit; Jayaram, B.; Lankas, Filip; Laughton, Charles; Maddocks, John H.; Michon, Alexis; Osman, Roman; Orozco, Modesto; Perez, Alberto; Singh, Tanya; Spackova, Nada; Sponer, Jiri

    2010-01-01

    It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA. PMID:19850719

  12. Optimal Detection Range of RFID Tag for RFID-based Positioning System Using the k-NN Algorithm.

    PubMed

    Han, Soohee; Kim, Junghwan; Park, Choung-Hwan; Yoon, Hee-Cheon; Heo, Joon

    2009-01-01

    Positioning technology to track a moving object is an important and essential component of ubiquitous computing environments and applications. An RFID-based positioning system using the k-nearest neighbor (k-NN) algorithm can determine the position of a moving reader from observed reference data. In this study, the optimal detection range of an RFID-based positioning system was determined on the principle that tag spacing can be derived from the detection range. It was assumed that reference tags without signal strength information are regularly distributed in 1-, 2- and 3-dimensional spaces. The optimal detection range was determined, through analytical and numerical approaches, to be 125% of the tag-spacing distance in 1-dimensional space. Through numerical approaches, the range was 134% in 2-dimensional space and 143% in 3-dimensional space.

  13. Identification of jasmine flower (Jasminum sp.) based on the shape of the flower using sobel edge and k-nearest neighbour

    NASA Astrophysics Data System (ADS)

    Qur’ania, A.; Sarinah, I.

    2018-03-01

    People often misidentify jasmine by looking only at the white color of the flower, although not all white flowers are jasmine and not all jasmine flowers are white: some jasmine is yellow, and some is white and purple. The aim of this research is to identify jasmine flowers (Jasminum sp.) from the shape of the flower in images, using Sobel edge detection and k-Nearest Neighbor. Edge detection is used to detect the type of flower from the flower shape, and aims to improve the appearance of the borders in a digital image. The k-Nearest Neighbor method is used to classify test objects into the classes of the training objects whose properties are closest to them. The data used in this study are three types of jasmine, namely jasmine white (Jasminum sambac), jasmine gambir (Jasminum pubescens), and jasmine japan (Pseuderanthemum reticulatum). Testing on jasmine flower images resized to 50 × 50 pixels, 100 × 100 pixels, and 150 × 150 pixels yields an accuracy of 84%. Tests of the k-NN method using the 5, 10, and 15 closest distances gave different accuracy rates: the 5 and 10 closest distances yielded the same accuracy of 84%, while the 15 closest distances yielded a lower accuracy of 65.2%.

  14. A Coupled k-Nearest Neighbor Algorithm for Multi-Label Classification

    DTIC Science & Technology

    2015-05-22

    classification, an image may contain several concepts simultaneously, such as beach, sunset and kangaroo. Such tasks are usually denoted as multi-label...informatics, a gene can belong to both metabolism and transcription classes; and in music categorization, a song may be labeled as Mozart and sad. In the

  15. Ground state of a Heisenberg chain with next-nearest-neighbor bond alternation

    NASA Astrophysics Data System (ADS)

    Capriotti, Luca; Becca, Federico; Sorella, Sandro; Parola, Alberto

    2003-05-01

    We investigate the ground-state properties of the spin-half J1-J2 Heisenberg chain with a next-nearest-neighbor spin-Peierls dimerization using conformal field theory and Lanczos exact diagonalizations. In agreement with the results of a recent bosonization analysis by Sarkar and Sen [Phys. Rev. B 65, 172408 (2002)], we find that for small frustration (J2/J1) the system is in a Luttinger spin-fluid phase, with gapless excitations, and a finite spin-wave velocity. In the regime of strong frustration the ground state is spontaneously dimerized and the bond alternation reduces the triplet gap, leading to a slight enhancement of the critical point separating the Luttinger phase from the gapped one. An accurate determination of the phase boundary is obtained numerically from the study of the excitation spectrum.

  16. Multi-color space threshold segmentation and self-learning k-NN algorithm for surge test EUT status identification

    NASA Astrophysics Data System (ADS)

    Huang, Jian; Liu, Gui-xiong

    2016-09-01

    The identification of targets varies in different surge tests. A multi-color space threshold segmentation and self-learning k-nearest neighbor (k-NN) algorithm for equipment under test (EUT) status identification is proposed, because identifying equipment status by feature matching requires training new patterns before every test. First, the color space for segmentation (L*a*b*, hue saturation lightness (HSL), or hue saturation value (HSV)) was selected according to the high luminance points ratio and white luminance points ratio of the image. Second, the unknown class sample S_r was classified by the k-NN algorithm with training set T_z according to the feature vector, which was formed from the number of pixels, eccentricity ratio, compactness ratio, and Euler's numbers. Last, when the classification confidence coefficient equaled k, S_r was added as a sample of the pre-training set T_z'. The training set T_z was extended to T_(z+1) by T_z' once T_z' was saturated. On nine series of illuminant, indicator light, screen, and disturbance samples (a total of 21600 frames), the algorithm achieved 98.65% identification accuracy and, on five selected groups of samples, enlarged the training set from T_0 to T_5 by itself.

  17. The classification of hunger behaviour of Lates Calcarifer through the integration of image processing technique and k-Nearest Neighbour learning algorithm

    NASA Astrophysics Data System (ADS)

    Taha, Z.; Razman, M. A. M.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Adnan, F. A.; Sallehudin, M. F.; Mukai, Y.

    2018-04-01

    Fish hunger behaviour is essential in determining the fish feeding routine, particularly for fish farmers. The inability to provide accurate feeding routines (under-feeding or over-feeding) may lead to the death of the fish and consequently reduce the quantity of fish produced. Moreover, excess food that is not consumed by the fish dissolves in the water and reduces the water quality by lowering the oxygen level. This problem also leads to the death of the fish or may even spur fish diseases. In the present study, a correlation of Barramundi fish-school behaviour with hunger condition through the hybrid data integration of image processing technique is established. The behaviour is clustered with respect to the position of the school size as well as the school density of the fish before feeding, during feeding and after feeding. The clustered fish behaviour is then classified through the k-Nearest Neighbour (k-NN) learning algorithm. Three different variations of the algorithm, namely cosine, cubic and weighted, are assessed on their ability to classify the aforementioned fish hunger behaviour. It was found from the study that the weighted k-NN variation provides the best classification, with an accuracy of 86.5%. Therefore, it could be concluded that the proposed integration technique may assist fish farmers in ascertaining the fish feeding routine.
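
    The abstract does not define the cosine, cubic and weighted variations, so the sketch below rests on assumptions: "cosine" is read as cosine distance, "cubic" as Minkowski distance with p = 3, and "weighted" as an inverse-squared-distance vote. The feature vectors and class labels are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict

def minkowski(a, b, p=3):
    """Minkowski distance; p=3 is the 'cubic' metric assumed here."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def knn_predict(train, query, k, distance=math.dist, weighted=False):
    """Classify query by an (optionally distance-weighted) vote of its k nearest samples."""
    nearest = sorted((distance(x, query), label) for x, label in train)[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        # Weighted vote: inverse squared distance (assumed scheme);
        # the small constant avoids division by zero on exact matches.
        votes[label] += 1.0 / (d * d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical (school density, school position) features.
train = [((0.9, 0.1), "hungry"), ((0.8, 0.2), "hungry"),
         ((0.2, 0.9), "satiated"), ((0.1, 0.8), "satiated"),
         ((0.3, 0.7), "satiated")]
query = (0.75, 0.3)
plain = knn_predict(train, query, k=5)
by_weight = knn_predict(train, query, k=5, weighted=True)
```

    With k equal to the whole training set, the unweighted vote simply returns the majority class, while the weighted vote lets the two close neighbours outweigh the three distant ones. This is one plausible reason a weighted variation can classify better, although the abstract does not say why it did.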

  18. Nearest-Neighbor Distances and Aggregative Effects in Turbulence

    NASA Astrophysics Data System (ADS)

    Lanerolle, Lyon W. J.; Rothschild, B. J.; Yeung, P. K.

    2000-11-01

    The dispersive nature of turbulence which causes fluid elements to move apart (on average) is well known. Here we study another facet of turbulent mixing relevant to marine population dynamics - on how small organisms (approximated by fluid particles) are brought close to each other and allowed to interact. The crucial role played by the small scales in this process allows us to use direct numerical simulations of stationary isotropic turbulence, here with Taylor-scale Reynolds numbers (R_λ) from 38 to 91. We study the evolution of the Nearest-Neighbor Distances (NND) for collections of fluid particles initially located randomly in space satisfying Poisson-type distributions with mean values from 0.5 to 2.0 Kolmogorov length scales. Our results show that as particles begin to disperse on average, some also begin to aggregate in space. In particular, we find that (i) a significant proportion of particles are closer to each other than if their NNDs were randomly distributed, (ii) aggregative effects become stronger with R_λ, and (iii) although the mean value of NND grows monotonically with time in Kolmogorov variables, the growth rates are slower at higher R_λ. These results may assist in explaining the "patchiness" in plankton distributions observed in biological oceanography. Further details are given in B. J. Rothschild et al., The Biophysical Interpretation of Spatial Effects of Small-scale Turbulent Flow in the Ocean (paper in prep.).

  19. Multiobjective immune algorithm with nondominated neighbor-based selection.

    PubMed

    Gong, Maoguo; Jiao, Licheng; Du, Haifeng; Bo, Liefeng

    2008-01-01

    Nondominated Neighbor Immune Algorithm (NNIA) is proposed for multiobjective optimization by using a novel nondominated neighbor-based selection technique, an immune inspired operator, two heuristic search operators, and elitism. The unique selection technique of NNIA only selects minority isolated nondominated individuals in the population. The selected individuals are then cloned proportionally to their crowding-distance values before heuristic search. By using the nondominated neighbor-based selection and proportional cloning, NNIA pays more attention to the less-crowded regions of the current trade-off front. We compare NNIA with NSGA-II, SPEA2, PESA-II, and MISA in solving five DTLZ problems, five ZDT problems, and three low-dimensional problems. The statistical analysis based on three performance metrics, including the coverage of two sets, the convergence metric, and the spacing, shows that the unique selection method is effective, and NNIA is an effective algorithm for solving multiobjective optimization problems. The empirical study on NNIA's scalability with respect to the number of objectives shows that the new algorithm scales well along the number of objectives.
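
    To make "cloned proportionally to their crowding-distance values" concrete, the sketch below computes the crowding distance in its usual NSGA-II form and then allocates clones in proportion to it. Two details are assumptions rather than facts from the abstract: boundary solutions, whose crowding distance is infinite, are capped at twice the largest finite value, and clone counts are rounded up. The function names and the sample front are illustrative.

```python
import math

def crowding_distances(front):
    """Crowding distance of each objective vector in a nondominated front."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for obj in range(m):
        order = sorted(range(n), key=lambda i: front[i][obj])
        lo, hi = front[order[0]][obj], front[order[-1]][obj]
        dist[order[0]] = dist[order[-1]] = math.inf  # boundary solutions
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            # Normalised gap between the two neighbours along this objective.
            gap = front[order[pos + 1]][obj] - front[order[pos - 1]][obj]
            dist[order[pos]] += gap / (hi - lo)
    return dist

def clone_counts(dist, total_clones):
    """Allocate clones in proportion to crowding distance (assumed capping and rounding)."""
    finite = [d for d in dist if math.isfinite(d)]
    cap = 2.0 * max(finite) if finite and max(finite) > 0 else 1.0
    capped = [d if math.isfinite(d) else cap for d in dist]
    total = sum(capped)
    return [math.ceil(total_clones * d / total) for d in capped]

# Hypothetical two-objective trade-off front.
front = [(0.0, 1.0), (0.2, 0.6), (0.3, 0.5), (1.0, 0.0)]
dist = crowding_distances(front)
counts = clone_counts(dist, total_clones=20)
```

    Solutions in sparse regions of the front receive more clones, which matches the abstract's statement that the search pays more attention to less-crowded regions.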

  20. Nearest Neighbor Averaging and its Effect on the Critical Level and Minimum Detectable Concentration for Scanning Radiological Survey Instruments that Perform Facility Release Surveys.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fournier, Sean Donovan; Beall, Patrick S; Miller, Mark L

    2014-08-01

    Through the SNL New Mexico Small Business Assistance (NMSBA) program, several Sandia engineers worked with the Environmental Restoration Group (ERG) Inc. to verify and validate a novel algorithm used to determine the scanning Critical Level (L_c) and Minimum Detectable Concentration (MDC) (or Minimum Detectable Areal Activity) for the 102F scanning system. Through the use of Monte Carlo statistical simulations the algorithm mathematically demonstrates accuracy in determining the L_c and MDC when a nearest-neighbor averaging (NNA) technique was used. To empirically validate this approach, SNL prepared several spiked sources and ran a test with the ERG 102F instrument on a bare concrete floor known to have no radiological contamination other than background naturally occurring radioactive material (NORM). The tests conclude that the NNA technique increases the sensitivity (decreases the L_c and MDC) for high-density data maps that are obtained by scanning radiological survey instruments.

  1. Impact of nearest-neighbor repulsion on superconducting pairing in 2D extended Hubbard model

    NASA Astrophysics Data System (ADS)

    Jiang, Mi; Hahner, U. R.; Maier, T. A.; Schulthess, T. C.

    Using the dynamical cluster approximation (DCA) with a continuous-time QMC solver for the two-dimensional extended Hubbard model, we studied the impact of nearest-neighbor Coulomb repulsion V on d-wave superconducting pairing dynamics. By solving the Bethe-Salpeter equation for the particle-particle superconducting channel, we focused on the evolution of the leading d-wave eigenvalue with V and the momentum and frequency dependence of the corresponding eigenfunction. The comparison with the evolution of both spin and charge susceptibilities versus V is presented, showing the competition between spin and charge fluctuations. This research received generous support from the MARVEL NCCR and used resources of the Swiss National Supercomputing Center, as well as (INCITE) program in Oak Ridge Leadership Computing Facility.

  2. A multilevel-skin neighbor list algorithm for molecular dynamics simulation

    NASA Astrophysics Data System (ADS)

    Zhang, Chenglong; Zhao, Mingcan; Hou, Chaofeng; Ge, Wei

    2018-01-01

    Searching of the interaction pairs and organization of the interaction processes are important steps in molecular dynamics (MD) algorithms and are critical to the overall efficiency of the simulation. Neighbor lists are widely used for these steps, where thicker skin can reduce the frequency of list updating but is discounted by more computation in distance check for the particle pairs. In this paper, we propose a new neighbor-list-based algorithm with a precisely designed multilevel skin which can reduce unnecessary computation on inter-particle distances. The performance advantages over traditional methods are then analyzed against the main simulation parameters on Intel CPUs and MICs (many integrated cores), and are clearly demonstrated. The algorithm can be generalized for various discrete simulations using neighbor lists.
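
    The trade-off described above (a thicker skin means fewer list rebuilds but more distance checks) can be seen in a conventional single-level neighbor list. The sketch below shows that baseline, not the multilevel-skin algorithm proposed in the paper; the function names, the half-skin rebuild criterion and the sample coordinates are assumptions for illustration.

```python
import math

def build_neighbor_list(positions, cutoff, skin):
    """List every pair closer than cutoff + skin (O(n^2) build, for clarity)."""
    reach = cutoff + skin
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < reach:
                pairs.append((i, j))
    return pairs

def needs_rebuild(reference, positions, skin):
    """True once any particle has moved more than half the skin since the last build."""
    return any(math.dist(a, b) > 0.5 * skin for a, b in zip(reference, positions))

def interacting_pairs(pairs, positions, cutoff):
    """Distance check restricted to listed pairs; the skin adds extra candidates here."""
    return [(i, j) for i, j in pairs if math.dist(positions[i], positions[j]) < cutoff]

# Hypothetical particles on a line.
reference = [(0.0, 0.0), (1.0, 0.0), (2.3, 0.0), (5.0, 0.0)]
pairs = build_neighbor_list(reference, cutoff=1.2, skin=0.4)

# Particle 2 drifts slightly: the old list is still valid and now yields a new interaction.
moved = [(0.0, 0.0), (1.0, 0.0), (2.15, 0.0), (5.0, 0.0)]
still_valid = not needs_rebuild(reference, moved, skin=0.4)
active = interacting_pairs(pairs, moved, cutoff=1.2)
```

    Pair (1, 2) is listed at build time although it lies outside the cutoff. That is the extra distance-check work the abstract refers to, and it is what lets the list survive the later drift without a rebuild.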

  3. A comparison of 12 algorithms for matching on the propensity score.

    PubMed

    Austin, Peter C

    2014-03-15

    Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
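
    A minimal sketch of greedy nearest neighbor matching without replacement, with an optional caliper, is given here for orientation. It processes treated subjects from lowest to highest propensity score, one of the four orders named in the abstract. Measuring the caliper on the raw propensity score, along with the subject identifiers and scores, is an assumption made for illustration and does not reproduce the study's simulation settings.

```python
def greedy_caliper_match(treated, untreated, caliper):
    """Match each treated subject to the closest unused untreated subject within the caliper.

    treated and untreated map subject id -> propensity score.
    """
    available = dict(untreated)
    matches = {}
    # Lowest-to-highest propensity score order.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        u_id = min(available, key=lambda u: abs(available[u] - t_score))
        if abs(available[u_id] - t_score) <= caliper:
            matches[t_id] = u_id
            del available[u_id]  # without replacement: each control is used once
    return matches

# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.55, "t3": 0.90}
untreated = {"u1": 0.28, "u2": 0.50, "u3": 0.52, "u4": 0.60}

with_caliper = greedy_caliper_match(treated, untreated, caliper=0.05)
no_caliper = greedy_caliper_match(treated, untreated, caliper=float("inf"))
```

    With the caliper, subject t3 stays unmatched because its closest control is 0.30 away. Without it, t3 is paired with u4 regardless of the gap, which illustrates how caliper matching trades sample size for closer matches.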

  4. A comparison of 12 algorithms for matching on the propensity score

    PubMed Central

    Austin, Peter C

    2014-01-01

    Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24123228

  5. Predictive mapping of forest composition and structure with direct gradient analysis and nearest neighbor imputation in coastal Oregon, U.S.A.

    Treesearch

    Janet L. Ohmann; Matthew J. Gregory

    2002-01-01

    Spatially explicit information on the species composition and structure of forest vegetation is needed at broad spatial scales for natural resource policy analysis and ecological research. We present a method for predictive vegetation mapping that applies direct gradient analysis and nearest-neighbor imputation to ascribe detailed ground attributes of vegetation to...

  6. Spin canting in a Dy-based single-chain magnet with dominant next-nearest-neighbor antiferromagnetic interactions

    NASA Astrophysics Data System (ADS)

    Bernot, K.; Luzon, J.; Caneschi, A.; Gatteschi, D.; Sessoli, R.; Bogani, L.; Vindigni, A.; Rettori, A.; Pini, M. G.

    2009-04-01

    We investigate theoretically and experimentally the static magnetic properties of single crystals of the molecular-based single-chain magnet of formula [Dy(hfac)3NIT(C6H4OPh)]∞ comprising alternating Dy3+ and organic radicals. The magnetic molar susceptibility χM displays a strong angular variation for sample rotations around two directions perpendicular to the chain axis. A peculiar inversion between maxima and minima in the angular dependence of χM occurs on increasing temperature. Using information regarding the monomeric building block as well as an ab initio estimation of the magnetic anisotropy of the Dy3+ ion, this “anisotropy-inversion” phenomenon can be assigned to weak one-dimensional ferromagnetism along the chain axis. This indicates that antiferromagnetic next-nearest-neighbor interactions between Dy3+ ions dominate, despite the large Dy-Dy separation, over the nearest-neighbor interactions between the radicals and the Dy3+ ions. Measurements of the field dependence of the magnetization, both along and perpendicularly to the chain, and of the angular dependence of χM in a strong magnetic field confirm such an interpretation. Transfer-matrix simulations of the experimental measurements are performed using a classical one-dimensional spin model with antiferromagnetic Heisenberg exchange interaction and noncollinear uniaxial single-ion anisotropies favoring a canted antiferromagnetic spin arrangement, with a net magnetic moment along the chain axis. The fine agreement obtained with experimental data provides estimates of the Hamiltonian parameters, essential for further study of the dynamics of rare-earth-based molecular chains.

  7. Reentrant behavior in the nearest-neighbor Ising antiferromagnet in a magnetic field

    NASA Astrophysics Data System (ADS)

    Neto, Minos A.; de Sousa, J. Ricardo

    2004-12-01

    Motivated by the H-T phase diagram of the bcc Ising antiferromagnet with nearest-neighbor interactions obtained by Monte Carlo simulation [Landau, Phys. Rev. B 16, 4164 (1977)], which shows a reentrant behavior at low temperature, with two critical temperatures in magnetic fields about 2% greater than the critical value Hc=8J, we apply the effective field renormalization group (EFRG) approach to this model on three-dimensional lattices (simple cubic, sc, and body-centered cubic, bcc). We find that the critical curve TN(H) exhibits a maximum point around H≃Hc only in the bcc lattice case. We also discuss the critical behavior by the effective field theory in clusters with one (EFT-1) and two (EFT-2) spins, and a reentrant behavior is observed for the sc and bcc lattices. We have compared our EFRG results for the bcc lattice with Monte Carlo and series expansion results, and we observe good agreement between the methods.

  8. Nearest neighbor imputation using spatial–temporal correlations in wireless sensor networks

    PubMed Central

    Li, YuanYuan; Parker, Lynne E.

    2016-01-01

    Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network’s performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a kd-tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using traditional mean and variance of each dimension for kd-tree construction, and Euclidean distance for kd-tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. 
Our experimental results
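
    The idea of a nearest-neighbor imputation with a weighted Euclidean distance can be sketched as follows. This is an assumption-laden illustration rather than the authors' method: it uses a brute-force search instead of a kd-tree, weights dimension d by one minus its fraction of missing data (the abstract only says the weights are based on measured percentages of missing data), and the sensor readings are hypothetical.

```python
import math

def impute(reading, history, missing_fraction):
    """Fill None entries of reading from the nearest complete record in history.

    Distance is a weighted Euclidean distance over the observed dimensions,
    weighting dimension d by 1 - missing_fraction[d] (an assumed weighting).
    """
    observed = [d for d, v in enumerate(reading) if v is not None]

    def distance(record):
        return math.sqrt(sum((1.0 - missing_fraction[d]) * (reading[d] - record[d]) ** 2
                             for d in observed))

    nearest = min(history, key=distance)
    # Keep observed values; copy only the missing ones from the nearest record.
    return [nearest[d] if v is None else v for d, v in enumerate(reading)]

# Hypothetical readings from three correlated sensor nodes.
history = [(20.0, 21.0, 19.5), (25.0, 26.0, 24.0), (30.0, 29.0, 31.0)]
missing_fraction = [0.1, 0.5, 0.2]
filled = impute([24.0, None, 23.5], history, missing_fraction)
```

    Down-weighting a frequently missing dimension keeps an unreliable sensor from dominating the neighbor search.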

  9. Heat perturbation spreading in the Fermi-Pasta-Ulam-β system with next-nearest-neighbor coupling: Competition between phonon dispersion and nonlinearity

    NASA Astrophysics Data System (ADS)

    Xiong, Daxing

    2017-06-01

    We employ the heat perturbation correlation function to study thermal transport in the one-dimensional Fermi-Pasta-Ulam-β lattice with both nearest-neighbor and next-nearest-neighbor couplings. We find that such a system bears a peculiar phonon dispersion relation, and thus there exists a competition between phonon dispersion and nonlinearity that can strongly affect the heat correlation function's shape and scaling property. Specifically, for small and large anharmonicities, the scaling laws are ballistic and superdiffusive types, respectively, which are in good agreement with the recent theoretical predictions; whereas in the intermediate range of the nonlinearity, we observe an unusual multiscaling property characterized by a nonmonotonic delocalization process of the central peak of the heat correlation function. To understand these multiscaling laws, we also examine the momentum perturbation correlation function and find a transition process with the same turning point of the anharmonicity as that shown in the heat correlation function. This suggests coupling between the momentum transport and the heat transport, in agreement with the theoretical arguments of mode cascade theory.

  10. Novel approach for image skeleton and distance transformation parallel algorithms

    NASA Astrophysics Data System (ADS)

    Qing, Kent P.; Means, Robert W.

    1994-05-01

    Image Understanding is more important in medical imaging than ever, particularly where real-time automatic inspection, screening and classification systems are installed. Skeleton and distance transformations are among the common operations that extract useful information from binary images and aid in Image Understanding. The distance transformation describes the objects in an image by labeling every pixel in each object with the distance to its nearest boundary. The skeleton algorithm starts from the distance transformation and finds the set of pixels that have a locally maximum label. The distance algorithm has to scan the entire image several times depending on the object width. For each pixel, the algorithm must access the neighboring pixels and find the maximum distance from the nearest boundary. It is a computational and memory access intensive procedure. In this paper, we propose a novel parallel approach to the distance transform and skeleton algorithms using the latest VLSI high-speed convolutional chips such as HNC's ViP. The algorithm speed is dependent on the object's width and takes (k + [(k-1)/3]) * 7 milliseconds for a 512 × 512 image with k being the maximum distance of the largest object. All objects in the image will be skeletonized at the same time in parallel.
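
    The two operations described above can be illustrated with a small sequential sketch; it does not reproduce the parallel convolution-chip method of the paper. The sketch assumes a city-block distance, treats pixels outside the image as background, and defines a skeleton pixel as one whose label is a maximum over its 8-neighborhood. All three choices are assumptions, since the abstract does not specify them.

```python
def distance_transform(image):
    """City-block distance of each object pixel (1) to the nearest background pixel (0)."""
    rows, cols = len(image), len(image[0])
    big = rows + cols  # upper bound on any distance in the image
    dist = [[big if px else 0 for px in row] for row in image]

    def at(r, c):
        # Pixels outside the image count as background (assumption).
        return dist[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    changed = True
    while changed:  # repeated scans; wider objects need more of them
        changed = False
        for r in range(rows):
            for c in range(cols):
                if image[r][c]:
                    best = 1 + min(at(r - 1, c), at(r + 1, c), at(r, c - 1), at(r, c + 1))
                    if best < dist[r][c]:
                        dist[r][c] = best
                        changed = True
    return dist

def skeleton(dist):
    """Object pixels whose label is a local maximum over the 8-neighborhood."""
    rows, cols = len(dist), len(dist[0])
    keep = set()
    for r in range(rows):
        for c in range(cols):
            if dist[r][c] == 0:
                continue
            window = [dist[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            if dist[r][c] >= max(window):
                keep.add((r, c))
    return keep

image = [[1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1]]
labels = distance_transform(image)
axis = skeleton(labels)
```

    For this 3 × 5 rectangle the labels peak along the middle row, so the skeleton is the rectangle's central axis without its end pixels.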

  11. pKa shifting in double-stranded RNA is highly dependent upon nearest neighbors and bulge positioning.

    PubMed

    Wilcox, Jennifer L; Bevilacqua, Philip C

    2013-10-22

    Shifting of pKa's in RNA is important for many biological processes; however, the driving forces responsible for shifting are not well understood. Herein, we determine how structural environments surrounding protonated bases affect pKa shifting in double-stranded RNA (dsRNA). Using ³¹P NMR, we determined the pKa of the adenine in an A⁺·C base pair in various sequence and structural environments. We found a significant dependence of pKa on the base pairing strength of nearest neighbors and the location of a nearby bulge. Increasing nearest neighbor base pairing strength shifted the pKa of the adenine in an A⁺·C base pair higher by an additional 1.6 pKa units, from 6.5 to 8.1, which is well above neutrality. The addition of a bulge two base pairs away from a protonated A⁺·C base pair shifted the pKa by only ~0.5 units less than a perfectly base paired hairpin; however, positioning the bulge just one base pair away from the A⁺·C base pair prohibited formation of the protonated base pair as well as several flanking base pairs. Comparison of data collected at 25 °C and 100 mM KCl to biological temperature and Mg²⁺ concentration revealed only slight pKa changes, suggesting that similar sequence contexts in biological systems have the potential to be protonated at biological pH. We present a general model to aid in the determination of the roles protonated bases may play in various dsRNA-mediated processes including ADAR editing, miRNA processing, programmed ribosomal frameshifting, and general acid-base catalysis in ribozymes.

  12. Nearest-neighbor guided evaluation of data reliability and its applications.

    PubMed

    Boongoen, Tossapon; Shen, Qiang

    2010-12-01

    The intuition of data reliability has recently been incorporated into the main stream of research on ordered weighted averaging (OWA) operators. Instead of relying on human-guided variables, the aggregation behavior is determined in accordance with the underlying characteristics of the data being aggregated. Data-oriented operators such as the dependent OWA (DOWA) utilize centralized data structures to generate reliable weights, however. Despite their simplicity, the approach taken by these operators neglects entirely any local data structure that represents a strong agreement or consensus. To address this issue, the cluster-based OWA (Clus-DOWA) operator has been proposed. It employs a cluster-based reliability measure that is effective to differentiate the accountability of different input arguments. Yet, its actual application is constrained by the high computational requirement. This paper presents a more efficient nearest-neighbor-based reliability assessment for which an expensive clustering process is not required. The proposed measure can be perceived as a stress function, from which the OWA weights and associated decision-support explanations can be generated. To illustrate the potential of this measure, it is applied to both the problem of information aggregation for alias detection and the problem of unsupervised feature selection (in which unreliable features are excluded from an actual learning process). Experimental results demonstrate that these techniques usually outperform their conventional state-of-the-art counterparts.

  13. Reducing the Time Requirement of k-Means Algorithm

    PubMed Central

    Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou

    2012-01-01

    Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have a large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k. The problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI_HA). We found that when k is close to d, the quality is good (ARI_HA > 0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI_HA > 0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data. PMID:23239974
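
    The k-means objective stated in this abstract (mean squared distance from each point to its nearest center) can be made concrete with a short sketch of the traditional Lloyd iteration. This is the baseline method, not the PCA-based algorithm the authors propose; the function names and the toy data are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_squared_distance(points, centers):
    """The k-means objective: average squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


def lloyd(points, centers, iterations=10):
    """Traditional k-means: alternate nearest-center assignment and centroid update."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0, 0), (0, 2), (10, 0), (10, 2)]   # toy data, n = 4, d = 2
centers = lloyd(points, [(0, 0), (10, 0)])    # k = 2, hypothetical starting centers
print(centers, mean_squared_distance(points, centers))
```

    Each pass costs on the order of n * k * d distance operations, which is the expense for large d that the abstract refers to.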

  14. Reducing the time requirement of k-means algorithm.

    PubMed

    Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou

    2012-01-01

    Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R(d) and an integer k. The problem is to determine a set of k points in R(d), called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI(HA)). We found that when k is close to d, the quality is good (ARI(HA)>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI(HA)>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data.

  15. A nearest neighbor approach for automated transporter prediction and categorization from protein sequences.

    PubMed

    Li, Haiquan; Dai, Xinbin; Zhao, Xuechun

    2008-05-01

    Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.

  16. Exact density functional theory for ideal polymer fluids with nearest neighbor bonding constraints.

    PubMed

    Woodward, Clifford E; Forsman, Jan

    2008-08-07

    We present a new density functional theory of ideal polymer fluids, assuming nearest-neighbor bonding constraints. The free energy functional is expressed in terms of end site densities of chain segments and thus has a simpler mathematical structure than previously used expressions using multipoint distributions. This work is based on a formalism proposed by Tripathi and Chapman [Phys. Rev. Lett. 94, 087801 (2005)]. Those authors obtain an approximate free energy functional for ideal polymers in terms of monomer site densities. Calculations on both repulsive and attractive surfaces show that their theory is reasonably accurate in some cases, but does differ significantly from the exact result for longer polymers with attractive surfaces. We suggest that segment end site densities, rather than monomer site densities, are the preferred choice of "site functions" for expressing the free energy functional of polymer fluids. We illustrate the application of our theory to derive an expression for the free energy of an ideal fluid of infinitely long polymers.

  17. [Classification of Children with Attention-Deficit/Hyperactivity Disorder and Typically Developing Children Based on Electroencephalogram Principal Component Analysis and k-Nearest Neighbor].

    PubMed

    Yang, Jiaojiao; Guo, Qian; Li, Wenjie; Wang, Suhong; Zou, Ling

    2016-04-01

    This paper aims to assist the individual clinical diagnosis of children with attention-deficit/hyperactivity disorder using an electroencephalogram signal detection method. Firstly, in our experiments, we obtained and studied the electroencephalogram signals from fourteen attention-deficit/hyperactivity disorder children and sixteen typically developing children during the classic interference control task of Simon-spatial Stroop, and we completed electroencephalogram data preprocessing including filtering, segmentation, removal of artifacts and so on. Secondly, we selected the subset of electroencephalogram electrodes using the principal component analysis (PCA) method, and we collected the common channels of the optimal electrodes whose occurrence rates were more than 90% in each kind of stimulation. We then extracted the latency (200~450 ms) mean amplitude features of the common electrodes. Finally, we used the k-nearest neighbor (KNN) classifier based on Euclidean distance and the support vector machine (SVM) classifier based on radial basis kernel function to classify. From the experiment, at the same kind of interference control task, the attention-deficit/hyperactivity disorder children showed lower correct response rates and longer reaction time. The N2 emerged in prefrontal cortex while P2 presented in the inferior parietal area when all kinds of stimuli were presented. Meanwhile, the children with attention-deficit/hyperactivity disorder exhibited markedly reduced N2 and P2 amplitude compared to typically developing children. KNN resulted in better classification accuracy than the SVM classifier, and the best classification rate was 89.29% in the StI task. The results showed that the electroencephalogram signals were different in the brain regions of prefrontal cortex and inferior parietal cortex between attention-deficit/hyperactivity disorder and typically developing children during the interference control task, which provided a scientific basis for the clinical diagnosis of attention

  18. Microscopic theory of the nearest-neighbor valence bond sector of the spin-1/2 kagome antiferromagnet

    NASA Astrophysics Data System (ADS)

    Ralko, Arnaud; Mila, Frédéric; Rousochatzakis, Ioannis

    2018-03-01

    The spin-1/2 Heisenberg model on the kagome lattice, which is closely realized in layered Mott insulators such as ZnCu3(OH)6Cl2, is one of the oldest and most enigmatic spin-1/2 lattice models. While the numerical evidence has accumulated in favor of a quantum spin liquid, the debate is still open as to whether it is a Z2 spin liquid with very short-range correlations (some kind of resonating valence bond spin liquid), or an algebraic spin liquid with power-law correlations. To address this issue, we have pushed the program started by Rokhsar and Kivelson in their derivation of the effective quantum dimer model description of Heisenberg models to unprecedented accuracy for the spin-1/2 kagome, by including all the most important virtual singlet contributions on top of the orthogonalization of the nearest-neighbor valence bond singlet basis. Quite remarkably, the resulting picture is a competition between a Z2 spin liquid and a diamond valence bond crystal with a 12-site unit cell, as in the density-matrix renormalization group simulations of Yan et al. Furthermore, we found that, on cylinders of finite diameter d, there is a transition between the Z2 spin liquid at small d and the diamond valence bond crystal at large d, the prediction of the present microscopic description for the two-dimensional lattice. These results show that, if the ground state of the spin-1/2 kagome antiferromagnet can be described by nearest-neighbor singlet dimers, it is a diamond valence bond crystal, and, a contrario, that, if the system is a quantum spin liquid, it has to involve long-range singlets, consistent with the algebraic spin liquid scenario.

  19. Spectral identification of melon seeds variety based on k-nearest neighbor and Fisher discriminant analysis

    NASA Astrophysics Data System (ADS)

    Li, Cuiling; Jiang, Kai; Zhao, Xueguan; Fan, Pengfei; Wang, Xiu; Liu, Chuan

    2017-10-01

    Impurity of melon seed varieties reduces melon production and the economic benefits of farmers. This research aimed to adopt spectral technology combined with chemometrics methods to identify melon seed varieties. Melon seeds of the varieties "Yi Te Bai", "Yi Te Jin", "Jing Mi NO.7", "Jing Mi NO.11" and "Yi Li Sha Bai" were used as research samples. A simple spectral system was developed to collect reflective spectral data of melon seeds, including a light source unit, a spectral data acquisition unit and a data processing unit. The detection wavelength range of this system was 200-1100 nm with a spectral resolution of 0.14-7.7 nm. The original reflective spectral data were pre-treated with de-trend (DT), multiple scattering correction (MSC), first derivative (FD), normalization (NOR) and Savitzky-Golay (SG) convolution smoothing methods. Principal Component Analysis (PCA) was adopted to reduce the dimensions of the reflective spectral data and extract principal components. K-nearest neighbour (KNN) and Fisher discriminant analysis (FDA) methods were used to develop discriminant models of melon seed variety based on PCA. Spectral data pretreatments improved the discriminant effects of KNN and FDA, and FDA generated better discriminant results than KNN. Both KNN and FDA methods produced discriminant accuracies reaching 90.0% for the validation set. The results showed that using spectral technology in combination with KNN and FDA modelling methods to identify melon seed varieties was feasible.
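
    A KNN discriminant model of the kind mentioned in this abstract can be sketched in a few lines. The abstract does not state the distance measure, the voting rule or the value of k, so Euclidean distance, majority voting and k = 3 are assumptions here, and the two-component "PCA scores" and variety labels below are made-up toy values rather than data from the study.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean (assumed).
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical principal-component scores for two seed varieties.
train = [
    ((0.0, 0.0), "variety_1"), ((0.1, 0.2), "variety_1"), ((0.2, 0.1), "variety_1"),
    ((1.0, 1.0), "variety_2"), ((1.1, 0.9), "variety_2"), ((0.9, 1.2), "variety_2"),
]
print(knn_predict(train, (0.15, 0.1)))
print(knn_predict(train, (1.0, 1.05)))
```

    Because every prediction compares the query with all training samples, reducing the spectra to a few principal components first keeps each distance computation cheap.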

  20. Testing Spatial Symmetry Using Contingency Tables Based on Nearest Neighbor Relations

    PubMed Central

    Ceyhan, Elvan

    2014-01-01

    We consider two types of spatial symmetry, namely, symmetry in the mixed or shared nearest neighbor (NN) structures. We use Pielou's and Dixon's symmetry tests which are defined using contingency tables based on the NN relationships between the data points. We generalize these tests to multiple classes and demonstrate that both the asymptotic and exact versions of Pielou's first type of symmetry test are extremely conservative in rejecting symmetry in the mixed NN structure and hence should be avoided or only the Monte Carlo randomized version should be used. Under RL, we derive the asymptotic distribution for Dixon's symmetry test and also observe that the usual independence test seems to be appropriate for Pielou's second type of test. Moreover, we apply variants of Fisher's exact test on the shared NN contingency table for Pielou's second test and determine the most appropriate version for our setting. We also consider pairwise and one-versus-rest type tests in post hoc analysis after a significant overall symmetry test. We investigate the asymptotic properties of the tests, prove their consistency under appropriate null hypotheses, and investigate finite sample performance of them by extensive Monte Carlo simulations. The methods are illustrated on a real-life ecological data set. PMID:24605061
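
    The contingency tables that these tests are built on can be sketched as follows: each point is cross-classified by its own class and by the class of its nearest neighbor. The sketch covers only the table construction, not the test statistics themselves; the function name, the use of Euclidean distance and the toy coordinates are assumptions for illustration, and ties in distance are not handled specially.

```python
import math
from collections import Counter


def nn_contingency_table(points):
    """Count (class of base point, class of its nearest neighbor) pairs.

    `points` is a list of (coordinates, class_label) pairs.
    """
    table = Counter()
    for i, (xi, ci) in enumerate(points):
        # Nearest neighbor among all other points (first one wins on ties).
        _, nn_class = min(
            ((math.dist(xi, xj), cj) for j, (xj, cj) in enumerate(points) if j != i),
            key=lambda pair: pair[0],
        )
        table[(ci, nn_class)] += 1
    return table


points = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"),
          ((2.5, 0.0), "b"), ((5.0, 0.0), "b"), ((6.0, 0.0), "b")]
table = nn_contingency_table(points)
for base in "ab":
    print(base, [table[(base, nn)] for nn in "ab"])
```

    In this toy layout one "b" point has an "a" point as its nearest neighbor while no "a" point has a "b" neighbor, which is the kind of off-diagonal asymmetry a symmetry test on the table would examine.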

  1. Anomaly Detection Based on Local Nearest Neighbor Distance Descriptor in Crowded Scenes

    PubMed Central

    Hu, Shiqiang; Zhang, Huanlong; Luo, Lingkun

    2014-01-01

    We propose a novel local nearest neighbor distance (LNND) descriptor for anomaly detection in crowded scenes. Compared with the low-level feature descriptors commonly used in previous works, the LNND descriptor has two major advantages. First, the LNND descriptor efficiently incorporates spatial and temporal contextual information around the video event that is important for detecting anomalous interaction among multiple events, while most existing feature descriptors only contain the information of a single event. Second, the LNND descriptor is a compact representation and its dimensionality is typically much lower than that of the low-level feature descriptor. Therefore, not only can the computation time and storage requirement be saved by using the LNND descriptor for the anomaly detection method with offline training, but the negative effects of using a high-dimensional feature descriptor can also be avoided. We validate the effectiveness of the LNND descriptor by conducting extensive experiments on different benchmark datasets. Experimental results show the promising performance of the LNND-based method against the state-of-the-art methods. It is worth noting that the LNND-based approach requires fewer intermediate processing steps, without any subsequent processing such as smoothing, but achieves comparable or even better performance. PMID:25105164

  2. Spatiotemporal distribution of Oklahoma earthquakes: Exploring relationships using a nearest-neighbor approach

    NASA Astrophysics Data System (ADS)

    Vasylkivska, Veronika S.; Huerta, Nicolas J.

    2017-07-01

    Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog's inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.

  3. Neighbor Discovery Algorithm in Wireless Local Area Networks Using Multi-beam Directional Antennas

    NASA Astrophysics Data System (ADS)

    Wang, Jin; Peng, Wei; Liu, Song

    2017-10-01

    Neighbor discovery is an important step for Wireless Local Area Networks (WLAN) and the use of multi-beam directional antennas can greatly improve the network performance. However, most neighbor discovery algorithms in WLAN based on multi-beam directional antennas can only work effectively in synchronous systems but not in asynchronous systems, and collisions at the AP remain a bottleneck for neighbor discovery. In this paper, we propose two asynchronous neighbor discovery algorithms: the asynchronous hierarchical scanning (AHS) and asynchronous directional scanning (ADS) algorithms. Both of them are based on a three-way handshaking mechanism. AHS and ADS reduce collisions at the AP, and thereby achieve good performance, in a hierarchical way and a directional way respectively. The performance of AHS and ADS is tested on OMNeT++. Moreover, different application scenarios and the factors that affect the performance of these algorithms are analyzed. The simulation results show that AHS is suitable for scenes densely populated around the AP, while ADS is suitable for scenes where most of the neighboring nodes are far from the AP.

  4. Floating phase in the one-dimensional transverse axial next-nearest-neighbor Ising model.

    PubMed

    Chandra, Anjan Kumar; Dasgupta, Subinay

    2007-02-01

    To study the ground state of an axial next-nearest-neighbor Ising chain under transverse field as a function of frustration parameter kappa and field strength Gamma, we present here two different perturbative analyses. In one, we consider the (known) ground state at kappa=0.5 and Gamma=0 as the unperturbed state and treat an increase of the field from 0 to Gamma coupled with an increase of kappa from 0.5 to 0.5+rGamma/J as perturbation. The first-order perturbation correction to eigenvalue can be calculated exactly and we could conclude that there are only two phase-transition lines emanating from the point kappa=0.5, Gamma=0. In the second perturbation scheme, we consider the number of domains of length 1 as the perturbation and obtain the zeroth-order eigenfunction for the perturbed ground state. From the longitudinal spin-spin correlation, we conclude that floating phase exists for small values of transverse field over the entire region intermediate between the ferromagnetic phase and antiphase.

  5. Magnetization reversal in magnetic dot arrays: Nearest-neighbor interactions and global configurational anisotropy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van de Wiele, Ben; Fin, Samuele; Pancaldi, Matteo

    2016-05-28

    Various proposals for future magnetic memories, data processing devices, and sensors rely on a precise control of the magnetization ground state and magnetization reversal process in periodically patterned media. In finite dot arrays, such control is hampered by the magnetostatic interactions between the nanomagnets, leading to the non-uniform magnetization state distributions throughout the sample while reversing. In this paper, we evidence how during reversal typical geometric arrangements of dots in an identical magnetization state appear that originate in the dominance of either Global Configurational Anisotropy or Nearest-Neighbor Magnetostatic interactions, which depends on the fields at which the magnetization reversal sets in. Based on our findings, we propose design rules to obtain the uniform magnetization state distributions throughout the array, and also suggest future research directions to achieve non-uniform state distributions of interest, e.g., when aiming at guiding spin wave edge-modes through dot arrays. Our insights are based on the Magneto-Optical Kerr Effect and Magnetic Force Microscopy measurements as well as the extensive micromagnetic simulations.

  6. Teaching a Machine to Feel Postoperative Pain: Combining High-Dimensional Clinical Data with Machine Learning Algorithms to Forecast Acute Postoperative Pain

    PubMed Central

    Tighe, Patrick J.; Harle, Christopher A.; Hurley, Robert W.; Aytug, Haldun; Boezaart, Andre P.; Fillingim, Roger B.

    2015-01-01

    Background: Given their ability to process highly dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain. Methods: Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison. Results: In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy with an area under the receiver-operating curve (ROC) of 0.704. Next, the gradient-boosted decision tree had an ROC of 0.665 and the k-nearest neighbor algorithm had an ROC of 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an ROC of 0.727. Logistic regression had a lower ROC of 0.5 for predicting pain outcomes on POD 1 and 3. Conclusions: Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction. PMID:26031220

  7. "Glue" approximation for the pairing interaction in the Hubbard model with next nearest neighbor hopping

    NASA Astrophysics Data System (ADS)

    Khatami, Ehsan; Macridin, Alexandru; Jarrell, Mark

    2008-03-01

    Recently, several authors have employed the "glue" approximation for the cuprates, in which the full pairing vertex is approximated by the spin susceptibility. We study this approximation using Quantum Monte Carlo Dynamical Cluster Approximation methods on a 2D Hubbard model. By considering a reasonable finite value for the next nearest neighbor hopping, we find that this "glue" approximation, in its current form, does not capture the correct pairing symmetry. Here, d-wave is not the leading pairing symmetry, while it is the dominant symmetry in the "exact" QMC results. We argue that the sensitivity of this approximation to band structure changes leads to this inconsistency and that this form of interaction may not be the appropriate description of the pairing mechanism in cuprates. We suggest improvements to this approximation which help to capture the essential features of the QMC data.

  8. Fracton topological order from nearest-neighbor two-spin interactions and dualities

    NASA Astrophysics Data System (ADS)

    Slagle, Kevin; Kim, Yong Baek

    2017-10-01

    Fracton topological order describes a remarkable phase of matter, which can be characterized by fracton excitations with constrained dynamics and a ground-state degeneracy that increases exponentially with the length of the system on a three-dimensional torus. However, previous models exhibiting this order require many-spin interactions, which may be very difficult to realize in a real material or cold atom system. In this work, we present a more physically realistic model which has the so-called X-cube fracton topological order [Vijay, Haah, and Fu, Phys. Rev. B 94, 235157 (2016), 10.1103/PhysRevB.94.235157] but only requires nearest-neighbor two-spin interactions. The model lives on a three-dimensional honeycomb-based lattice with one to two spin-1/2 degrees of freedom on each site and a unit cell of six sites. The model is constructed from two orthogonal stacks of Z2 topologically ordered Kitaev honeycomb layers [Kitaev, Ann. Phys. 321, 2 (2006), 10.1016/j.aop.2005.10.005], which are coupled together by a two-spin interaction. It is also shown that a four-spin interaction can be included to instead stabilize 3+1D Z2 topological order. We also find dual descriptions of four quantum phase transitions in our model, all of which appear to be discontinuous first-order transitions.

  9. Relationship between neighbor number and vibrational spectra in disordered colloidal clusters with attractive interactions

    NASA Astrophysics Data System (ADS)

    Yunker, Peter J.; Zhang, Zexin; Gratale, Matthew; Chen, Ke; Yodh, A. G.

    2013-03-01

    We study connections between vibrational spectra and average nearest neighbor number in disordered clusters of colloidal particles with attractive interactions. Measurements of displacement covariances between particles in each cluster permit calculation of the stiffness matrix, which contains effective spring constants linking pairs of particles. From the cluster stiffness matrix, we derive vibrational properties of corresponding "shadow" glassy clusters, with the same geometric configuration and interactions as the "source" cluster but without damping. Here, we investigate the stiffness matrix to elucidate the origin of the correlations between the median frequency of cluster vibrational modes and average number of nearest neighbors in the cluster. We find that the mean confining stiffness of particles in a cluster, i.e., the ensemble-averaged sum of nearest neighbor spring constants, correlates strongly with average nearest neighbor number, and even more strongly with median frequency. Further, we find that the average oscillation frequency of an individual particle is set by the total stiffness of its nearest neighbor bonds; this average frequency increases as the square root of the nearest neighbor bond stiffness, in a manner similar to the simple harmonic oscillator.

  10. A collaborative filtering recommendation algorithm based on weighted SimRank and social trust

    NASA Astrophysics Data System (ADS)

    Su, Chang; Zhang, Butao

    2017-05-01

    Collaborative filtering is one of the most widely used recommendation technologies, but the data sparsity and cold start problems of collaborative filtering algorithms are difficult to solve effectively. To alleviate the problem of data sparsity in collaborative filtering, a weighted improved SimRank algorithm is first proposed to compute the rating similarity between users in the rating data set. The improved SimRank can find more nearest neighbors for target users according to the transmissibility of rating similarity. Then, we build a trust network and introduce the calculation of trust degree in the trust relationship data set. Finally, we combine rating similarity and trust to build a comprehensive similarity in order to find more appropriate nearest neighbors for the target user. Experimental results show that the algorithm proposed in this paper effectively improves the recommendation precision of the collaborative filtering algorithm.

  11. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continually proposed with the development of related disciplines. The applicable scopes and performances of these algorithms differ. Hence, finding a suitable algorithm for a dataset is becoming an important concern for biomedical researchers who need to solve practical problems promptly. In this paper, seven sophisticated and actively used algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression models, are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance on all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.

  12. Geographical traceability of Marsdenia tenacissima by Fourier transform infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Li, Chao; Yang, Sheng-Chao; Guo, Qiao-Sheng; Zheng, Kai-Yan; Wang, Ping-Li; Meng, Zhen-Gui

    2016-01-01

    A combination of Fourier transform infrared spectroscopy with chemometrics tools provided an approach for studying Marsdenia tenacissima according to its geographical origin. A total of 128 M. tenacissima samples from four provinces in China were analyzed with FTIR spectroscopy. Six pattern recognition methods were used to construct the discrimination models: support vector machine-genetic algorithms, support vector machine-particle swarm optimization, K-nearest neighbors, radial basis function neural network, random forest and support vector machine-grid search. Experimental results showed that K-nearest neighbors was superior to other mathematical algorithms after data were preprocessed with wavelet de-noising, with a discrimination rate of 100% in both the training and prediction sets. This study demonstrated that FTIR spectroscopy coupled with K-nearest neighbors could be successfully applied to determine the geographical origins of M. tenacissima samples, thereby providing reliable authentication in a rapid, cheap and noninvasive way.

  13. Liquid Li structure and dynamics: A comparison between OFDFT and second nearest-neighbor embedded-atom method

    DOE PAGES

    Chen, Mohan; Vella, Joseph R.; Panagiotopoulos, Athanassios Z.; ...

    2015-04-08

    The structure and dynamics of liquid lithium are studied using two simulation methods: orbital-free (OF) first-principles molecular dynamics (MD), which employs OF density functional theory (DFT), and classical MD utilizing a second nearest-neighbor embedded-atom method potential. The properties we studied include the dynamic structure factor, the self-diffusion coefficient, the dispersion relation, the viscosity, and the bond angle distribution function. Our simulation results were compared to available experimental data when possible. Each method has distinct advantages and disadvantages. For example, OFDFT gives better agreement with experimental dynamic structure factors, yet is more computationally demanding than classical simulations. Classical simulations can access a broader temperature range and longer time scales. The combination of first-principles and classical simulations is a powerful tool for studying properties of liquid lithium.

  14. Heterogeneous autoregressive model with structural break using nearest neighbor truncation volatility estimators for DAX.

    PubMed

    Chin, Wen Cheong; Lee, Min Cherng; Yap, Grace Lee Ching

    2016-01-01

    High frequency financial data modelling has become one of the important research areas in the field of financial econometrics. However, possible structural breaks in volatile financial time series often cause inconsistency in volatility estimation. In this study, we propose a structural break heavy-tailed heterogeneous autoregressive (HAR) volatility econometric model enhanced with jump-robust estimators. The breakpoints in the volatility are captured by dummy variables after detection by the Bai-Perron sequential multiple breakpoint procedure. To further deal with possible abrupt jumps in the volatility, the jump-robust volatility estimators are constructed using the nearest neighbor truncation approach, namely the minimum and median realized volatility. With the structural break improvements in both the models and the volatility estimators, the empirical findings show that the modified HAR model provides the best in-sample and out-of-sample forecast performance compared with the standard HAR models. Accurate volatility forecasts have a direct influence on the application of risk management and investment portfolio analysis.
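
    The abstract names the minimum and median realized volatility estimators but does not give their formulas. The sketch below uses the commonly stated nearest neighbor truncation forms (MinRV and MedRV), which is an assumption about what the authors implemented; the function names and the toy return series are illustrative only.

```python
import math
from statistics import median


def realized_variance(returns):
    """Plain realized variance: sum of squared intraday returns."""
    return sum(r * r for r in returns)


def min_rv(returns):
    """MinRV: squared minimum of each pair of adjacent absolute returns."""
    m = len(returns)
    scale = math.pi / (math.pi - 2) * m / (m - 1)
    return scale * sum(min(abs(a), abs(b)) ** 2
                       for a, b in zip(returns, returns[1:]))


def med_rv(returns):
    """MedRV: squared median of each triple of adjacent absolute returns."""
    m = len(returns)
    scale = math.pi / (6 - 4 * math.sqrt(3) + math.pi) * m / (m - 2)
    return scale * sum(median((abs(a), abs(b), abs(c))) ** 2
                       for a, b, c in zip(returns, returns[1:], returns[2:]))


# Toy series: identical small returns, with and without one large jump.
calm = [0.001] * 20
jumpy = list(calm)
jumpy[10] = 0.05  # a single abrupt jump
```

    On this toy series the isolated jump inflates the plain realized variance by two orders of magnitude, while `min_rv` and `med_rv` are unchanged because each jump is truncated by its neighbors.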

  15. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

    PubMed

    Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C

    2018-06-01

    Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
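
    The detection step described above can be sketched as follows. This is a minimal illustration of finding mutual nearest neighbor pairs between two batches using Euclidean distance; it does not implement the batch-correction step, and the function names, the distance choice and the toy coordinates are assumptions rather than the authors' code.

```python
import math


def k_nearest(query, points, k):
    """Indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Pairs (i, j) where a_i and b_j appear in each other's k-NN lists."""
    a_to_b = [k_nearest(a, batch_b, k) for a in batch_a]
    b_to_a = [k_nearest(b, batch_a, k) for b in batch_b]
    return [(i, j)
            for i in range(len(batch_a))
            for j in sorted(a_to_b[i])
            if i in b_to_a[j]]


# Hypothetical 2-D "expression" profiles; batch_a has a population
# (index 2) that is absent from batch_b.
batch_a = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
batch_b = [(1.0, 1.0), (11.0, 11.0)]
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
```

    The cell at index 2 of `batch_a` has a nearest neighbor in `batch_b`, but that neighbor does not reciprocate, so no pair is formed. This is what allows the populations to differ between batches.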

  16. Ground-state entropy of the Potts antiferromagnet with next-nearest-neighbor spin-spin couplings on strips of the square lattice

    PubMed

    Chang; Shrock

    2000-10-01

    We present exact calculations of the zero-temperature partition function (chromatic polynomial) and W(q), the exponent of the ground-state entropy, for the q-state Potts antiferromagnet with next-nearest-neighbor spin-spin couplings on square lattice strips, of width L_y=3 and L_y=4 vertices and arbitrarily great length L_x vertices, with both free and periodic boundary conditions. The resultant values of W for a range of physical q values are compared with each other and with the values for the full two-dimensional lattice. These results give insight into the effect of such non-nearest-neighbor couplings on the ground-state entropy. We show that the q=2 (Ising) and q=4 Potts antiferromagnets have zero-temperature critical points on the L_x → ∞ limits of the strips that we study. With the generalization of q from Z+ to C, we determine the analytic structure of W(q) in the q plane for the various cases.

  17. Ising model of cardiac thin filament activation with nearest-neighbor cooperative interactions

    NASA Technical Reports Server (NTRS)

    Rice, John Jeremy; Stolovitzky, Gustavo; Tu, Yuhai; de Tombe, Pieter P.; Bers, D. M. (Principal Investigator)

    2003-01-01

    We have developed a model of cardiac thin filament activation using an Ising model approach from equilibrium statistical physics. This model explicitly represents nearest-neighbor interactions between 26 troponin/tropomyosin units along a one-dimensional array that represents the cardiac thin filament. With transition rates chosen to match experimental data, the results show that the resulting force-pCa (F-pCa) relations are similar to Hill functions with asymmetries, as seen in experimental data. Specifically, Hill plots showing (log(F/(1-F)) vs. log [Ca]) reveal a steeper slope below the half activation point (Ca(50)) compared with above. Parameter variation studies show interplay of parameters that affect the apparent cooperativity and asymmetry in the F-pCa relations. The model also predicts that Ca binding is uncooperative for low [Ca], becomes steeper near Ca(50), and becomes uncooperative again at higher [Ca]. The steepness near Ca(50) mirrors the steep F-pCa as a result of thermodynamic considerations. The model also predicts that the correlation between troponin/tropomyosin units along the one-dimensional array quickly decays at high and low [Ca], but near Ca(50), high correlation occurs across the whole array. This work provides a simple model that can account for the steepness and shape of F-pCa relations that other models fail to reproduce.
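
    As a point of reference for the Hill plots mentioned above, the sketch below evaluates an ideal Hill function and the slope of its Hill plot. For a pure Hill function the plot of log(F/(1-F)) against log [Ca] is a straight line with slope n, so the asymmetry reported in the abstract corresponds to a departure from this form. The parameter values and function names are illustrative assumptions, not taken from the paper.

```python
import math


def hill(ca, ca50, n):
    """Ideal Hill function: fractional force at calcium concentration `ca`."""
    return ca ** n / (ca50 ** n + ca ** n)


def hill_plot_value(ca, ca50, n):
    """Ordinate of the Hill plot, log10(F / (1 - F))."""
    f = hill(ca, ca50, n)
    return math.log10(f / (1.0 - f))


def hill_plot_slope(ca_lo, ca_hi, ca50, n):
    """Slope of the Hill plot between two calcium concentrations."""
    dy = hill_plot_value(ca_hi, ca50, n) - hill_plot_value(ca_lo, ca50, n)
    dx = math.log10(ca_hi) - math.log10(ca_lo)
    return dy / dx


# Hypothetical parameters: half activation at 1.0 (arbitrary units), n = 4.
slope_below = hill_plot_slope(0.5, 0.8, ca50=1.0, n=4)
slope_above = hill_plot_slope(1.5, 3.0, ca50=1.0, n=4)
```

    Both slopes equal n for the ideal function; a model or data set with a steeper slope below the half activation point than above it would show `slope_below > slope_above`.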

  18. Missing value imputation in DNA microarrays based on conjugate gradient method.

    PubMed

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE in different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
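
    The neighbor-selection step described above can be sketched as follows. This covers only the ranking of candidate genes by absolute Pearson correlation over the observed positions of the target gene; the subsequent subset labeling and conjugate gradient estimation are not shown. The function names and the use of `None` to mark a missing value are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_k_correlated(target, candidates, k):
    """Indices of the k candidate rows with the largest |Pearson r| to target.

    `target` may contain None at missing positions; the correlation is
    computed only over the positions where the target is observed.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    t = [target[i] for i in observed]
    scored = [(abs(pearson(t, [row[i] for i in observed])), idx)
              for idx, row in enumerate(candidates)]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Hypothetical expression rows; the target gene is missing its third value.
target = [1.0, 2.0, None, 4.0]
candidates = [
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    [1.0, 5.0, 2.0, 3.0],   # weakly correlated
]
neighbors = top_k_correlated(target, candidates, k=2)
```

    Using the absolute value means that strongly anti-correlated genes are kept as neighbors alongside strongly correlated ones.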

  19. A Fast Robot Identification and Mapping Algorithm Based on Kinect Sensor.

    PubMed

    Zhang, Liang; Shen, Peiyi; Zhu, Guangming; Wei, Wei; Song, Houbing

    2015-08-14

    Internet of Things (IoT) is driving innovation in an ever-growing set of application domains such as intelligent processing for autonomous robots. For an autonomous robot, one grand challenge is how to sense its surrounding environment effectively. The Simultaneous Localization and Mapping with RGB-D Kinect camera sensor on robot, called RGB-D SLAM, has been developed for this purpose but some technical challenges must be addressed. Firstly, the efficiency of the algorithm cannot satisfy real-time requirements; secondly, the accuracy of the algorithm is unacceptable. In order to address these challenges, this paper proposes a set of novel improvement methods as follows. Firstly, the ORiented Brief (ORB) method is used in feature detection and descriptor extraction. Secondly, a bidirectional Fast Library for Approximate Nearest Neighbors (FLANN) k-Nearest Neighbor (KNN) algorithm is applied to feature match. Then, the improved RANdom SAmple Consensus (RANSAC) estimation method is adopted in the motion transformation. In the meantime, high precision General Iterative Closest Points (GICP) is utilized to register a point cloud in the motion transformation optimization. To improve the accuracy of SLAM, the reduced dynamic covariance scaling (DCS) algorithm is formulated as a global optimization problem under the G2O framework. The effectiveness of the improved algorithm has been verified by testing on standard data and comparing with the ground truth obtained on Freiburg University's datasets. The Dr Robot X80 equipped with a Kinect camera is also applied in a building corridor to verify the correctness of the improved RGB-D SLAM algorithm. With the above experiments, it can be seen that the proposed algorithm achieves higher processing speed and better accuracy.

  20. Activity recognition in planetary navigation field tests using classification algorithms applied to accelerometer data.

    PubMed

    Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve

    2012-01-01

    Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
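
    For readers unfamiliar with the classifier, a minimal k-nearest-neighbors majority vote is sketched below. The two-dimensional feature vectors and activity labels are invented for illustration and are not the features or activities used in the study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep the k closest samples and count their labels.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical accelerometer-derived features for two activities.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["walk", "walk", "walk", "climb", "climb", "climb"]

label_a = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
label_b = knn_predict(train_x, train_y, (5.2, 5.1), k=3)
```

    An odd k avoids tied votes in a two-class problem; with more classes a tie-breaking rule would need to be chosen explicitly.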

  1. Iris recognition using image moments and k-means algorithm.

    PubMed

    Khan, Yaser Daanial; Khan, Sher Afzal; Ahmad, Farooq; Islam, Saeed

    2014-01-01

    This paper presents a biometric technique for identification of a person using the iris image. The iris is first segmented from the acquired image of an eye using an edge detection algorithm. The disk shaped area of the iris is transformed into a rectangular form. Described moments are extracted from the grayscale image which yields a feature vector containing scale, rotation, and translation invariant moments. Images are clustered using the k-means algorithm and centroids for each cluster are computed. An arbitrary image is assumed to belong to the cluster whose centroid is the nearest to the feature vector in terms of Euclidean distance computed. The described model exhibits an accuracy of 98.5%.
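
    The final assignment step described above, picking the cluster whose centroid is nearest to the feature vector in Euclidean distance, can be sketched as follows. The centroids and the feature vector are placeholder numbers, not moment features from the paper.

```python
import math


def nearest_centroid(feature_vec, centroids):
    """Index of the centroid closest to `feature_vec` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(feature_vec, centroids[i]))


# Hypothetical centroids, as would be produced by a prior k-means run.
centroids = [(0.0, 0.0), (10.0, 10.0)]
cluster = nearest_centroid((9.0, 8.0), centroids)
```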

  2. Iris Recognition Using Image Moments and k-Means Algorithm

    PubMed Central

    Khan, Yaser Daanial; Khan, Sher Afzal; Ahmad, Farooq; Islam, Saeed

    2014-01-01

    This paper presents a biometric technique for identification of a person using the iris image. The iris is first segmented from the acquired image of an eye using an edge detection algorithm. The disk shaped area of the iris is transformed into a rectangular form. Described moments are extracted from the grayscale image which yields a feature vector containing scale, rotation, and translation invariant moments. Images are clustered using the k-means algorithm and centroids for each cluster are computed. An arbitrary image is assumed to belong to the cluster whose centroid is the nearest to the feature vector in terms of Euclidean distance computed. The described model exhibits an accuracy of 98.5%. PMID:24977221

  3. A neighboring structure reconstructed matching algorithm based on LARK features

    NASA Astrophysics Data System (ADS)

    Xue, Taobei; Han, Jing; Zhang, Yi; Bai, Lianfa

    2015-11-01

    Aimed at the low contrast ratio and high noise of infrared images, and the randomness and ambient occlusion of its objects, this paper presents a neighboring structure reconstructed matching (NSRM) algorithm based on LARK features. The neighboring structure relationships of local window are considered based on a non-negative linear reconstruction method to build a neighboring structure relationship matrix. Then the LARK feature matrix and the NSRM matrix are processed separately to get two different similarity images. By fusing and analyzing the two similarity images, those infrared objects are detected and marked by the non-maximum suppression. The NSRM approach is extended to detect infrared objects with incompact structure. High performance is demonstrated on infrared body set, indicating a lower false detecting rate than conventional methods in complex natural scenes.

  4. The probability of misassociation between neighboring targets

    NASA Astrophysics Data System (ADS)

    Areta, Javier A.; Bar-Shalom, Yaakov; Rothrock, Ronald

    2008-04-01

    This paper presents procedures to calculate the probability that the measurement originating from an extraneous target will be (mis)associated with a target of interest for the cases of Nearest Neighbor and Global association. It is shown that these misassociation probabilities depend, under certain assumptions, on a particular - covariance weighted - norm of the difference between the targets' predicted measurements. For the Nearest Neighbor association, the exact solution, obtained for the case of equal innovation covariances, is based on a noncentral chi-square distribution. An approximate solution is also presented for the case of unequal innovation covariances. For the Global case an approximation is presented for the case of "similar" innovation covariances. In the general case of unequal innovation covariances where this approximation fails, an exact method based on the inversion of the characteristic function is presented. The theoretical results, confirmed by Monte Carlo simulations, quantify the benefit of Global vs. Nearest Neighbor association. These results are applied to problems of single sensor as well as centralized fusion architecture multiple sensor tracking.

  5. Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines.

    PubMed

    Majid, Abdul; Ali, Safdar; Iqbal, Mubashar; Kausar, Nabeela

    2014-03-01

    This study proposes a novel prediction approach for human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: the preprocessor and the predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is employed to increase the samples of the minority class, thereby balancing the dataset. In the predictor stage, machine-learning approaches of K-nearest neighbor (KNN) and support vector machines (SVM) are used to develop hybrid MTD-SVM and MTD-KNN prediction models. MTD-SVM model has provided the best values of accuracy, G-mean and Matthews correlation coefficient of 96.71%, 96.70% and 71.98% for cancer/non-cancer dataset, breast/non-breast cancer dataset and colon/non-colon cancer dataset, respectively. We found that hybrid MTD-SVM is the best with respect to prediction performance and computational cost. MTD-KNN model has achieved moderately better prediction as compared to hybrid MTD-NB (Naïve Bayes) but at the expense of higher computing cost. MTD-KNN model is faster than MTD-RF (random forest) but its prediction is not better than MTD-RF. To the best of our knowledge, the reported results are the best results, so far, for these datasets. The proposed scheme indicates that the developed models can be used as a tool for the prediction of cancer. This scheme may be useful for study of any sequential information such as protein sequence or any nucleic acid sequence. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
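
    Two of the metrics reported above, G-mean and the Matthews correlation coefficient, are commonly used for imbalanced data because plain accuracy can be dominated by the majority class. The sketch below computes both from a binary confusion matrix using their standard definitions; the counts are invented and do not reproduce the paper's results.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts.
gm = g_mean(tp=90, fn=10, tn=80, fp=20)
mc = mcc(tp=90, fn=10, tn=80, fp=20)
```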

  6. Fuzzy-Rough Nearest Neighbour Classification

    NASA Astrophysics Data System (ADS)

    Jensen, Richard; Cornelis, Chris

    A new fuzzy-rough nearest neighbour (FRNN) classification algorithm is presented in this paper, as an alternative to Sarkar's fuzzy-rough ownership function (FRNN-O) approach. By contrast to the latter, our method uses the nearest neighbours to construct lower and upper approximations of decision classes, and classifies test instances based on their membership to these approximations. In the experimental analysis, we evaluate our approach with both classical fuzzy-rough approximations (based on an implicator and a t-norm), as well as with the recently introduced vaguely quantified rough sets. Preliminary results are very good, and in general FRNN outperforms FRNN-O, as well as the traditional fuzzy nearest neighbour (FNN) algorithm.

  7. Influence of the number of topologically interacting neighbors on swarm dynamics

    PubMed Central

    Shang, Yilun; Bouffanais, Roland

    2014-01-01

    Recent empirical and theoretical works on collective behaviors based on a topological interaction are beginning to offer some explanations for the physical reasons behind the selection of a particular number of nearest neighbors locally affecting each individual's dynamics. Recently, flocking starlings have been shown to topologically interact with a very specific number of neighbors, between six and eight, while metric-free interactions were found to govern human crowd dynamics. Here, we use network- and graph-theoretic approaches combined with a dynamical model of locally interacting self-propelled particles to study how the consensus reaching process and its dynamics are influenced by the number k of topological neighbors. Specifically, we prove exactly that, in the absence of noise, consensus is always attained with a speed to consensus strictly increasing with k. The analysis of both speed and time to consensus reveals that, irrespective of the swarm size, a value of k ~ 10 speeds up the rate of convergence to consensus to levels close to that of the optimal all-to-all interaction signaling. Furthermore, this effect is found to be more pronounced in the presence of environmental noise. PMID:24567077

  8. Equilibrium, metastability, and hysteresis in a model spin-crossover material with nearest-neighbor antiferromagnetic-like and long-range ferromagnetic-like interactions

    NASA Astrophysics Data System (ADS)

    Rikvold, Per Arne; Brown, Gregory; Miyashita, Seiji; Omand, Conor; Nishino, Masamichi

    2016-02-01

    Phase diagrams and hysteresis loops were obtained by Monte Carlo simulations and a mean-field method for a simplified model of a spin-crossover material with a two-step transition between the high-spin and low-spin states. This model is a mapping onto a square-lattice S = 1/2 Ising model with antiferromagnetic nearest-neighbor and ferromagnetic Husimi-Temperley (equivalent-neighbor) long-range interactions. Phase diagrams obtained by the two methods for weak and strong long-range interactions are found to be similar. However, for intermediate-strength long-range interactions, the Monte Carlo simulations show that tricritical points decompose into pairs of critical end points and mean-field critical points surrounded by horn-shaped regions of metastability. Hysteresis loops along paths traversing the horn regions are strongly reminiscent of thermal two-step transition loops with hysteresis, recently observed experimentally in several spin-crossover materials. We believe analogous phenomena should be observable in experiments and simulations for many systems that exhibit competition between local antiferromagnetic-like interactions and long-range ferromagnetic-like interactions caused by elastic distortions.

  9. Equilibrium, metastability, and hysteresis in a model spin-crossover material with nearest-neighbor antiferromagnetic-like and long-range ferromagnetic-like interactions

    DOE PAGES

    Rikvold, Per Arne; Brown, Gregory; Miyashita, Seiji; ...

    2016-02-16

    Phase diagrams and hysteresis loops were obtained by Monte Carlo simulations and a mean-field method for a simplified model of a spin-crossover material with a two-step transition between the high-spin and low-spin states. This model is a mapping onto a square-lattice S = 1/2 Ising model with antiferromagnetic nearest-neighbor and ferromagnetic Husimi-Temperley (equivalent-neighbor) long-range interactions. Phase diagrams obtained by the two methods for weak and strong long-range interactions are found to be similar. However, for intermediate-strength long-range interactions, the Monte Carlo simulations show that tricritical points decompose into pairs of critical end points and mean-field critical points surrounded by horn-shaped regions of metastability. Hysteresis loops along paths traversing the horn regions are strongly reminiscent of thermal two-step transition loops with hysteresis, recently observed experimentally in several spin-crossover materials. As a result, we believe analogous phenomena should be observable in experiments and simulations for many systems that exhibit competition between local antiferromagnetic-like interactions and long-range ferromagnetic-like interactions caused by elastic distortions.

  10. Symmetrized Nearest Neighbor Regression Estimates.

    DTIC Science & Technology

    1987-12-01

    ...in tenth of a pence) in 1973. The data come from the Family Expenditure Survey, Annual Base Tapes 1968-198S, Department of Employment, Statistics...Statistics, 13, 1465-1481. Hildenbrand, K. and Hildenbrand, W. (1986). On the mean income effect: a data analysis of the U.K. family expenditure

  11. Randomized Approaches for Nearest Neighbor Search in Metric Space When Computing the Pairwise Distance Is Extremely Expensive

    NASA Astrophysics Data System (ADS)

    Wang, Lusheng; Yang, Yong; Lin, Guohui

    Finding the closest object for a query in a database is a classical problem in computer science. For some modern biological applications, computing the similarity between two objects might be very time consuming. For example, it takes a long time to compute the edit distance between two whole chromosomes or the alignment cost of two 3D protein structures. In this paper, we study the nearest neighbor search problem in metric space, where the pairwise distance between any two objects in the database is known and we want to minimize the number of distances computed on-line between the query and objects in the database in order to find the closest object. We have designed two randomized approaches for indexing metric space databases, where objects are purely described by their distances to each other. Analysis and experiments show that our approaches only need to compute distances to O(log n) objects in order to find the closest object, where n is the total number of objects in the database.

  12. Personalised news filtering and recommendation system using Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model

    NASA Astrophysics Data System (ADS)

    Adeniyi, D. A.; Wei, Z.; Yang, Y.

    2017-10-01

    Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.

  13. Generative Models for Similarity-based Classification

    DTIC Science & Technology

    2007-01-01

    NC), local nearest centroid (local NC), k-nearest neighbors (kNN), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM-KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM-KNN is a hybrid local and global classifier developed

  14. Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion.

    PubMed

    Wang, Jian-Gang; Sung, Eric; Yau, Wei-Yun

    2011-07-01

    Facial age classification is an approach to classify face images into one of several predefined age groups. One of the difficulties in applying learning techniques to the age classification problem is the large amount of labeled training data required. Acquiring such training data is very costly in terms of age progress, privacy, human time, and effort. Although unlabeled face images can be obtained easily, it would be expensive to manually label them on a large scale and obtain the ground truth. The frugal selection of the unlabeled data for labeling to quickly reach high classification performance with minimal labeling efforts is a challenging problem. In this paper, we present an active learning approach based on an online incremental bilateral two-dimension linear discriminant analysis (IB2DLDA) which initially learns from a small pool of labeled data and then iteratively selects the most informative samples from the unlabeled set to increasingly improve the classifier. Specifically, we propose a novel data selection criterion called the furthest nearest-neighbor (FNN) that generalizes the margin-based uncertainty to the multiclass case and which is easy to compute, so that the proposed active learning algorithm can handle a large number of classes and large data sizes efficiently. Empirical experiments on FG-NET and Morph databases together with a large unlabeled data set for age categorization problems show that the proposed approach can achieve results comparable to, or even better than, those of a conventionally trained active classifier that requires much more labeling effort. Our IB2DLDA-FNN algorithm can achieve similar results much faster than random selection and with fewer samples for age categorization. It also can achieve comparable results with active SVM but is much faster than active SVM in terms of training because kernel methods are not needed. The results on the face recognition database and palmprint/palm vein database showed that our approach can handle

  15. A Distributed and Energy-Efficient Algorithm for Event K-Coverage in Underwater Sensor Networks

    PubMed Central

    Jiang, Peng; Xu, Yiming; Liu, Jun

    2017-01-01

    For event dynamic K-coverage algorithms, each management node selects its assistant node by using a greedy algorithm without considering the residual energy and situations in which a node is selected by several events. This approach affects network energy consumption and balance. Therefore, this study proposes a distributed and energy-efficient event K-coverage algorithm (DEEKA). After the network achieves 1-coverage, the nodes that detect the same event compete for the event management node with the number of candidate nodes and the average residual energy, as well as the distance to the event. Second, each management node estimates the probability of its neighbor nodes’ being selected by the event it manages with the distance level, the residual energy level, and the number of dynamic coverage event of these nodes. Third, each management node establishes an optimization model that uses expectation energy consumption and the residual energy variance of its neighbor nodes and detects the performance of the events it manages as targets. Finally, each management node uses a constrained non-dominated sorting genetic algorithm (NSGA-II) to obtain the Pareto set of the model and the best strategy via technique for order preference by similarity to an ideal solution (TOPSIS). The algorithm first considers the effect of harsh underwater environments on information collection and transmission. It also considers the residual energy of a node and a situation in which the node is selected by several other events. Simulation results show that, unlike the on-demand variable sensing K-coverage algorithm, DEEKA balances and reduces network energy consumption, thereby prolonging the network’s best service quality and lifetime. PMID:28106837
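
    The last step described above, choosing one strategy from a Pareto set with TOPSIS, can be sketched as follows. This is the generic textbook form of TOPSIS with vector normalization; the criteria, weights and candidate values are invented, and nothing here reflects the specific objective functions or weights used in DEEKA.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness score in [0, 1] for each alternative; higher is better.

    matrix[i][j] is the value of criterion j for alternative i, and
    benefit[j] is True when larger values of criterion j are preferred.
    Assumes no criterion column is all zeros and alternatives are not
    all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)    # distance to the ideal solution
        d_minus = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical Pareto candidates: (energy consumption, detection quality).
candidates = [[2.0, 0.9], [5.0, 0.8], [3.0, 0.3]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[False, True])
best = max(range(len(scores)), key=scores.__getitem__)
```

    In this toy example the first candidate is best on both criteria, so it coincides with the ideal solution and receives a score of 1.0; on a genuine Pareto set no candidate dominates the others and the weights decide the ranking.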

  16. A Distributed and Energy-Efficient Algorithm for Event K-Coverage in Underwater Sensor Networks.

    PubMed

    Jiang, Peng; Xu, Yiming; Liu, Jun

    2017-01-19

    For event dynamic K-coverage algorithms, each management node selects its assistant node by using a greedy algorithm without considering the residual energy and situations in which a node is selected by several events. This approach affects network energy consumption and balance. Therefore, this study proposes a distributed and energy-efficient event K-coverage algorithm (DEEKA). After the network achieves 1-coverage, the nodes that detect the same event compete for the event management node with the number of candidate nodes and the average residual energy, as well as the distance to the event. Second, each management node estimates the probability of its neighbor nodes' being selected by the event it manages with the distance level, the residual energy level, and the number of dynamic coverage event of these nodes. Third, each management node establishes an optimization model that uses expectation energy consumption and the residual energy variance of its neighbor nodes and detects the performance of the events it manages as targets. Finally, each management node uses a constrained non-dominated sorting genetic algorithm (NSGA-II) to obtain the Pareto set of the model and the best strategy via technique for order preference by similarity to an ideal solution (TOPSIS). The algorithm first considers the effect of harsh underwater environments on information collection and transmission. It also considers the residual energy of a node and a situation in which the node is selected by several other events. Simulation results show that, unlike the on-demand variable sensing K-coverage algorithm, DEEKA balances and reduces network energy consumption, thereby prolonging the network's best service quality and lifetime.

  17. TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

    PubMed Central

    Diaz, Naryttza N; Krause, Lutz; Goesmann, Alexander; Niehaus, Karsten; Nattkemper, Tim W

    2009-01-01

    Background Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification is an essential task in the analysis of metagenomics data sets that is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy up to rank class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, thus classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to rank class, and low false negative rates were also obtained. Conclusion An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, accurate and the reference

  18. Application of affinity propagation algorithm based on manifold distance for transformer PD pattern recognition

    NASA Astrophysics Data System (ADS)

    Wei, B. G.; Huo, K. X.; Yao, Z. F.; Lou, J.; Li, X. Y.

    2018-03-01

    It is one of the difficult problems encountered in the research of condition maintenance technology of transformers to recognize partial discharge (PD) pattern. According to the main physical characteristics of PD, three models of oil-paper insulation defects were set up in laboratory to study the PD of transformers, and phase resolved partial discharge (PRPD) was constructed. By using least square method, the grey-scale images of PRPD were constructed and features of each grey-scale image were 28 box dimensions and 28 information dimensions. Affinity propagation algorithm based on manifold distance (AP-MD) for transformers PD pattern recognition was established, and the data of box dimension and information dimension were clustered based on AP-MD. Study shows that clustering result of AP-MD is better than the results of affinity propagation (AP), k-means and fuzzy c-means algorithm (FCM). By choosing different k values of k-nearest neighbor, we find clustering accuracy of AP-MD falls when k value is larger or smaller, and the optimal k value depends on sample size.

  19. Multispectral imaging burn wound tissue classification system: a comparison of test accuracies between several common machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.

    2016-03-01

    The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care

  20. AVNM: A Voting based Novel Mathematical Rule for Image Classification.

    PubMed

    Vidyarthi, Ankit; Mittal, Namita

    2016-12-01

    In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. 
Copyright © 2016 Elsevier

  1. Applied algorithm in the liner inspection of solid rocket motors

    NASA Astrophysics Data System (ADS)

    Hoffmann, Luiz Felipe Simões; Bizarria, Francisco Carlos Parquet; Bizarria, José Walter Parquet

    2018-03-01

    In rocket motors, the bonding between the solid propellant and thermal insulation is accomplished by a thin adhesive layer, known as liner. The liner application method involves a complex sequence of tasks, which includes in its final stage, the surface integrity inspection. Nowadays in Brazil, an expert carries out a thorough visual inspection to detect defects on the liner surface that may compromise the propellant interface bonding. Therefore, this paper proposes an algorithm that uses the photometric stereo technique and the K-nearest neighbor (KNN) classifier to assist the expert in the surface inspection. Photometric stereo allows the surface information recovery of the test images, while the KNN method enables image pixels classification into two classes: non-defect and defect. Tests performed on a computer vision based prototype validate the algorithm. The positive results suggest that the algorithm is feasible and when implemented in a real scenario, will be able to help the expert in detecting defective areas on the liner surface.
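
    The KNN step described above (assigning each pixel to "non-defect" or "defect") can be illustrated with a minimal majority-vote sketch. This is not the authors' implementation: the two-value feature vectors, the labels, and the function name knn_predict are hypothetical, and Euclidean distance is an assumption.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Rank training indices by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Majority vote among the labels of the k closest points.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical per-pixel feature vectors (illustrative values only).
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
            (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels = ["non-defect", "non-defect", "non-defect",
          "defect", "defect", "defect"]

prediction = knn_predict(features, labels, (0.82, 0.88), k=3)
print(prediction)
```

    An odd k avoids tied votes in a two-class problem such as this one.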

  2. Evidence of codon usage in the nearest neighbor spacing distribution of bases in bacterial genomes

    NASA Astrophysics Data System (ADS)

    Higareda, M. F.; Geiger, O.; Mendoza, L.; Méndez-Sánchez, R. A.

    2012-02-01

    Statistical analysis of whole genomic sequences usually assumes a homogeneous nucleotide density throughout the genome, an assumption that has been proved incorrect for several organisms since the nucleotide density is only locally homogeneous. To avoid giving a single numerical value to this variable property, we propose the use of spectral statistics, which characterizes the density of nucleotides as a function of its position in the genome. We show that the cumulative density of bases in bacterial genomes can be separated into an average (or secular) plus a fluctuating part. Bacterial genomes can be divided into two groups according to the qualitative description of their secular part: linear and piecewise linear. These two groups of genomes show different properties when their nucleotide spacing distribution is studied. In order to analyze genomes having a variable nucleotide density, statistically, the use of unfolding is necessary, i.e., to get a separation between the secular part and the fluctuations. The unfolding allows an adequate comparison with the statistical properties of other genomes. With this methodology, four genomes were analyzed Burkholderia, Bacillus, Clostridium and Corynebacterium. Interestingly, the nearest neighbor spacing distributions or detrended distance distributions are very similar for species within the same genus but they are very different for species from different genera. This difference can be attributed to the difference in the codon usage.

  3. yaImpute: An R package for kNN imputation

    Treesearch

    Nicholas L. Crookston; Andrew O. Finley

    2008-01-01

    This article introduces yaImpute, an R package for nearest neighbor search and imputation. Although nearest neighbor imputation is used in a host of disciplines, the methods implemented in the yaImpute package are tailored to imputation-based forest attribute estimation and mapping. The impetus to writing the yaImpute is a growing interest in nearest neighbor...

  4. Optimized Seizure Detection Algorithm: A Fast Approach for Onset of Epileptic in EEG Signals Using GT Discriminant Analysis and K-NN Classifier

    PubMed Central

    Rezaee, Kh.; Azizi, E.; Haddadnia, J.

    2016-01-01

    Background Epilepsy is a severe disorder of the central nervous system that predisposes the person to recurrent seizures. Fifty million people worldwide suffer from epilepsy; after Alzheimer’s and stroke, it is the third most widespread nervous disorder. Objective In this paper, an algorithm to detect the onset of epileptic seizures based on the analysis of brain electrical signals (EEG) has been proposed. 844 hours of EEG were recorded from 23 pediatric patients consecutively with 163 occurrences of seizures. Signals had been collected from Children’s Hospital Boston with a sampling frequency of 256 Hz through 18 channels in order to assess epilepsy surgery. By selecting effective features from seizure and non-seizure signals of each individual and putting them into two categories, the proposed algorithm detects the onset of seizures quickly and with high sensitivity. Method In this algorithm, L-sec epochs of signals are displayed in the form of a third-order tensor in spatial, spectral and temporal spaces by applying wavelet transform. Then, after applying general tensor discriminant analysis (GTDA) on tensors and calculating the mapping matrix, feature vectors are extracted. GTDA increases the sensitivity of the algorithm by storing data without deleting them. Finally, K-Nearest neighbors (KNN) is used to classify the selected features. Results The results of simulating the algorithm on a standard dataset show that the algorithm is capable of detecting 98 percent of seizures with an average delay of 4.7 seconds and an average detection error rate of three errors in 24 hours. Conclusion Today, the lack of an automated system to detect or predict the seizure onset is strongly felt. PMID:27672628

  5. Effect of the next-nearest-neighbor hopping on the charge collective modes in the paramagnetic phase of the Hubbard model

    NASA Astrophysics Data System (ADS)

    Dao, Vu Hung; Frésard, Raymond

    2017-10-01

    The charge dynamical response function of the t-t'-U Hubbard model is investigated on the square lattice in the thermodynamical limit. The correlation function is calculated from Gaussian fluctuations around the paramagnetic saddle-point within the Kotliar and Ruckenstein slave-boson representation. The next-nearest-neighbor hopping only slightly affects the renormalization of the quasiparticle mass. In contrast a negative t'/t notably decreases (increases) their velocity, and hence the zero-sound velocity, at positive (negative) doping. For low (high) density n ≲ 0.5 (n ≳ 1.5) we find that it enhances (reduces) the damping of the zero-sound mode. Furthermore it softens (hardens) the upper-Hubbard-band collective mode at positive (negative) doping. It is also shown that our results differ markedly from the random-phase approximation in the strong-coupling limit, even at high doping, while they compare favorably with existing quantum Monte Carlo numerical simulations.

  6. The global Minmax k-means algorithm.

    PubMed

    Wang, Xiaoyan; Bai, Yanping

    2016-01-01

    The global k-means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k-means to minimize the sum of the intra-cluster variances. However, the global k-means algorithm sometimes results in singleton clusters, and the initial positions are sometimes bad; after a bad initialization, a poor local optimum can easily be obtained by the k-means algorithm. In this paper, we first modify the global k-means algorithm to eliminate the singleton clusters, and then apply the MinMax k-means clustering error method to the global k-means algorithm to overcome the effect of bad initialization, yielding the proposed global Minmax k-means algorithm. The proposed clustering method is tested on some popular data sets and compared to the k-means algorithm, the global k-means algorithm and the MinMax k-means algorithm. The experimental results show that our proposed algorithm outperforms the other algorithms mentioned in the paper.

  7. A Parametric k-Means Algorithm

    PubMed Central

    Tarpey, Thaddeus

    2007-01-01

    Summary The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large simulated data set from a distribution whose parameters are estimated using maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm and an example on determining sizes of gas masks is used to illustrate the parametric k-means algorithm. PMID:17917692
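
    The parametric k-means idea summarized above (estimate parameters by maximum likelihood, simulate a very large sample, then run ordinary k-means on it) can be sketched as follows. This is only an illustration under stated assumptions: a one-dimensional normal model is assumed, and the function names, the sample data, the simulation size, and the quantile-based initialization are all hypothetical choices, not details taken from the paper.

```python
import math
import random

def fit_normal_mle(data):
    """Maximum-likelihood estimates (mean, standard deviation) of a normal model."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return mu, sigma

def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd k-means on one-dimensional data; returns sorted centers."""
    values = sorted(values)
    # Start the centers at evenly spaced quantiles of the data.
    centers = [values[(2 * j + 1) * len(values) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            sums[j] += v
            counts[j] += 1
        # Keep the old center if a cluster happens to be empty.
        new_centers = [sums[j] / counts[j] if counts[j] else centers[j]
                       for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def parametric_kmeans(data, k, n_sim=5000, seed=0):
    """Estimate k principal points by clustering a sample simulated from the fitted model."""
    mu, sigma = fit_normal_mle(data)
    rng = random.Random(seed)
    simulated = [rng.gauss(mu, sigma) for _ in range(n_sim)]
    return kmeans_1d(simulated, k)

data = [9.0, 10.0, 11.0, 10.5, 9.5]   # illustrative observations
mu, sigma = fit_normal_mle(data)
centers = parametric_kmeans(data, k=2)
print(mu, sigma, centers)
```

    For a normal distribution the two principal points are known to lie at the mean plus or minus sigma times sqrt(2/pi) (about 0.798 sigma), so the two centers returned here should land near those values, up to simulation error.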

  8. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    NASA Astrophysics Data System (ADS)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as feature selections for selecting and choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper will also discuss the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using the parametric statistical significance tests. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that the ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain quality, optimal feature subset that can represent the actual data in customer review data.

  9. Fast Query-Optimized Kernel-Machine Classification

    NASA Technical Reports Server (NTRS)

    Mazzoni, Dominic; DeCoste, Dennis

    2004-01-01

    A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered

  10. DichroMatch at the protein circular dichroism data bank (DM@PCDDB): A web-based tool for identifying protein nearest neighbors using circular dichroism spectroscopy.

    PubMed

    Whitmore, Lee; Mavridis, Lazaros; Wallace, B A; Janes, Robert W

    2018-01-01

    Circular dichroism spectroscopy is a well-used, but simple method in structural biology for providing information on the secondary structure and folds of proteins. DichroMatch (DM@PCDDB) is an online tool that is newly available in the Protein Circular Dichroism Data Bank (PCDDB), which takes advantage of the wealth of spectral and metadata deposited therein, to enable identification of spectral nearest neighbors of a query protein based on four different methods of spectral matching. DM@PCDDB can potentially provide novel information about structural relationships between proteins and can be used in comparison studies of protein homologs and orthologs. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  11. Characterization of 3D Voronoi Tessellation Nearest Neighbor Lipid Shells Provides Atomistic Lipid Disruption Profile of Protein Containing Lipid Membranes

    PubMed Central

    Cheng, Sara Y.; Duong, Hai V.; Compton, Campbell; Vaughn, Mark W.; Nguyen, Hoa; Cheng, Kwan H.

    2015-01-01

    Quantifying protein-induced lipid disruptions at the atomistic level is a challenging problem in membrane biophysics. Here we propose a novel 3D Voronoi tessellation nearest-atom-neighbor shell method to classify and characterize lipid domains into discrete concentric lipid shells surrounding membrane proteins in structurally heterogeneous lipid membranes. This method needs only the coordinates of the system and is independent of force fields and simulation conditions. As a proof-of-principle, we use this multiple lipid shell method to analyze the lipid disruption profiles of three simulated membrane systems: phosphatidylcholine, phosphatidylcholine/cholesterol, and beta-amyloid/phosphatidylcholine/cholesterol. We observed different atomic volume disruption mechanisms due to cholesterol and beta-amyloid. Additionally, several lipid fractional groups and lipid-interfacial water did not converge to their control values with increasing distance or shell order from the protein. This volume divergent behavior was confirmed by bilayer thickness and chain orientational order calculations. Our method can also be used to analyze high-resolution structural experimental data. PMID:25637891

  12. Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers

    NASA Astrophysics Data System (ADS)

    Sanal Kumar, K. P.; Bhavani, R., Dr.

    2017-08-01

    Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.

  13. A class of parallel algorithms for computation of the manipulator inertia matrix

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.

  14. Attention Recognition in EEG-Based Affective Learning Research Using CFS+KNN Algorithm.

    PubMed

    Hu, Bin; Li, Xiaowei; Sun, Shuting; Ratcliffe, Martyn

    2018-01-01

    The research detailed in this paper focuses on the processing of Electroencephalography (EEG) data to identify attention during the learning process. The identification of affect using our procedures is integrated into a simulated distance learning system that provides feedback to the user with respect to attention and concentration. The authors propose a classification procedure that combines correlation-based feature selection (CFS) and a k-nearest-neighbor (KNN) data mining algorithm. To evaluate the CFS+KNN algorithm, it was tested against the CFS+C4.5 algorithm and other classification algorithms. The classification performance was measured 10 times with different 3-fold cross validation data. The data was derived from 10 subjects while they were attempting to learn material in a simulated distance learning environment. A self-assessment model of self-report was used with a single valence to evaluate attention on 3 levels (high, neutral, low). It was found that CFS+KNN had a much better performance, giving the highest correct classification rate (CCR) of % for the valence dimension divided into three classes.

  15. Meat and Fish Freshness Inspection System Based on Odor Sensing

    PubMed Central

    Hasan, Najam ul; Ejaz, Naveed; Ejaz, Waleed; Kim, Hyung Seok

    2012-01-01

    We propose a method for building a simple electronic nose based on commercially available sensors used to sniff in the market and identify spoiled/contaminated meat stocked for sale in butcher shops. Using a metal oxide semiconductor-based electronic nose, we measured the smell signature from two of the most common meat foods (beef and fish) stored at room temperature. Food samples were divided into two groups: fresh beef with decayed fish and fresh fish with decayed beef. The prime objective was to identify the decayed item using the developed electronic nose. Additionally, we tested the electronic nose using three pattern classification algorithms (artificial neural network, support vector machine and k-nearest neighbor), and compared them based on accuracy, sensitivity, and specificity. The results demonstrate that the k-nearest neighbor algorithm has the highest accuracy. PMID:23202222

  16. Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.

    PubMed

    Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong

    2018-05-24

    This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
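
    SMOTE (synthetic minority over-sampling technique) balances a dataset by creating new minority-class samples on the line segments between existing minority samples and their nearest minority neighbors. The sketch below shows only that interpolation step; the function name, parameters, and sample values are hypothetical, and it is not the configuration used in the paper.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating toward nearby minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # The k nearest minority neighbors of the base sample (excluding itself).
        neighbor_ids = sorted(
            (j for j in range(len(minority)) if j != base_idx),
            key=lambda j: math.dist(minority[j], base))[:k]
        neighbor = minority[rng.choice(neighbor_ids)]
        # Pick a random position on the segment from base to neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Illustrative minority-class feature vectors (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2, seed=0)
print(new_points)
```

    The synthetic points would be appended to the training set before fitting a classifier such as KNN, so that the minority class is no longer outvoted simply because it has few samples.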

  17. A semi-supervised classification algorithm using the TAD-derived background as training data

    NASA Astrophysics Data System (ADS)

    Fan, Lei; Ambeau, Brittany; Messinger, David W.

    2013-05-01

    In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROIs), or training data for a supervised classification scheme. By combining those ROIs with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi-supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
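
    A mutual k-nearest-neighbor graph links two samples only when each appears in the other's list of k nearest neighbors; its connected components then give candidate groups. The sketch below illustrates that generic construction only. It is not the TAD algorithm itself, and the function name, the Euclidean metric, and the toy points are assumptions.

```python
import math

def mutual_knn_components(points, k):
    """Connected components of the mutual k-NN graph, largest component first."""
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(others[:k]))
    # Keep an edge i-j only if i and j are in each other's k-NN sets.
    adjacency = {i: {j for j in knn[i] if i in knn[j]} for i in range(n)}
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

# Two tight groups plus one isolated sample (illustrative coordinates).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (5, 30)]
components = mutual_knn_components(points, k=2)
print(components)
```

    The isolated sample has nearest neighbors of its own, but none of them lists it in return, so it stays a single-node component; the larger components are the ones a scheme like the one described above would treat as background training regions.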

  18. A Novel Color Image Encryption Algorithm Based on Quantum Chaos Sequence

    NASA Astrophysics Data System (ADS)

    Liu, Hui; Jin, Cong

    2017-03-01

    In this paper, a novel algorithm of image encryption based on quantum chaotic is proposed. The keystreams are generated by the two-dimensional logistic map as initial conditions and parameters. And then general Arnold scrambling algorithm with keys is exploited to permute the pixels of color components. In diffusion process, a novel encryption algorithm, folding algorithm, is proposed to modify the value of diffused pixels. In order to get the high randomness and complexity, the two-dimensional logistic map and quantum chaotic map are coupled with nearest-neighboring coupled-map lattices. Theoretical analyses and computer simulations confirm that the proposed algorithm has high level of security.

  19. Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

    PubMed

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2014-01-01

    Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collected data are essential parts of detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us to develop a novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than other methods. The proposed algorithm outperforms existing algorithms by increasing classification accuracy by 11%. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.

  20. An integrated classifier for computer-aided diagnosis of colorectal polyps based on random forest and location index strategies

    NASA Astrophysics Data System (ADS)

    Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong

    2016-03-01

    Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifier [1], has also a distinguished role in the classification for CADx. Therefore, in this paper, based on the property that the LI will achieve a very high accuracy, we design an algorithm to integrate the LI into RF for improved or higher value of AUC (area under the curve of receiver operating characteristics -- ROC). Experiments were performed by the use of a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers of RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.

  1. Portable Language-Independent Adaptive Translation from OCR. Phase 1

    DTIC Science & Technology

    2009-04-01

    including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation

  2. Smart BIT/TSMD Integration

    DTIC Science & Technology

    1991-12-01

    user using the ’: knn ’ option in the do-scenario command line). An instance of the K-Nearest Neighbor object is first created and initialized before...Navigation Computer HF High Frequency ILS Instrument Landing System KNN K - Nearest Neighbor LRU Line Replaceable Unit MC Mission Computer MTCA...approaches have been investigated here, K-nearest Neighbors ( KNN ) and neural networks (NN). Both approaches require that previously classified examples of

  3. K-Nearest Neighbors Relevance Annotation Model for Distance Education

    ERIC Educational Resources Information Center

    Ke, Xiao; Li, Shaozi; Cao, Donglin

    2011-01-01

    With the rapid development of Internet technologies, distance education has become a popular educational mode. In this paper, the authors propose an online image automatic annotation distance education system, which could effectively help children learn interrelations between image content and corresponding keywords. Image automatic annotation is…

  4. Quantum soliton in 1D Heisenberg spin chains with Dzyaloshinsky-Moriya and next-nearest-neighbor interactions.

    PubMed

    Djoufack, Z I; Tala-Tebue, E; Nguenang, J P; Kenfack-Jiotsa, A

    2016-10-01

    We report in this work, an analytical study of quantum soliton in 1D Heisenberg spin chains with Dzyaloshinsky-Moriya Interaction (DMI) and Next-Nearest-Neighbor Interactions (NNNI). By means of the time-dependent Hartree approximation and the semi-discrete multiple-scale method, the equation of motion for the single-boson wave function is reduced to the nonlinear Schrödinger equation. It comes from this present study that the spectrum of the frequencies increases, its periodicity changes, in the presence of NNNI. The antisymmetric feature of the DMI was probed from the dispersion curve while changing the sign of the parameter controlling it. Five regions were identified in the dispersion spectrum, when the NNNI are taken into account instead of three as in the opposite case. In each of these regions, the quantum model can exhibit quantum stationary localized and stable bright or dark soliton solutions. In each region, we could set up quantum localized n-boson Hartree states as well as the analytical expression of their energy level, respectively. The accuracy of the analytical studies is confirmed by the excellent agreement with the numerical calculations, and it certifies the stability of the stationary quantum localized solitons solutions exhibited in each region. In addition, we found that the intensity of the localization of quantum localized n-boson Hartree states increases when the NNNI are considered. We also realized that the intensity of Hartree n-boson states corresponding to quantum discrete soliton states depend on the wave vector.

  5. A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Osei-Kuffuor, Daniel; Fattebert, Jean-Luc

    2014-01-01

    Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N^3) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
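
    The idea of approximating selected entries of an inverse from local principal submatrices can be sketched on a toy problem. Everything below is an illustrative assumption: the one-dimensional index neighbourhood stands in for the spatial localization regions, the tridiagonal test matrix stands in for a sparse overlap matrix, and the helper names are hypothetical. It is not the authors' implementation.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting (intended for small dense blocks)."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def local_inverse_entry(s, i, j, radius):
    """Approximate (S^-1)[i][j] by inverting the principal submatrix around index i."""
    n = len(s)
    idx = list(range(max(0, i - radius), min(n, i + radius + 1)))
    if j not in idx:
        return 0.0  # assumption: entries beyond the cutoff are simply dropped
    block = [[s[a][b] for b in idx] for a in idx]
    return invert(block)[idx.index(i)][idx.index(j)]


def tridiagonal(n, diag, off):
    """Toy sparse, diagonally dominant matrix used only for the demonstration."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


s = tridiagonal(15, 1.0, 0.25)
exact = invert(s)[7][7]                       # global inverse, cubic cost
approx = local_inverse_entry(s, 7, 7, radius=3)  # cost depends on the block size only
```

    For matrices whose inverse decays quickly away from the diagonal, the local estimate approaches the exact entry as the radius grows, while the cost per entry stays independent of the global matrix size.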

  6. Three Dimensional Object Recognition Using a Complex Autoregressive Model

    DTIC Science & Technology

    1993-12-01

    3.4.2 Template Matching Algorithm ...................... 3-16 3.4.3 K-Nearest-Neighbor ( KNN ) Techniques ................. 3-25 3.4.4 Hidden Markov Model...Neighbor ( KNN ) Test Results ...................... 4-13 4.2.1 Single-Look 1-NN Testing .......................... 4-14 4.2.2 Multiple-Look 1-NN Testing...4-15 4.2.3 Discussion of KNN Test Results ...................... 4-15 4.3 Hidden Markov Model (HMM) Test Results

  7. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

    PubMed

    Baur, Brittany; Bozdag, Serdar

    2016-01-01

    DNA methylation is an important epigenetic event that affects gene expression during development and in various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given the methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
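
    Sequential forward selection wrapped around a k-NN classifier, as named in this abstract, can be sketched as a greedy loop. The scoring rule (leave-one-out accuracy), the stopping rule, and all names below are assumptions chosen for a compact illustration; the abstract does not specify the authors' evaluation protocol.

```python
def knn_loo_accuracy(rows, labels, features, k=1):
    """Leave-one-out accuracy of k-NN using only the given feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        dists = sorted(
            (sum((row[f] - other[f]) ** 2 for f in features), labels[j])
            for j, other in enumerate(rows) if j != i
        )
        votes = [lab for _, lab in dists[:k]]
        pred = max(sorted(set(votes)), key=votes.count)  # majority vote
        correct += pred == labels[i]
    return correct / len(rows)


def sequential_forward_selection(rows, labels, max_features, k=1):
    """Greedily add the feature that most improves the score; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [(knn_loo_accuracy(rows, labels, selected + [f], k), f)
                  for f in remaining]
        score, feat = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


# Toy data: column 0 is noise, column 1 separates the two classes.
rows = [[5, 0.0], [1, 0.1], [4, 0.2], [2, 1.0], [3, 1.1], [0, 1.2]]
labels = [0, 0, 0, 1, 1, 1]
print(sequential_forward_selection(rows, labels, max_features=2))
```

    In the setting of the abstract, each column would correspond to one probe of a gene and the selected columns would be the probes judged most representative.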

  8. Quasiclassical description of the nearest-neighbor hopping dc conduction via hydrogen-like donors in intermediately compensated GaAs crystals

    NASA Astrophysics Data System (ADS)

    Poklonski, N. A.; Vyrko, S. A.; Zabrodskii, A. G.

    2010-08-01

    Expressions for the pre-exponential factor $\sigma_3$ and the thermal activation energy $\varepsilon_3$ of hopping electric conductivity of electrons via hydrogen-like donors in n-type gallium arsenide are obtained in the quasiclassical approximation. Crystals with the donor concentration N and the acceptor concentration KN at the intermediate compensation ratio K (approximately from 0.25 to 0.75) are considered. We assume that the donors in the charge states (0) and (+1) and the acceptors in the charge state (-1) form a joint nonstoichiometric simple cubic 'sublattice' within the crystalline matrix. In such a sublattice the distance between nearest impurity atoms is $R_h = [(1 + K)N]^{-1/3}$, which is also the length of an electron hop between donors. To take into account orientational disorder of hops we assume that the impurity sublattice randomly and smoothly changes orientation inside a macroscopic sample. Values of $\sigma_3(N)$ and $\varepsilon_3(N)$ calculated for the temperature of 2.5 K agree with known experimental data at the insulator side of the insulator-metal phase transition.
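
    As a worked illustration of the hop length, take a donor concentration of $N = 10^{16}\ \mathrm{cm^{-3}}$ and a compensation ratio $K = 0.5$. Both values are assumptions chosen only to show the arithmetic; they are not taken from the paper.

```latex
R_h = \left[(1 + K)\,N\right]^{-1/3}
    = \left[1.5 \times 10^{16}\ \mathrm{cm^{-3}}\right]^{-1/3}
    \approx 4.05 \times 10^{-6}\ \mathrm{cm}
    \approx 40.5\ \mathrm{nm}
```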

  9. A Proposed Methodology to Classify Frontier Capital Markets

    DTIC Science & Technology

    2011-07-31

    but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal

  10. A Proposed Methodology to Classify Frontier Capital Markets

    DTIC Science & Technology

    2011-07-31

    out of charity, but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project...identification, and machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ...Support Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F

  11. Phase diagram and quantum order by disorder in the Kitaev K1-K2 honeycomb magnet

    NASA Astrophysics Data System (ADS)

    Rousochatzakis, Ioannis; Reuther, Johannes; Thomale, Ronny; Rachel, Stephan; Perkins, Natalia

    We show that the topological Kitaev spin liquid on the honeycomb lattice is extremely fragile against the second neighbor Kitaev coupling K2, which has been recently identified as the dominant perturbation away from the nearest neighbor model in iridate Na2IrO3, and may also play a role in α-RuCl3. This coupling explains naturally the zig-zag ordering and the special entanglement between real and spin space observed recently in Na2IrO3. The minimal K1-K2 model that we present here holds in addition the unique property that the classical and quantum phase diagrams and their respective order-by-disorder mechanisms are qualitatively different due to their fundamentally different symmetry structure. NSF DMR-1511768; Freie Univ. Berlin Excellence Initiative of German Research Foundation; European Research Council, ERC-StG-336012; DFG-SFB 1170; DFG-SFB 1143, DFG-SPP 1666, and Helmholtz association VI-521.

  12. Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

    DOEpatents

    Archer, Charles J.; Faraj, Ahmad A.; Inglett, Todd A.; Ratterman, Joseph D.

    2012-10-23

    Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.
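
    The constraint stated in this record, that no compute node may have two of its links designated with the same class routing identifier, has the shape of an edge-coloring problem. The greedy assignment below is a minimal sketch of that constraint only; the function name, the link representation, and the greedy strategy are assumptions for illustration and are not the patented method.

```python
def assign_class_routes(links, num_ids):
    """Give each link the smallest identifier not already used at either endpoint.

    links   -- list of (node_a, node_b) pairs, e.g. the edges of a tree network
    num_ids -- how many class routing identifiers are available (hypothetical)
    """
    used = {}        # node -> identifiers already taken by its links
    assignment = {}  # link -> identifier
    for a, b in links:
        taken = used.setdefault(a, set()) | used.setdefault(b, set())
        free = [c for c in range(num_ids) if c not in taken]
        if not free:
            raise ValueError(f"no identifier left for link {(a, b)}")
        assignment[(a, b)] = free[0]
        used[a].add(free[0])
        used[b].add(free[0])
    return assignment


# A small binary tree: node 0 is the root, nodes 3..6 are leaves.
tree_links = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
routes = assign_class_routes(tree_links, num_ids=3)
```

    With a tree whose nodes have at most three links each, three identifiers are enough in this example; with only two, the sketch raises an error at the first node that needs a third.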

  13. Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms

    PubMed Central

    Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan

    2017-01-01

    Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909

  14. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    ERIC Educational Resources Information Center

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…

  15. Electromagnetic Induction Spectroscopy for the Detection of Subsurface Targets

    DTIC Science & Technology

    2012-12-01

    curves of the proposed method and that of Fails et al. For the kNN ROC curve, k = 7...et al. [6] and Ramachandran et al. [7] both demonstrated success in detecting mines using the k-nearest-neighbor (kNN) algorithm based on the EMI...error is also included in the feature vector. The kNN labels an unknown target based on the closest targets in a training set. Collins et al. [2] and

  16. A novel feature ranking algorithm for biometric recognition with PPG signals.

    PubMed

    Reşit Kavsaoğlu, A; Polat, Kemal; Recep Bozkurt, M

    2014-06-01

    This study describes the application of the Photoplethysmography (PPG) signal and the time domain features acquired from its first and second derivatives to biometric identification. For this purpose, a total of 40 features has been extracted and a feature-ranking algorithm is proposed. The proposed algorithm calculates the contribution of each feature to biometric recognition and orders the features from the largest contribution to the smallest. The Euclidean distance and absolute distance formulas are used to identify the contribution of the features. The efficiency of the proposed algorithm is demonstrated by the results of applying the k-NN (k-nearest neighbor) classifier to the features. During application, 15-period PPG signals recorded at two different times from each of thirty healthy subjects with a PPG data acquisition card were used. The first PPG signals recorded from the subjects were evaluated as the 1st configuration, the PPG signals recorded later at a different time as the 2nd configuration, and the combination of both as the 3rd configuration. When the results were evaluated for the k-NN classifier model created along with the proposed algorithm, identification rates of 90.44% for the 1st configuration, 94.44% for the 2nd configuration, and 87.22% for the 3rd configuration were attained. The results showed that both the proposed algorithm and the biometric identification model based on the PPG signal are very promising for contactless recognition of people. Copyright © 2014 Elsevier Ltd. All rights reserved.
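
    A k-NN classifier with the two distance formulas named in this abstract (Euclidean and absolute, i.e. city-block) can be sketched in a few lines. The function names, the toy feature vectors, and the default k are assumptions for illustration; the feature-ranking step of the paper is not reproduced.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def absolute(a, b):
    """Sum of absolute differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, labels, query, k=3, distance=euclidean):
    """Label the query by majority vote among its k closest training vectors."""
    nearest = sorted(range(len(train)), key=lambda i: distance(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors for two hypothetical subjects "a" and "b".
train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train, labels, [0.5, 0.5]))
print(knn_predict(train, labels, [5.5, 5.5], distance=absolute))
```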

  17. Comparative Analysis of Document level Text Classification Algorithms using R

    NASA Astrophysics Data System (ADS)

    Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.

    2017-08-01

    Over the past few decades, tremendous volumes of data have become available on the Internet in either structured or unstructured form. Information on the Internet is also growing exponentially, so there is an emerging need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. To handle this situation, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbor (KNN) is an efficient and the simplest classifier in the text classification family. However, KNN suffers from imbalanced class distribution and noisy term features. To cope with this challenge, we use document based centroid dimensionality reduction (CentroidDR) using R Programming. By combining these two text classification techniques, the KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which works substantially better than CenKNN.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Majidpour, Mostafa; Qiu, Charlie; Chu, Peter

    Three algorithms for the forecasting of energy consumption at individual EV charging outlets have been applied to real world data from the UCLA campus. Out of these three algorithms, namely k-Nearest Neighbor (kNN), ARIMA, and Pattern Sequence Forecasting (PSF), kNN with k=1 was the best and PSF was the worst performing algorithm with respect to the SMAPE measure. The advantage of PSF is its increased robustness to noise by substituting the real valued time series with an integer valued one, and the advantage of NN is having the least SMAPE for our data. We propose a Modified PSF algorithm (MPSF) which is a combination of PSF and NN; it could be interpreted as NN on integer valued data or as PSF with considering only the most recent neighbor to produce the output. Some other shortcomings of PSF are also addressed in the MPSF. Results show that MPSF has improved the forecast performance.
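
    The two ingredients named in this record, nearest-neighbor forecasting with k=1 and the SMAPE error measure, can be sketched as follows. The day-vector representation, the function names, and the particular SMAPE variant (several definitions are in use) are assumptions for illustration; the record does not specify the authors' exact formulation.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    One common definition; pairs where both values are zero are skipped.
    """
    terms = [abs(f - a) / ((abs(a) + abs(f)) / 2)
             for a, f in zip(actual, forecast) if abs(a) + abs(f) > 0]
    return 100 * sum(terms) / len(terms) if terms else 0.0


def nn_forecast(days):
    """1-NN forecast: find the past day most similar to the latest day
    and return the day that followed it as the prediction for tomorrow."""
    latest = days[-1]
    candidates = range(len(days) - 1)  # only days with a known successor
    best = min(candidates,
               key=lambda i: sum((x - y) ** 2 for x, y in zip(days[i], latest)))
    return days[best + 1]


# Toy history: each inner list is one day's consumption profile (made-up numbers).
history = [[1, 2], [5, 5], [1, 2.1], [5, 6], [1, 2]]
print(nn_forecast(history))
print(smape([100, 200], [110, 180]))
```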

  19. A Vectorized 'Nearest-Neighbors' Algorithm of Order N Using a Monotonic Logical Grid

    DTIC Science & Technology

    1985-05-29

    Computational Physics...May 29, 1985...This work was supported by the Office of Naval Research...Naval Research Laboratory...Unclassified...Interim report, 1985 May 29, page count 50

  20. Latent Dirichlet Allocation (LDA) Model and kNN Algorithm to Classify Research Project Selection

    NASA Astrophysics Data System (ADS)

    Safi’ie, M. A.; Utami, E.; Fatta, H. A.

    2018-03-01

    Universitas Sebelas Maret has a teaching staff of more than 1500 people, and one of its tasks is to carry out research. On the other hand, funding support for research and service is limited, so submissions of research and devotion-to-society (P2M) proposals need to be evaluated. At the selection stage, research proposal documents are collected as unstructured data, and the amount of data stored is very large. Extracting the information contained in these documents requires text mining technology, which is applied to gain knowledge from the documents by automating information extraction. In this article we apply Latent Dirichlet Allocation (LDA) to the documents as a model in the feature extraction process, to obtain the terms that represent each document. We then use the k-Nearest Neighbour (kNN) algorithm to classify the documents based on their terms.

  1. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    PubMed

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.
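
    The time-lagged neighbor idea in this abstract can be illustrated with a heavily simplified sketch: a missing value of one variable is filled from the time points whose lagged value of a second variable is most similar. The single predictor, the fixed lag, the averaging rule, and all names are assumptions for illustration; the Fourier component and the multi-variable weighting of FLk-NN are not reproduced.

```python
def lagged_knn_impute(target, predictor, lag, k=2):
    """Fill None entries of `target` using the k time points whose
    predictor value `lag` steps earlier is closest to the one at the gap."""
    filled = list(target)
    for t, value in enumerate(target):
        if value is not None or t - lag < 0 or predictor[t - lag] is None:
            continue  # nothing to fill, or no lagged reference available
        ref = predictor[t - lag]
        donors = [
            (abs(predictor[s - lag] - ref), target[s])
            for s in range(lag, len(target))
            if target[s] is not None and predictor[s - lag] is not None
        ]
        donors.sort(key=lambda d: d[0])
        nearest = donors[:k]
        if nearest:
            filled[t] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy series in which the target follows the predictor with a lag of one step.
predictor = [1, 2, 3, 2, 1, 2]
target = [5, 10, 20, 30, None, 10]
print(lagged_knn_impute(target, predictor, lag=1, k=1))
```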

  2. Rule groupings in expert systems using nearest neighbour decision rules, and convex hulls

    NASA Technical Reports Server (NTRS)

    Anastasiadis, Stergios

    1991-01-01

    Expert System shells are lacking in many areas of software engineering. Large rule based systems are not semantically comprehensible, difficult to debug, and impossible to modify or validate. Partitioning a set of rules found in CLIPS (C Language Integrated Production System) into groups of rules which reflect the underlying semantic subdomains of the problem, will address adequately the concerns stated above. Techniques are introduced to structure a CLIPS rule base into groups of rules that inherently have common semantic information. The concepts involved are imported from the field of A.I., Pattern Recognition, and Statistical Inference. Techniques focus on the areas of feature selection, classification, and a criterion for how 'good' the classification technique is, based on Bayesian Decision Theory. A variety of distance metrics are discussed for measuring the 'closeness' of CLIPS rules and various Nearest Neighbor classification algorithms are described based on the above metrics.

  3. Using machine learning algorithms to guide rehabilitation planning for home care clients.

    PubMed

    Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul

    2007-12-20

    Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms - Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) - to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar, substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms achieved better predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide development of improved clinical protocols.

  4. Predicting missing links in complex networks based on common neighbors and distance

    PubMed Central

    Yang, Jinxuan; Zhang, Xiao-Dong

    2016-01-01

    Algorithms that predict missing links in complex networks from the common neighbors metric are very popular, but most of them do not account for missing links between nodes with no common neighbors. In some cases these methods do not reconstruct networks accurately enough, especially when nodes have few common neighbors. In this paper we propose a new algorithm based on common neighbors and distance to improve the accuracy of link prediction. The proposed algorithm is remarkably effective at predicting missing links between nodes with no common neighbors and performs better than most currently used methods on a variety of real-world networks without increasing complexity. PMID:27905526
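
    The gap described in this abstract, node pairs that share no neighbors and therefore score zero under the plain common-neighbors metric, can be illustrated with a toy scoring function that falls back on graph distance. The fallback rule (the reciprocal of the shortest-path length) and all names are assumptions made for this sketch; the abstract does not state the authors' actual formula.

```python
from collections import deque


def shortest_path_length(adj, src, dst):
    """Breadth-first search distance in an unweighted graph, or None if unreachable."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nb in adj[node]:
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return None


def link_score(adj, u, v):
    """Common-neighbor count when it is positive; otherwise a distance-based
    score below 1, so pairs without common neighbors can still be ranked."""
    common = len(adj[u] & adj[v])
    if common:
        return float(common)
    d = shortest_path_length(adj, u, v)
    return 0.0 if d is None else 1.0 / d


# Path graph 0-1-2-3-4 plus an isolated node 5 (adjacency as sets).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: set()}
print(link_score(adj, 0, 2), link_score(adj, 0, 3), link_score(adj, 0, 5))
```

    Unlinked pairs without common neighbors are at least three steps apart, so their fallback score never exceeds 1/3 and cannot outrank a pair that shares a neighbor.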

  5. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadliest diseases; according to WHO data, there were more than 8.8 million deaths caused by cancer as of 2015, and this number will increase every year if the problem is not addressed earlier. Microarray data has become one of the most popular subjects of cancer-identification studies in the health field, since microarray data can be used to look at levels of gene expression in certain cell samples and so to analyze thousands of genes simultaneously. Using data mining techniques, we can classify a sample of microarray data and thus identify whether it indicates cancer or not. In this paper we discuss research that applies data mining techniques to microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and a simulation of the Random Forest algorithm with dimension reduction using Relief. The paper reports the performance measure (accuracy) of each classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forest). The results show that the accuracy of the Random Forest algorithm is higher than that of the other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost of each data mining classification technique on microarray data.

  6. Classification of multispectral image data by the Binary Diamond neural network and by nonparametric, pixel-by-pixel methods

    NASA Technical Reports Server (NTRS)

    Salu, Yehuda; Tilton, James

    1993-01-01

    The classification of multispectral image data obtained from satellites has become an important tool for generating ground cover maps. This study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A new neural network, the Binary Diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The Binary Diamond is a multilayer, feed-forward neural network, which learns from examples in unsupervised, 'one-shot' mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done by using a realistic data base, consisting of approximately 90,000 Landsat 4 Thematic Mapper pixels. The Binary Diamond and the nearest neighbor performances were close, with some advantages to the Binary Diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways for improving the performances, such as merging categories, and analyzing nonboundary pixels, are addressed and evaluated.

  7. Nation-Building Modeling and Resource Allocation Via Dynamic Programming

    DTIC Science & Technology

    2014-09-01

    Figure 2. RAND Study Models[59:98,115] (WMA) and used both the k-Nearest Neighbor (KNN) and Nearest Centroid (NC) algorithms to classify future features...The study found that KNN performed better than NC with 85% or greater accuracy in all test cases. The methodology was adopted for use under the...analysis feature of the model. 3.7.1 The No Surge Alternative. On the 10th of January 2007, President George W. Bush delivered a speech to the American

  8. Understanding the Instruments of National Power through a System of Differential Equations in a Counterinsurgency

    DTIC Science & Technology

    2012-03-01

    WMA) and used both the k-Nearest Neighbor ( KNN ) and Nearest Centroid 27 (a) Coalition and Regional (b) Indigenous Figure 3. RAND Study Models[32:98,115...NC) algorithms to classify future features. The study found that KNN performed better than NC with 85% or greater accuracy in all test cases. The...the model. 4.2.1 No Surge. On the 10th of January 2007, President George W. Bush delivered a speech to the American Public outlining a new strategy in

  9. Improved collaborative filtering recommendation algorithm of similarity measure

    NASA Astrophysics Data System (ADS)

    Zhang, Baofu; Yuan, Baoping

    2017-05-01

    The collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithms in personalized recommender systems. The key is to find the nearest neighbor set of the active user by using a similarity measure. However, traditional similarity measures focus mainly on the similarity of the items that two users have both rated, and ignore the relationship between these commonly rated items and all the items each user has rated. In addition, because the rating matrix is very sparse, the traditional collaborative filtering recommendation algorithm is not very efficient. To obtain better accuracy, this paper presents an improved similarity measure that considers the common preferences between users, the difference in rating scale, and the scores of common items, and proposes a collaborative filtering recommendation algorithm based on this improved similarity. Experimental results show that the algorithm can effectively improve the quality of recommendation, thereby alleviating the impact of data sparseness.
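
    The limitation described in this abstract can be made concrete with a small sketch: a traditional cosine similarity computed only on commonly rated items, and a variant that is scaled by the share of commonly rated items among all items either user has rated. The Jaccard weighting is an assumption chosen to illustrate the idea; it is not the similarity measure proposed in the paper, and the names and ratings below are made up.

```python
import math


def cosine_on_common(a, b):
    """Traditional cosine similarity restricted to items rated by both users."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_weighted_similarity(a, b):
    """Cosine on common items, scaled by |common| / |all items rated by either user|."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return cosine_on_common(a, b) * len(set(a) & set(b)) / len(union)


# Ratings as {item: score}; bob has rated two items that alice has not.
alice = {"i1": 5, "i2": 3}
bob = {"i1": 5, "i2": 3, "i3": 1, "i4": 2}
print(cosine_on_common(alice, bob), jaccard_weighted_similarity(alice, bob))
```

    The unweighted measure treats the two users as identical because it sees only the two shared items, while the weighted variant lowers the similarity to reflect that half of the rated items are not shared.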

  10. Geometry-based populated chessboard recognition

    NASA Astrophysics Data System (ADS)

    Xie, Youye; Tang, Gongguo; Hoff, William

    2018-04-01

    Chessboards are commonly used to calibrate cameras, and many robust methods have been developed to recognize the unoccupied boards. However, when the chessboard is populated with chess pieces, such as during an actual game, the problem of recognizing the board is much harder. Challenges include occlusion caused by the chess pieces, the presence of outlier lines and low viewing angles of the chessboard. In this paper, we present a novel approach to address the above challenges and recognize the chessboard. The Canny edge detector and Hough transform are used to capture all possible lines in the scene. The k-means clustering and a k-nearest-neighbors inspired algorithm are applied to cluster and reject the outlier lines based on their Euclidean distances to the nearest neighbors in a scaled Hough transform space. Finally, based on prior knowledge of the chessboard structure, a geometric constraint is used to find the correspondences between image lines and the lines on the chessboard through the homography transformation. The proposed algorithm works for a wide range of the operating angles and achieves high accuracy in experiments.

  11. Phase diagrams and free-energy landscapes for model spin-crossover materials with antiferromagnetic-like nearest-neighbor and ferromagnetic-like long-range interactions

    NASA Astrophysics Data System (ADS)

    Chan, C. H.; Brown, G.; Rikvold, P. A.

    2017-11-01

    We present phase diagrams, free-energy landscapes, and order-parameter distributions for a model spin-crossover material with a two-step transition between the high-spin and low-spin states (a square-lattice Ising model with antiferromagnetic-like nearest-neighbor and ferromagnetic-like long-range interactions) [P. A. Rikvold et al., Phys. Rev. B 93, 064109 (2016), 10.1103/PhysRevB.93.064109]. The results are obtained by a recently introduced, macroscopically constrained Wang-Landau Monte Carlo simulation method [Phys. Rev. E 95, 053302 (2017), 10.1103/PhysRevE.95.053302]. The method's computational efficiency enables calculation of thermodynamic quantities for a wide range of temperatures, applied fields, and long-range interaction strengths. For long-range interactions of intermediate strength, tricritical points in the phase diagrams are replaced by pairs of critical end points and mean-field critical points that give rise to horn-shaped regions of metastability. The corresponding free-energy landscapes offer insights into the nature of asymmetric, multiple hysteresis loops that have been experimentally observed in spin-crossover materials characterized by competing short-range interactions and long-range elastic interactions.

  12. Incremental k-core decomposition: Algorithms and evaluation

    DOE PAGES

    Sariyuce, Ahmet Erdem; Gedik, Bugra; Jacques-Silva, Gabriela; ...

    2016-02-01

    A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. Furthermore, for a graph of 16 million vertices, we observe relative throughputs reaching a million times, relative to the non-incremental algorithms.
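
    As a point of reference for the definition above, the following is a minimal Python sketch of the classic non-incremental "peeling" computation of core numbers (the largest k for which a vertex belongs to a k-core). It is not the incremental algorithm proposed in the paper; the function name core_numbers and the edge-list input format are assumptions made for illustration, and the quadratic minimum search is kept for brevity rather than speed.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    Hypothetical helper for illustration: repeatedly removes a vertex of
    minimum remaining degree (the standard peeling procedure).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Vertex with the smallest degree among those not yet removed.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

    A non-incremental method like this one recomputes everything after each edge change, which is the baseline the incremental algorithms in the record are compared against.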

  13. Solar Flare Prediction Model with Three Machine-learning Algorithms using Ultraviolet Brightening and Vector Magnetograms

    NASA Astrophysics Data System (ADS)

    Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M.

    2017-02-01

    We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010-2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ~60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.

  14. Solar Flare Prediction Model with Three Machine-learning Algorithms using Ultraviolet Brightening and Vector Magnetograms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nishizuka, N.; Kubo, Y.; Den, M.

    We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.

  15. Progress in adapting k-NN methods for forest mapping and estimation using the new annual Forest Inventory and Analysis data

    Treesearch

    Reija Haapanen; Kimmo Lehtinen; Jukka Miettinen; Marvin E. Bauer; Alan R. Ek

    2002-01-01

    The k-nearest neighbor (k-NN) method has been undergoing development and testing for applications with USDA Forest Service Forest Inventory and Analysis (FIA) data in Minnesota since 1997. Research began using the 1987-1990 FIA inventory of the state, the then standard 10-point cluster plots, and Landsat TM imagery. In the past year, research has moved to examine...

  16. A Crowd-Sourcing Indoor Localization Algorithm via Optical Camera on a Smartphone Assisted by Wi-Fi Fingerprint RSSI.

    PubMed

    Chen, Wei; Wang, Weiping; Li, Qun; Chang, Qiang; Hou, Hongtao

    2016-03-19

    Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning.
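
    The coarse-estimation step described above can be sketched as a weighted k-nearest-neighbor lookup in a fingerprint database. The sketch below is an illustration only: the inverse-distance weighting, the Euclidean distance in RSSI space, the function name wknn_position, and the data layout are all assumptions, since the abstract does not specify how KWNN weights its neighbors. It also assumes a non-empty fingerprint list.

```python
import math


def wknn_position(fingerprints, observed_rssi, k=3, eps=1e-6):
    """Estimate (x, y) from Wi-Fi RSSI readings with weighted k-NN.

    fingerprints: list of ((x, y), [rssi_ap1, rssi_ap2, ...]) reference points.
    observed_rssi: RSSI vector measured at the unknown location.
    Weights are inverse RSSI-space distances (an assumed weighting scheme).
    """
    scored = []
    for (x, y), rssi in fingerprints:
        d = math.dist(rssi, observed_rssi)  # Euclidean distance in signal space
        scored.append((d, x, y))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]

    # eps avoids division by zero when a fingerprint matches exactly.
    weights = [1.0 / (d + eps) for d, _, _ in nearest]
    total = sum(weights)
    est_x = sum(w * x for w, (_, x, _) in zip(weights, nearest)) / total
    est_y = sum(w * y for w, (_, _, y) in zip(weights, nearest)) / total
    return est_x, est_y


# Three hypothetical reference points, two access points each.
database = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((2.0, 0.0), [-70.0, -40.0]),
    ((10.0, 10.0), [-90.0, -90.0]),
]
print(wknn_position(database, [-55.0, -55.0], k=2))
```

    In the record's scheme this coarse estimate is only the first stage; it is then refined by fusing image features and orientation sensor data.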

  17. Model-based mean square error estimators for k-nearest neighbour predictions and applications using remotely sensed data for forest inventories

    Treesearch

    Steen Magnussen; Ronald E. McRoberts; Erkki O. Tomppo

    2009-01-01

    New model-based estimators of the uncertainty of pixel-level and areal k-nearest neighbour (knn) predictions of attribute Y from remotely-sensed ancillary data X are presented. Non-parametric functions predict Y from scalar 'Single Index Model' transformations of X. Variance functions generated...

  18. A novel image retrieval algorithm based on PHOG and LSH

    NASA Astrophysics Data System (ADS)

    Wu, Hongliang; Wu, Weimin; Peng, Jiajin; Zhang, Junyuan

    2017-08-01

    PHOG describes the local shape of an image and its spatial layout. Using the PHOG algorithm to extract image features has achieved good results in image recognition, retrieval, and related tasks. In recent years, the locality sensitive hashing (LSH) algorithm has proved superior to traditional algorithms for solving nearest-neighbor problems on large-scale data. This paper presents a novel image retrieval algorithm based on PHOG and LSH. First, we use PHOG to extract the feature vector of the image; then we use L different LSH hash tables to reduce the PHOG texture features to index values and map them to different buckets; finally, we take the images in the corresponding bucket and perform a second retrieval using Manhattan distance. The algorithm scales to massive image retrieval, maintaining high retrieval accuracy while reducing the time complexity of retrieval. This algorithm is of great significance.

  19. Exact ground states for the nearest neighbor quantum XXZ model on the kagome and other lattices with triangular motifs at Jz/Jxy = -1/2

    NASA Astrophysics Data System (ADS)

    Changlani, Hitesh; Kumar, Krishna; Kochkov, Dmitrii; Fradkin, Eduardo; Clark, Bryan

    We report the existence of a quantum macroscopically degenerate ground state manifold on the nearest neighbor XXZ model on the kagome lattice at the point Jz/Jxy = -1/2. On many lattices with triangular motifs (including the kagome, sawtooth, icosidodecahedron and Shastry-Sutherland lattice for a certain choice of couplings) this Hamiltonian is found to be frustration-free with exact ground states which correspond to three-colorings of these lattices. Several results also generalize to the case of variable couplings and to other motifs (albeit with possibly more complex Hamiltonians). The degenerate manifold on the kagome lattice corresponds to a "many-body flat band" of interacting hard-core bosons; and for the one boson case our results also explain the well-known non-interacting flat band. On adding realistic perturbations, state selection in this manifold of quantum many-body states is discussed along with the implications for the phase diagram of the kagome lattice antiferromagnet. Supported by DE-FG02-12ER46875, DMR 1408713, DE-FG02-08ER46544.

  20. Velocity correlations and spatial dependencies between neighbors in a unidirectional flow of pedestrians

    NASA Astrophysics Data System (ADS)

    Porzycki, Jakub; Wąs, Jarosław; Hedayatifar, Leila; Hassanibesheli, Forough; Kułakowski, Krzysztof

    2017-08-01

    The aim of the paper is an analysis of self-organization patterns observed in the unidirectional flow of pedestrians. On the basis of experimental data from Zhang et al. [J. Zhang et al., J. Stat. Mech. (2011) P06004, 10.1088/1742-5468/2011/06/P06004], we analyze the mutual positions and velocity correlations between pedestrians when walking along a corridor. The angular and spatial dependencies of the mutual positions reveal a spatial structure that remains stable during the crowd motion. This structure differs depending on the value of n, for the consecutive nth-nearest-neighbor position set. The preferred position for the first-nearest neighbor is on the side of the pedestrian, while for further neighbors, this preference shifts to the axis of movement. The velocity correlations vary with the angle formed by the pair of neighboring pedestrians and the direction of motion and with the time delay between pedestrians' movements. The delay dependence of the correlations shows characteristic oscillations, produced by the velocity oscillations when striding; however, a filtering of the main frequency of individual striding out reduces the oscillations only partially. We conclude that pedestrians select their path directions so as to evade the necessity of continuously adjusting their speed to their neighbors'. They try to keep a given distance, but follow the person in front of them, as well as accepting and observing pedestrians on their sides. Additionally, we show an empirical example that illustrates the shape of a pedestrian's personal space during movement.

  1. A floor-map-aided WiFi/pseudo-odometry integration algorithm for an indoor positioning system.

    PubMed

    Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin

    2015-03-24

    This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The "go and back" phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The "cross-wall" problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning.

  2. Jastrow-like ground states for quantum many-body potentials with near-neighbors interactions

    NASA Astrophysics Data System (ADS)

    Baradaran, Marzieh; Carrasco, José A.; Finkel, Federico; González-López, Artemio

    2018-01-01

    We completely solve the problem of classifying all one-dimensional quantum potentials with nearest- and next-to-nearest-neighbors interactions whose ground state is Jastrow-like, i.e., of Jastrow type but depending only on differences of consecutive particles. In particular, we show that these models must necessarily contain a three-body interaction term, as was the case with all previously known examples. We discuss several particular instances of the general solution, including a new hyperbolic potential and a model with elliptic interactions which reduces to the known rational and trigonometric ones in appropriate limits.

  3. An Examination of Diameter Density Prediction with k-NN and Airborne Lidar

    DOE PAGES

    Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri; ...

    2017-11-16

    While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination (R²), and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distributed across the 800 km² Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.

  4. An Examination of Diameter Density Prediction with k-NN and Airborne Lidar

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri

    While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination (R²), and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distributed across the 800 km² Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.

  5. Improving a maximum horizontal gradient algorithm to determine geological body boundaries and fault systems based on gravity data

    NASA Astrophysics Data System (ADS)

    Van Kha, Tran; Van Vuong, Hoang; Thanh, Do Duc; Hung, Duong Quoc; Anh, Le Duc

    2018-05-01

    The maximum horizontal gradient method was first proposed by Blakely and Simpson (1986) for determining the boundaries between geological bodies with different densities. The method involves the comparison of a center point with its eight nearest neighbors in four directions within each 3 × 3 calculation grid. The horizontal location and magnitude of the maximum values are found by interpolating a second-order polynomial through the trio of points provided that the magnitude of the middle point is greater than its two nearest neighbors in one direction. In theoretical models of multiple sources, however, the above condition does not allow the maximum horizontal locations to be fully located, and it could be difficult to correlate the edges of complicated sources. In this paper, the authors propose an additional condition to identify more maximum horizontal locations within the calculation grid. This additional condition will improve the method algorithm for interpreting the boundaries of magnetic and/or gravity sources. The improved algorithm was tested on gravity models and applied to gravity data for the Phu Khanh basin on the continental shelf of the East Vietnam Sea. The results show that the additional locations of the maximum horizontal gradient could be helpful for connecting the edges of complicated source bodies.
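
    The basic test described above (compare the center of a 3 × 3 window with its neighbors in four directions, then fit a second-order polynomial through each qualifying trio) can be sketched in Python as follows. This is a simplified illustration of the original condition only, not the authors' improved algorithm; the function names, the unit grid spacing, and the choice to report one maximum per qualifying direction are assumptions.

```python
def parabola_peak(left, center, right):
    """Vertex of the parabola through values at offsets -1, 0 and +1.

    Returns (offset, value). Assumes center is strictly greater than both
    neighbors, so the parabola opens downward and the vertex is a maximum.
    """
    a = 0.5 * (left - 2.0 * center + right)
    b = 0.5 * (right - left)
    offset = -b / (2.0 * a)
    return offset, center + 0.5 * b * offset


# Row, column and the two diagonals of the 3 x 3 window.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def horizontal_gradient_maxima(grid):
    """Scan a 2D list of horizontal-gradient magnitudes for local maxima.

    Returns (row, col, value) tuples with sub-cell positions measured in
    grid-index units along the direction that satisfied the test.
    """
    maxima = []
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            center = grid[i][j]
            for di, dj in DIRECTIONS:
                before = grid[i - di][j - dj]
                after = grid[i + di][j + dj]
                if center > before and center > after:
                    t, value = parabola_peak(before, center, after)
                    maxima.append((i + t * di, j + t * dj, value))
    return maxima


window = [
    [0.0, 0.0, 0.0],
    [1.0, 3.0, 2.0],
    [0.0, 0.0, 0.0],
]
for row, col, value in horizontal_gradient_maxima(window):
    print(round(row, 3), round(col, 3), round(value, 3))
```

    Because the test requires the center to exceed both neighbors in a direction, maxima that fall between grid nodes in a different way are missed, which is the gap the additional condition in the record is meant to address.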

  6. Prediction of Effective Drug Combinations by an Improved Naïve Bayesian Algorithm.

    PubMed

    Bai, Li-Yue; Dai, Hao; Xu, Qin; Junaid, Muhammad; Peng, Shao-Liang; Zhu, Xiaolei; Xiong, Yi; Wei, Dong-Qing

    2018-02-05

    Drug combinatorial therapy is a promising strategy for combating complex diseases due to its fewer side effects, lower toxicity and better efficacy. However, it is not feasible to determine all the effective drug combinations in the vast space of possible combinations given the increasing number of approved drugs in the market, since the experimental methods for identification of effective drug combinations are both labor- and time-consuming. In this study, we conducted systematic analysis of various types of features to characterize pairs of drugs. These features included information about the targets of the drugs, the pathways in which the target protein of a drug was involved, side effects of drugs, metabolic enzymes of the drugs, and drug transporters. The latter two features (metabolic enzymes and drug transporters) were related to the metabolism and transportation properties of drugs, which were not analyzed or used in previous studies. Then, we devised a novel improved naïve Bayesian algorithm to construct classification models to predict effective drug combinations by using the individual types of features mentioned above. Our results indicated that the performance of our proposed method was indeed better than the naïve Bayesian algorithm and other conventional classification algorithms such as support vector machine and K-nearest neighbor.

  7. An autoregressive model-based particle filtering algorithms for extraction of respiratory rates as high as 90 breaths per minute from pulse oximeter.

    PubMed

    Lee, Jinseok; Chon, Ki H

    2010-09-01

    We present particle filtering (PF) algorithms for an accurate respiratory rate extraction from pulse oximeter recordings over a broad range: 12-90 breaths/min. These methods are based on an autoregressive (AR) model, where the aim is to find the pole angle with the highest magnitude as it corresponds to the respiratory rate. However, when SNR is low, the pole angle with the highest magnitude may not always lead to accurate estimation of the respiratory rate. To circumvent this limitation, we propose a probabilistic approach, using a sequential Monte Carlo method, named PF, which is combined with the optimal parameter search (OPS) criterion for an accurate AR model-based respiratory rate extraction. The PF technique has been widely adopted in many tracking applications, especially for nonlinear and/or non-Gaussian problems. We examine the performances of five different likelihood functions of the PF algorithm: the strongest neighbor, nearest neighbor (NN), weighted nearest neighbor (WNN), probability data association (PDA), and weighted probability data association (WPDA). The performance of these five combined OPS-PF algorithms was measured against a solely OPS-based AR algorithm for respiratory rate extraction from pulse oximeter recordings. The pulse oximeter data were collected from 33 healthy subjects with breathing rates ranging from 12 to 90 breaths/min. It was found that significant improvement in accuracy can be achieved by employing particle filters, and that the combined OPS-PF employing either the NN or WNN likelihood function achieved the best results for all respiratory rates considered in this paper. The main advantage of the combined OPS-PF with either the NN or WNN likelihood function is that for the first time, respiratory rates as high as 90 breaths/min can be accurately extracted from pulse oximeter recordings.

  8. Efficient Fingercode Classification

    NASA Astrophysics Data System (ADS)

    Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

    In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems, e.g., systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.

  9. Structure of the first- and second-neighbor shells of simulated water: Quantitative relation to translational and orientational order

    NASA Astrophysics Data System (ADS)

    Yan, Zhenyu; Buldyrev, Sergey V.; Kumar, Pradeep; Giovambattista, Nicolas; Debenedetti, Pablo G.; Stanley, H. Eugene

    2007-11-01

    We perform molecular dynamics simulations of water using the five-site transferable interaction potential (TIP5P) model to quantify structural order in both the first shell (defined by four nearest neighbors) and second shell (defined by twelve next-nearest neighbors) of a central water molecule. We find that the anomalous decrease of orientational order upon compression occurs in both shells, but the anomalous decrease of translational order upon compression occurs mainly in the second shell. The decreases of translational order and orientational order upon compression (called the “structural anomaly”) are thus correlated only in the second shell. Our findings quantitatively confirm the qualitative idea that the thermodynamic, structural, and hence dynamic anomalies of water are related to changes upon compression in the second shell.

  10. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
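
    The idea of using nearest neighbors as a probability machine can be illustrated with a short Python sketch: the estimated probability for a query is simply the fraction of its k nearest training points whose binary response is 1. This is not the R code referred to in the record; the function name knn_probability, the Euclidean distance, and the toy data are assumptions, and the training set is assumed to be non-empty.

```python
import math


def knn_probability(train_x, train_y, query, k=5):
    """Estimate P(y = 1 | query) as the mean response of the k nearest points.

    train_x: list of feature vectors; train_y: list of 0/1 responses.
    """
    # Indices of the training points ordered by distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy one-dimensional data: small feature values tend to have response 1.
features = [[0.0], [1.0], [2.0], [10.0], [11.0]]
responses = [1, 1, 0, 0, 0]
print(knn_probability(features, responses, [1.0], k=3))
```

    Returning the neighborhood average instead of a majority vote is what turns the classifier into a probability estimator for individual observations.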

  11. Efficient Kriging Algorithms

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess

    2011-01-01

    More efficient versions of an interpolation method, called kriging, have been introduced in order to reduce its traditionally high computational cost. Written in C++, these approaches were tested on both synthetic and real data. Kriging is a best unbiased linear estimator and suitable for interpolation of scattered data points. Kriging has long been used in the geostatistic and mining communities, but is now being researched for use in the image fusion of remotely sensed data. This allows a combination of data from various locations to be used to fill in any missing data from any single location. To arrive at the faster algorithms, sparse SYMMLQ iterative solver, covariance tapering, Fast Multipole Methods (FMM), and nearest neighbor searching techniques were used. These implementations were used when the coefficient matrix in the linear system is symmetric, but not necessarily positive-definite.

  12. A Crowd-Sourcing Indoor Localization Algorithm via Optical Camera on a Smartphone Assisted by Wi-Fi Fingerprint RSSI

    PubMed Central

    Chen, Wei; Wang, Weiping; Li, Qun; Chang, Qiang; Hou, Hongtao

    2016-01-01

    Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning. PMID:27007379
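
    The coarse estimation step can be sketched as a weighted k-nearest-neighbor lookup in a fingerprint database. The abstract does not state the weighting scheme, so inverse-distance weights in signal space are assumed here. The function name `wknn_locate`, the database layout, and the RSSI values are hypothetical.

```python
import math

def wknn_locate(fingerprints, rssi, k=3, eps=1e-6):
    """Weighted k-nearest-neighbor position estimate from Wi-Fi fingerprints.

    fingerprints: list of ((x, y), [RSSI per access point]) reference points.
    rssi: observed RSSI vector, in the same access-point order as the database.
    """
    # Rank reference points by Euclidean distance in signal space and keep the k closest.
    ranked = sorted((math.dist(ref, rssi), pos) for pos, ref in fingerprints)[:k]
    # Assumed weighting: inverse distance, so closer fingerprints count more.
    weights = [1.0 / (d + eps) for d, _ in ranked]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, ranked)) / total
    return x, y

# Hypothetical database: two reference points, two access points.
db = [((0.0, 0.0), [-40.0, -70.0]), ((10.0, 0.0), [-70.0, -40.0])]
exact = wknn_locate(db, [-40.0, -70.0], k=1)
midway = wknn_locate(db, [-55.0, -55.0], k=2)
```

    An observation that matches one fingerprint exactly returns that reference position, and an observation equally far from both lands halfway between them.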

  13. Fast segmentation of industrial quality pavement images using Laws texture energy measures and k-means clustering

    NASA Astrophysics Data System (ADS)

    Mathavan, Senthan; Kumar, Akash; Kamal, Khurram; Nieminen, Michael; Shah, Hitesh; Rahman, Mujib

    2016-09-01

    Thousands of pavement images are collected by road authorities daily for condition monitoring surveys. These images typically have intensity variations and texture nonuniformities that make their segmentation challenging. The automated segmentation of such pavement images is crucial for accurate, thorough, and expedited health monitoring of roads. In the pavement monitoring area, well-known texture descriptors, such as gray-level co-occurrence matrices and local binary patterns, are often used for surface segmentation and identification. These, despite being the established methods for texture discrimination, are inherently slow. This work evaluates Laws texture energy measures as a viable alternative for pavement images for the first time. k-means clustering is used to partition the feature space, limiting the human subjectivity in the process. Data classification, hence image segmentation, is performed by the k-nearest neighbor method. Laws texture energy masks are shown to perform well with resulting accuracy and precision values of more than 80%. The implementations of the algorithm, in both MATLAB® and OpenCV/C++, are extensively compared against the state of the art for execution speed, clearly showing the advantages of the proposed method. Furthermore, the OpenCV-based segmentation shows a 100% increase in processing speed when compared to the fastest algorithm available in literature.

  14. A Comparative Study of Interferometric Regridding Algorithms

    NASA Technical Reports Server (NTRS)

    Hensley, Scott; Safaeinili, Ali

    1999-01-01

    The paper discusses regridding options: (1) Interpolating data that is not sampled on a uniform grid, that is noisy, and that contains gaps is a difficult problem. (2) Several interpolation algorithms have been implemented: (a) Nearest neighbor - fast and easy, but shows some artifacts in shaded relief images. (b) Simplicial interpolator - uses the plane going through the three points containing the point where interpolation is required. Reasonably fast and accurate. (c) Convolutional - uses a windowed Gaussian approximating the optimal prolate spheroidal weighting function for a specified bandwidth. (d) First or second order surface fitting - uses the height data centered in a box about a given point and does a weighted least squares surface fit.

  15. Assessing the impact of background spectral graph construction techniques on the topological anomaly detection algorithm

    NASA Astrophysics Data System (ADS)

    Ziemann, Amanda K.; Messinger, David W.; Albano, James A.; Basener, William F.

    2012-06-01

    Anomaly detection algorithms have historically been applied to hyperspectral imagery in order to identify pixels whose material content is incongruous with the background material in the scene. Typically, the application involves extracting man-made objects from natural and agricultural surroundings. A large challenge in designing these algorithms is determining which pixels initially constitute the background material within an image. The topological anomaly detection (TAD) algorithm constructs a graph theory-based, fully non-parametric topological model of the background in the image scene, and uses codensity to measure deviation from this background. In TAD, the initial graph theory structure of the image data is created by connecting an edge between any two pixel vertices x and y if the Euclidean distance between them is less than some resolution r. While this type of proximity graph is among the most well-known approaches to building a geometric graph based on a given set of data, there is a wide variety of different geometrically-based techniques. In this paper, we present a comparative test of the performance of TAD across four different constructs of the initial graph: mutual k-nearest neighbor graph, sigma-local graph for two different values of σ > 1, and the proximity graph originally implemented in TAD.
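
    Two of the graph constructions named in this abstract can be sketched in a few lines: the proximity graph (an edge whenever the Euclidean distance is below r) and the mutual k-nearest neighbor graph (an edge only when each point is among the other's k nearest). The sigma-local graph is omitted because the abstract does not define it. The function names and the toy "pixel" vectors are assumptions for illustration.

```python
import math
from itertools import combinations

def proximity_graph(points, r):
    """Edges between any two points whose Euclidean distance is below r."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) < r}

def mutual_knn_graph(points, k):
    """Edges between points that are each among the other's k nearest neighbors."""
    n = len(points)
    knn = []
    for i in range(n):
        # All other indices, ordered by distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(others[:k]))
    return {(i, j) for i, j in combinations(range(n), 2)
            if j in knn[i] and i in knn[j]}

# Hypothetical spectra reduced to 2-D; the last point lies far from the rest.
pixels = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
prox_edges = proximity_graph(pixels, r=2.0)
mutual_edges = mutual_knn_graph(pixels, k=1)
```

    In both graphs the distant point ends up with no edges, which is the kind of isolation a background model can flag as anomalous.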

  16. A k-nearest neighbor approach for estimation of single-tree biomass

    Treesearch

    Lutz Fehrmann; Christoph Kleinn

    2007-01-01

    Allometric biomass models are typically site and species specific. They are mostly based on a low number of independent variables such as diameter at breast height and tree height. Because of relatively small datasets, their validity is limited to the set of conditions of the study, such as site conditions and diameter range. One challenge in the context of the current...

  17. A Floor-Map-Aided WiFi/Pseudo-Odometry Integration Algorithm for an Indoor Positioning System

    PubMed Central

    Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin

    2015-01-01

    This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The “go and back” phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The “cross-wall” problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning. PMID:25811224

  18. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and

  19. Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM

    PubMed Central

    Zhao, Zhizhen; Singer, Amit

    2014-01-01

    We introduce a new rotationally invariant viewing angle classification method for identifying, among a large number of cryo-EM projection images, similar views without prior knowledge of the molecule. Our rotationally invariant features are based on the bispectrum. Each image is denoised and compressed using steerable principal component analysis (PCA) such that rotating an image is equivalent to phase shifting the expansion coefficients. Thus we are able to extend the theory of bispectrum of 1D periodic signals to 2D images. The randomized PCA algorithm is then used to efficiently reduce the dimensionality of the bispectrum coefficients, enabling fast computation of the similarity between any pair of images. The nearest neighbors provide an initial classification of similar viewing angles. In this way, rotational alignment is only performed for images with their nearest neighbors. The initial nearest neighbor classification and alignment are further improved by a new classification method called vector diffusion maps. Our pipeline for viewing angle classification and alignment is experimentally shown to be faster and more accurate than reference-free alignment with rotationally invariant K-means clustering, MSA/MRA 2D classification, and their modern approximations. PMID:24631969

  20. Android Malware Classification Using K-Means Clustering Algorithm

    NASA Astrophysics Data System (ADS)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware is designed to gain access to or damage a computer system without the user's notice, and attackers exploit it to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results, with high accuracy when tested using the Random Forest algorithm.

  1. Weighted Parzen Windows for Pattern Classification

    DTIC Science & Technology

    1994-05-01

    Nearest-Neighbor Rule The k-Nearest-Neighbor (kNN) technique is nonparametric, assuming nothing about the distribution of the data. Stated succinctly...probabilities P(wj | x) from samples." Raudys and Jain [20:255] advance this interpretation by pointing out that the kNN technique can be viewed as the..."Parzen window classifier with a hyper-rectangular window function." As with the Parzen-window technique, the kNN classifier is more accurate as the

  2. Comparison of crisp and fuzzy character networks in handwritten word recognition

    NASA Technical Reports Server (NTRS)

    Gader, Paul; Mohamed, Magdi; Chiang, Jung-Hsien

    1992-01-01

    Experiments involving handwritten word recognition on words taken from images of handwritten address blocks from the United States Postal Service mailstream are described. The word recognition algorithm relies on the use of neural networks at the character level. The neural networks are trained using crisp and fuzzy desired outputs. The fuzzy outputs were defined using a fuzzy k-nearest neighbor algorithm. The crisp networks slightly outperformed the fuzzy networks at the character level but the fuzzy networks outperformed the crisp networks at the word level.

  3. Image processing meta-algorithm development via genetic manipulation of existing algorithm graphs

    NASA Astrophysics Data System (ADS)

    Schalkoff, Robert J.; Shaaban, Khaled M.

    1999-07-01

    Automatic algorithm generation for image processing applications is not a new idea; however, previous work is either restricted to morphological operators or impractical. In this paper, we show recent research results in the development and use of meta-algorithms, i.e. algorithms which lead to new algorithms. Although the concept is generally applicable, the application domain in this work is restricted to image processing. The meta-algorithm concept described in this paper is based upon our work in dynamic algorithms. The paper first presents the concept of dynamic algorithms which, on the basis of training and archived algorithmic experience embedded in an algorithm graph (AG), dynamically adjust the sequence of operations applied to the input image data. Each node in the tree-based representation of a dynamic algorithm with out-degree greater than 2 is a decision node. At these nodes, the algorithm examines the input data and determines which path will most likely achieve the desired results. This is currently done using nearest-neighbor classification. The details of this implementation are shown. The constrained perturbation of existing algorithm graphs, coupled with a suitable search strategy, is one mechanism to achieve meta-algorithms and offers rich potential for the discovery of new algorithms. In our work, a meta-algorithm autonomously generates new dynamic algorithm graphs via genetic recombination of existing algorithm graphs. The AG representation is well suited to this genetic-like perturbation, using a commonly employed technique in artificial neural network synthesis, namely the blueprint representation of graphs. A number of exam. One of the principal limitations of our current approach is the need for significant human input in the learning phase. Efforts to overcome this limitation are discussed. Future research directions are indicated.

  4. Predicting the Occurrence of Haze Events in Southeast Asia using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Lee, H. H.; Chulakadabba, A.; Tonks, A.; Yang, Z.; Wang, C.

    2017-12-01

    Severe local- and regional-scale air pollution episodes typically originate from 1) high emissions of air pollutants, 2) poor dispersion conditions, and 3) trans-boundary pollutant transport. Biomass burning activities have become more frequent in Southeast Asia, especially in Sumatra, Borneo, and the mainland Southeast. Trans-boundary transport of biomass burning aerosols often leads to air quality problems in the region. Furthermore, particulate pollutants from human activities besides biomass burning also play an important role in the air quality of Southeast Asia. Singapore, for example, has a dynamic industrial sector including chemical, electric and metallurgic industries, and is the region's major petroleum-refining center. In addition, natural gas and oil power plants, waste incinerators, active port traffic, and a major regional airport further complicate Singapore's air quality issues. In this study, we compare five Machine Learning algorithms: k-Nearest Neighbors, Linear Support Vector Machine, Decision Tree, Random Forest and Artificial Neural Network, to identify haze patterns and determine variable importance. The algorithms were trained using local atmospheric data (i.e. months, atmospheric conditions, wind direction and relative humidity) from three observation stations in Singapore (Changi, Seletar and Paya Lebar). We find that the algorithms reveal the associations in data within and between the stations, and provide in-depth interpretation of the haze sources. The algorithms also allow us to predict the probability of haze episodes in Singapore and to determine the correlation between this probability and atmospheric conditions.

  5. Performance of resonant radar target identification algorithms using intra-class weighting functions

    NASA Astrophysics Data System (ADS)

    Mustafa, A.

    The use of calibrated resonant-region radar cross section (RCS) measurements of targets for the classification of large aircraft is discussed. Errors in the RCS estimate of full scale aircraft flying over an ocean, introduced by the ionospheric variability and the sea conditions were studied. The Weighted Target Representative (WTR) classification algorithm was developed, implemented, tested and compared with the nearest neighbor (NN) algorithm. The WTR-algorithm has a low sensitivity to the uncertainty in the aspect angle of the unknown target returns. In addition, this algorithm was based on the development of a new catalog of representative data which reduces the storage requirements and increases the computational efficiency of the classification system compared to the NN-algorithm. Experiments were designed to study and evaluate the characteristics of the WTR- and the NN-algorithms, investigate the classifiability of targets and study the relative behavior of the number of misclassifications as a function of the target backscatter features. The classification results and statistics were shown in the form of performance curves, performance tables and confusion tables.

  6. Gene selection using hybrid binary black hole algorithm and modified binary particle swarm optimization.

    PubMed

    Pashaei, Elnaz; Pashaei, Elham; Aydin, Nizamettin

    2018-04-14

    In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate the exploration and exploitation of the BPSO (4-2) algorithm to further improve the performance. This model has been associated with Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers which are evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA); k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets. Copyright © 2018 Elsevier Inc. All rights reserved.

  7. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. 
    The software also provides for a choice between three different

  8. Study on Privacy Protection Algorithm Based on K-Anonymity

    NASA Astrophysics Data System (ADS)

    FeiFei, Zhao; LiFeng, Dong; Kun, Wang; Yang, Li

    Based on a study of the K-Anonymity algorithm for privacy protection, this paper proposes a "Degree Priority" method of visiting lattice nodes on the generalization tree to improve the performance of the K-Anonymity algorithm. It also proposes a "Two Times K-anonymity" method to reduce the information loss in the process of K-Anonymity. Finally, experimental results demonstrate the effectiveness of these methods.
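
    For readers unfamiliar with the underlying property, the sketch below checks whether a table is k-anonymous, meaning every combination of quasi-identifier values is shared by at least k records, and shows one generalization step that can restore the property. It does not implement the paper's "Degree Priority" or "Two Times K-anonymity" methods, which the abstract does not specify. The helper names, the age-range generalization, and the records are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width):
    """Hypothetical generalization step: replace an exact age with a range of the given width."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

# Hypothetical records; "age" and "zip" are treated as quasi-identifiers.
rows = [
    {"age": 23, "zip": "130**", "diagnosis": "flu"},
    {"age": 27, "zip": "130**", "diagnosis": "asthma"},
    {"age": 31, "zip": "130**", "diagnosis": "flu"},
    {"age": 38, "zip": "130**", "diagnosis": "diabetes"},
]
before = is_k_anonymous(rows, ["age", "zip"], k=2)
coarse = [generalize_age(r, width=10) for r in rows]
after = is_k_anonymous(coarse, ["age", "zip"], k=2)
```

    Coarser ranges satisfy the check more easily but lose more information, which is the trade-off the abstract's information-loss remark refers to.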

  9. A Modified MinMax k-Means Algorithm Based on PSO.

    PubMed

    Wang, Xiaoyan; Bai, Yanping

    The MinMax k-means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intraclustering errors. Two parameters, including the exponent parameter and memory parameter, are involved in the executive process. Since different parameters have different clustering errors, it is crucial to choose appropriate parameters. In the original algorithm, a practical framework is given. Such framework extends the MinMax k-means to automatically adapt the exponent parameter to the data set. It has been believed that if the maximum exponent parameter has been set, then the programme can reach the lowest intraclustering errors. However, our experiments show that this is not always correct. In this paper, we modified the MinMax k-means algorithm by PSO to determine the proper values of parameters which can subject the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on some favorite data sets in several different initial situations and is compared to the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm can reach the lowest clustering errors automatically.

  10. A Modified MinMax k-Means Algorithm Based on PSO

    PubMed Central

    2016-01-01

    The MinMax k-means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intraclustering errors. Two parameters, including the exponent parameter and memory parameter, are involved in the executive process. Since different parameters have different clustering errors, it is crucial to choose appropriate parameters. In the original algorithm, a practical framework is given. Such framework extends the MinMax k-means to automatically adapt the exponent parameter to the data set. It has been believed that if the maximum exponent parameter has been set, then the programme can reach the lowest intraclustering errors. However, our experiments show that this is not always correct. In this paper, we modified the MinMax k-means algorithm by PSO to determine the proper values of parameters which can subject the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on some favorite data sets in several different initial situations and is compared to the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm can reach the lowest clustering errors automatically. PMID:27656201

  11. Automated segmentation algorithm for detection of changes in vaginal epithelial morphology using optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Chitchian, Shahab; Vincent, Kathleen L.; Vargas, Gracie; Motamedi, Massoud

    2012-11-01

    We have explored the use of optical coherence tomography (OCT) as a noninvasive tool for assessing the toxicity of topical microbicides, products used to prevent HIV, by monitoring the integrity of the vaginal epithelium. A novel feature-based segmentation algorithm using a nearest-neighbor classifier was developed to monitor changes in the morphology of vaginal epithelium. The two-step automated algorithm yielded OCT images with a clearly defined epithelial layer, enabling differentiation of normal and damaged tissue. The algorithm was robust in that it was able to discriminate the epithelial layer from underlying stroma as well as residual microbicide product on the surface. This segmentation technique for OCT images has the potential to be readily adaptable to the clinical setting for noninvasively defining the boundaries of the epithelium, enabling quantifiable assessment of microbicide-induced damage in vaginal tissue.

  12. Sampling algorithms for validation of supervised learning models for Ising-like systems

    NASA Astrophysics Data System (ADS)

    Portman, Nataliya; Tamblyn, Isaac

    2017-12-01

    In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called "ID-MH" that uses the Metropolis-Hastings algorithm creating Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validation of a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new "block-ID" sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).

  13. Assessment of various supervised learning algorithms using different performance metrics

    NASA Astrophysics Data System (ADS)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work compares the performance of supervised machine learning algorithms on a binary classification task. The algorithms considered are Support Vector Machine (SVM), Decision Tree (DT), K Nearest Neighbour (KNN), Naïve Bayes (NB) and Random Forest (RF). The paper focuses on comparing these algorithms on one binary classification task by analysing metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity and Prevalence.

  14. Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

    NASA Astrophysics Data System (ADS)

    Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key, routine tasks in molecular biology, in particular in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), into what we call hybrid clustering. Hybrid clustering using the Parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts with collecting DNA sequences of HPV from NCBI (National Center for Biotechnology Information), followed by feature extraction from the DNA sequences. The extracted features are stored in matrix form; the matrix is then normalized using Min-Max normalization, and genetic distances are calculated using Euclidean distance. The hybrid clustering is then applied using implementations of the Parallel K-Means algorithm and the DIANA algorithm. The aim of using hybrid clustering is to obtain better clustering results. To validate the resulting clusters and to obtain the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, Parallel K-Means clustering groups the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering groups the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of Parallel K-Means clustering alone, which is performed at the first stage. This means that hybrid clustering gives better results for clustering DNA sequences of HPV than Parallel K-Means clustering alone.
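
    The preprocessing and validation steps named in this abstract (Min-Max normalization, Euclidean distance, Davies-Bouldin Index) can be sketched as follows. This is a plain, single-threaded illustration of the standard definitions and not the authors' parallel implementation. The function names and the toy feature matrix are assumptions.

```python
import math

def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - l) / s for v, l, s in zip(row, lo, span)) for row in rows]

def davies_bouldin(points, labels):
    """Davies-Bouldin Index: lower means more compact, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    centroids = {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
                 for lab, ps in clusters.items()}
    # Scatter: mean Euclidean distance from cluster members to their centroid.
    scatter = {lab: sum(math.dist(p, centroids[lab]) for p in ps) / len(ps)
               for lab, ps in clusters.items()}
    labs = list(clusters)
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [max((scatter[a] + scatter[b]) / math.dist(centroids[a], centroids[b])
                 for b in labs if b != a) for a in labs]
    return sum(worst) / len(labs)

# Hypothetical feature matrix: four sequences, two extracted features each.
features = min_max_normalize([(0, 0), (2, 0), (8, 4), (10, 4)])
dbi_good = davies_bouldin(features, [0, 0, 1, 1])
dbi_bad = davies_bouldin(features, [0, 1, 0, 1])
```

    The grouping that keeps nearby sequences together yields the smaller index, which is the sense in which the abstract treats lower DBI values as better clusterings.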

  15. Distributed k-Means Algorithm and Fuzzy c-Means Algorithm for Sensor Networks Based on Multiagent Consensus Theory.

    PubMed

    Qin, Jiahu; Fu, Weiming; Gao, Huijun; Zheng, Wei Xing

    2016-03-03

    This paper is concerned with developing a distributed k-means algorithm and a distributed fuzzy c-means algorithm for wireless sensor networks (WSNs) where each node is equipped with sensors. The underlying topology of the WSN is supposed to be strongly connected. The consensus algorithm in multiagent consensus theory is utilized to exchange the measurement information of the sensors in WSN. To obtain a faster convergence speed as well as a higher possibility of having the global optimum, a distributed k-means++ algorithm is first proposed to find the initial centroids before executing the distributed k-means algorithm and the distributed fuzzy c-means algorithm. The proposed distributed k-means algorithm is capable of partitioning the data observed by the nodes into measure-dependent groups which have small in-group and large out-group distances, while the proposed distributed fuzzy c-means algorithm is capable of partitioning the data observed by the nodes into different measure-dependent groups with degrees of membership values ranging from 0 to 1. Simulation results show that the proposed distributed algorithms can achieve almost the same results as that given by the centralized clustering algorithms.
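    The abstract does not state the consensus update rule, so the following is only a sketch of the standard discrete-time average consensus iteration on an undirected, connected graph (a special case of the strongly connected topology mentioned above). The function name, step size, and ring topology are assumptions made for illustration.

```python
def average_consensus(values, neighbors, step=0.25, iterations=50):
    """Each node repeatedly moves toward its neighbors' values.

    values:    initial local measurement of each node
    neighbors: neighbors[i] lists the nodes that node i can talk to
    step:      assumed to be smaller than 1 / (maximum node degree)
    """
    state = list(values)
    for _ in range(iterations):
        # All nodes update simultaneously from the previous round's state.
        state = [
            x + step * sum(state[j] - x for j in neighbors[i])
            for i, x in enumerate(state)
        ]
    return state


# Hypothetical ring of four sensor nodes.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
readings = [1.0, 5.0, 3.0, 7.0]
agreed = average_consensus(readings, ring)
```

    Every node ends up holding approximately the network-wide mean of the readings without any node seeing all of the data, which is the kind of primitive a distributed centroid update can be built on.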

  16. Identifying the most influential spreaders in complex networks by an Extended Local K-Shell Sum

    NASA Astrophysics Data System (ADS)

    Yang, Fan; Zhang, Ruisheng; Yang, Zhao; Hu, Rongjing; Li, Mengtian; Yuan, Yongna; Li, Keqin

    Identifying influential spreaders is crucial for developing strategies to control the spreading process on complex networks. Following the well-known K-Shell (KS) decomposition, several improved measures are proposed. However, these measures cannot identify the most influential spreaders accurately. In this paper, we define a Local K-Shell Sum (LKSS) by calculating the sum of the K-Shell indices of the neighbors within 2-hops of a given node. Based on the LKSS, we propose an Extended Local K-Shell Sum (ELKSS) centrality to rank spreaders. The ELKSS is defined as the sum of the LKSS of the nearest neighbors of a given node. By assuming that the spreading process on networks follows the Susceptible-Infectious-Recovered (SIR) model, we perform extensive simulations on a series of real networks to compare the performance between the ELKSS centrality and other six measures. The results show that the ELKSS centrality has a better performance than the six measures to distinguish the spreading ability of nodes and to identify the most influential spreaders accurately.
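    The LKSS and ELKSS definitions given in this abstract can be sketched as follows. The K-Shell step uses the usual peeling procedure; excluding the node itself from its 2-hop neighborhood is an assumption, since the abstract does not say either way. The example graph is invented.

```python
def k_shell(adj):
    """K-Shell index of every node, found by repeatedly peeling low-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        for v in peel:
            shell[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
    return shell


def two_hop_neighbors(adj, v):
    """Nodes within 2 hops of v, excluding v itself (an assumption)."""
    reach = set(adj[v])
    for u in adj[v]:
        reach |= adj[u]
    reach.discard(v)
    return reach


def lkss(adj, shell):
    """Local K-Shell Sum: sum of K-Shell indices over the 2-hop neighborhood."""
    return {v: sum(shell[u] for u in two_hop_neighbors(adj, v)) for v in adj}


def elkss(adj, local_sum):
    """Extended LKSS: sum of the LKSS values of the nearest neighbors."""
    return {v: sum(local_sum[u] for u in adj[v]) for v in adj}


# Hypothetical graph: a triangle a-b-c with a two-node tail a-d-e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
shells = k_shell(graph)
local_sums = lkss(graph, shells)
extended = elkss(graph, local_sums)
top_spreader = max(extended, key=extended.get)
```

    In this toy graph node a, which bridges the triangle and the tail, receives the highest ELKSS score.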

  17. Analysis and implementation of cross lingual short message service spam filtering using graph-based k-nearest neighbor

    NASA Astrophysics Data System (ADS)

    Ayu Cyntya Dewi, Dyah; Shaufiah; Asror, Ibnu

    2018-03-01

    SMS (Short Message Service) is one of the communication services that remains a main choice, even though phones now offer a wide range of applications. As other communication media developed, some countries lowered SMS rates to retain the interest of mobile users. This led to an increase in spam SMS sent by several parties, for example for advertising. Given the multilingual nature of documents such as SMS messages and Web pages, effective multilingual or cross-lingual processing techniques are becoming increasingly important. In this research, the messages are first preprocessed and then represented as a graph model, which is then processed using the GKNN method. The maximum accuracy obtained is 98.86, with both training and testing data in Indonesian, K = 10, and a threshold of 0.001.

  18. Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

    PubMed

    Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

    2016-01-01

    In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis.

  19. Comparison of GOES Cloud Classification Algorithms Employing Explicit and Implicit Physics

    NASA Technical Reports Server (NTRS)

    Bankert, Richard L.; Mitrescu, Cristian; Miller, Steven D.; Wade, Robert H.

    2009-01-01

    Cloud-type classification based on multispectral satellite imagery data has been widely researched and demonstrated to be useful for distinguishing a variety of classes using a wide range of methods. The research described here is a comparison of the classifier output from two very different algorithms applied to Geostationary Operational Environmental Satellite (GOES) data over the course of one year. The first algorithm employs spectral channel thresholding and additional physically based tests. The second algorithm was developed through a supervised learning method with characteristic features of expertly labeled image samples used as training data for a 1-nearest-neighbor classification. The latter's ability to identify classes is also based in physics, but those relationships are embedded implicitly within the algorithm. A pixel-to-pixel comparison analysis was done for hourly daytime scenes within a region in the northeastern Pacific Ocean. Considerable agreement was found in this analysis, with many of the mismatches or disagreements providing insight to the strengths and limitations of each classifier. Depending upon user needs, a rule-based or other postprocessing system that combines the output from the two algorithms could provide the most reliable cloud-type classification.
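    A 1-nearest-neighbor classifier of the kind mentioned in this abstract assigns a sample the label of its closest training example. The sketch below assumes Euclidean distance; the two-value feature vectors and the class names are hypothetical and are not taken from the GOES study.

```python
import math


def nearest_neighbor_label(training, query):
    """Return the label of the training sample closest to the query.

    training: list of (feature_vector, label) pairs
    query:    feature vector of the sample to classify
    """
    features, label = min(training, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical labeled samples: (feature 1, feature 2) -> class name.
labeled_samples = [
    ((0.20, 0.90), "class_a"),
    ((0.25, 0.85), "class_a"),
    ((0.80, 0.30), "class_b"),
    ((0.75, 0.35), "class_b"),
]
predicted = nearest_neighbor_label(labeled_samples, (0.70, 0.40))
```

    No explicit rules are written down here: the decision boundary is implied entirely by the labeled samples, which is the sense in which the abstract describes the physics as embedded implicitly.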

  20. Integrated Sensing and Processing (ISP) Phase II: Demonstration and Evaluation for Distributed Sensor Networks and Missile Seeker Systems

    DTIC Science & Technology

    2007-02-28

    Shah, D. Waagen, H. Schmitt, S. Bellofiore, A. Spanias, and D. Cochran, 32nd International Conference on Acoustics, Speech , and Signal Processing...Information Exploitation Office kNN k-Nearest Neighbor LEAN Laplacian Eigenmap Adaptive Neighbor LIP Linear Integer Programming ISP

  1. A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.

    PubMed

    Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong

    2017-01-01

    To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schreiner, S.; Paschal, C.B.; Galloway, R.L.

    Four methods of producing maximum intensity projection (MIP) images were studied and compared. Three of the projection methods differ in the interpolation kernel used for ray tracing. The interpolation kernels include nearest neighbor interpolation, linear interpolation, and cubic convolution interpolation. The fourth projection method is a voxel projection method that is not explicitly a ray-tracing technique. The four algorithms' performance was evaluated using a computer-generated model of a vessel and using real MR angiography data. The evaluation centered around how well an algorithm transferred an object's width to the projection plane. The voxel projection algorithm does not suffer from artifacts associated with the nearest neighbor algorithm. Also, a speed-up in the calculation of the projection is seen with the voxel projection method. Linear interpolation dramatically improves the transfer of width information from the 3D MRA data set over both nearest neighbor and voxel projection methods. Even though the cubic convolution interpolation kernel is theoretically superior to the linear kernel, it did not project widths more accurately than linear interpolation. A possible advantage to the nearest neighbor interpolation is that the size of small vessels tends to be exaggerated in the projection plane, thereby increasing their visibility. The results confirm that the way in which an MIP image is constructed has a dramatic effect on information contained in the projection. The construction method must be chosen with the knowledge that the clinical information in the 2D projections in general will be different from that contained in the original 3D data volume. 27 refs., 16 figs., 2 tabs.

  3. Real-time Interpolation for True 3-Dimensional Ultrasound Image Volumes

    PubMed Central

    Ji, Songbai; Roberts, David W.; Hartov, Alex; Paulsen, Keith D.

    2013-01-01

    We compared trilinear interpolation to voxel nearest neighbor and distance-weighted algorithms for fast and accurate processing of true 3-dimensional ultrasound (3DUS) image volumes. In this study, the computational efficiency and interpolation accuracy of the 3 methods were compared on the basis of a simulated 3DUS image volume, 34 clinical 3DUS image volumes from 5 patients, and 2 experimental phantom image volumes. We show that trilinear interpolation improves interpolation accuracy over both the voxel nearest neighbor and distance-weighted algorithms yet achieves real-time computational performance that is comparable to the voxel nearest neighbor algorithm (1–2 orders of magnitude faster than the distance-weighted algorithm) as well as the fastest pixel-based algorithms for processing tracked 2-dimensional ultrasound images (0.035 seconds per 2-dimensional cross-sectional image [76,800 pixels interpolated, or 0.46 ms/1000 pixels] and 1.05 seconds per full volume with a 1-mm³ voxel size [4.6 million voxels interpolated, or 0.23 ms/1000 voxels]). On the basis of these results, trilinear interpolation is recommended as a fast and accurate interpolation method for rectilinear sampling of 3DUS image acquisitions, which is required to facilitate subsequent processing and display during operating room procedures such as image-guided neurosurgery. PMID:21266563
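    The difference between the voxel nearest neighbor and trilinear approaches compared in this abstract can be shown on a tiny volume. This is a generic textbook sketch rather than the authors' implementation; the volume[z][y][x] layout, the function names, and the 2 x 2 x 2 test volume are assumptions.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def voxel_nearest_neighbor(volume, x, y, z):
    """Value of the voxel closest to the non-negative point (x, y, z)."""
    return volume[int(z + 0.5)][int(y + 0.5)][int(x + 0.5)]


def trilinear(volume, x, y, z):
    """Blend the 8 voxels surrounding the non-negative point (x, y, z)."""
    x0, y0, z0 = int(x), int(y), int(z)
    # Clamp the upper corner so points on the last voxel stay in bounds.
    x1 = min(x0 + 1, len(volume[0][0]) - 1)
    y1 = min(y0 + 1, len(volume[0]) - 1)
    z1 = min(z0 + 1, len(volume) - 1)
    fx, fy, fz = x - x0, y - y0, z - z0
    # Interpolate along x, then y, then z.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)


# Hypothetical volume whose intensity is x + 2*y + 4*z at each voxel.
volume = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
smooth = trilinear(volume, 0.25, 0.75, 0.5)
blocky = voxel_nearest_neighbor(volume, 0.25, 0.75, 0.5)
```

    Because the toy intensity field is linear, trilinear interpolation recovers the exact value at the sample point while the nearest neighbor lookup snaps to a single voxel value.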

  4. Real-time interpolation for true 3-dimensional ultrasound image volumes.

    PubMed

    Ji, Songbai; Roberts, David W; Hartov, Alex; Paulsen, Keith D

    2011-02-01

    We compared trilinear interpolation to voxel nearest neighbor and distance-weighted algorithms for fast and accurate processing of true 3-dimensional ultrasound (3DUS) image volumes. In this study, the computational efficiency and interpolation accuracy of the 3 methods were compared on the basis of a simulated 3DUS image volume, 34 clinical 3DUS image volumes from 5 patients, and 2 experimental phantom image volumes. We show that trilinear interpolation improves interpolation accuracy over both the voxel nearest neighbor and distance-weighted algorithms yet achieves real-time computational performance that is comparable to the voxel nearest neighbor algorithm (1-2 orders of magnitude faster than the distance-weighted algorithm) as well as the fastest pixel-based algorithms for processing tracked 2-dimensional ultrasound images (0.035 seconds per 2-dimensional cross-sectional image [76,800 pixels interpolated, or 0.46 ms/1000 pixels] and 1.05 seconds per full volume with a 1-mm³ voxel size [4.6 million voxels interpolated, or 0.23 ms/1000 voxels]). On the basis of these results, trilinear interpolation is recommended as a fast and accurate interpolation method for rectilinear sampling of 3DUS image acquisitions, which is required to facilitate subsequent processing and display during operating room procedures such as image-guided neurosurgery.

  5. Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.

    PubMed

    Maximova, Tatiana; Plaku, Erion; Shehu, Amarda

    2016-07-07

    Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to addressing the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new avenues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.

  6. Advanced Algorithms for Local Routing Strategy on Complex Networks.

    PubMed

    Lin, Benchuan; Chen, Bokui; Gao, Yachun; Tse, Chi K; Dong, Chuanfei; Miao, Lixin; Wang, Binghong

    2016-01-01

    Despite the significant improvement on network performance provided by global routing strategies, their applications are still limited to small-scale networks, due to the need for acquiring global information of the network which grows and changes rapidly with time. Local routing strategies, however, need much less local information, though their transmission efficiency and network capacity are much lower than that of global routing strategies. In view of this, three algorithms are proposed and a thorough investigation is conducted in this paper. These algorithms include a node duplication avoidance algorithm, a next-nearest-neighbor algorithm and a restrictive queue length algorithm. After applying them to typical local routing strategies, the critical generation rate of information packets Rc increases by over ten-fold and the average transmission time 〈T〉 decreases by 70-90 percent, both of which are key physical quantities to assess the efficiency of routing strategies on complex networks. More importantly, in comparison with global routing strategies, the improved local routing strategies can yield better network performance under certain circumstances. This is a revolutionary leap for communication networks, because local routing strategy enjoys great superiority over global routing strategy not only in terms of the reduction of computational expense, but also in terms of the flexibility of implementation, especially for large-scale networks.

  7. Biclustering Learning of Trading Rules.

    PubMed

    Huang, Qinghua; Wang, Ting; Tao, Dacheng; Li, Xuelong

    2015-10-01

    Technical analysis with numerous indicators and patterns has been regarded as important evidence for making trading decisions in financial markets. However, it is extremely difficult for investors to find useful trading rules based on numerous technical indicators. This paper innovatively proposes the use of biclustering mining to discover effective technical trading patterns that contain a combination of indicators from historical financial data series. This is the first attempt to use a biclustering algorithm on trading data. The mined patterns are regarded as trading rules and can be classified as three trading actions (i.e., the buy, the sell, and no-action signals) with respect to the maximum support. A modified K nearest neighborhood (K-NN) method is applied to classification of trading days in the testing period. The proposed method [called biclustering algorithm and the K nearest neighbor (BIC-K-NN)] was implemented on four historical datasets and the average performance was compared with the conventional buy-and-hold strategy and three previously reported intelligent trading systems. Experimental results demonstrate that the proposed trading system outperforms its counterparts and will be useful for investment in various financial markets.

  8. Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm

    PubMed Central

    Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

    2016-01-01

    In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895

  9. Behavioral Modeling for Mental Health using Machine Learning Algorithms.

    PubMed

    Srividya, M; Mohanavalli, S; Bhalaji, N

    2018-04-03

    Mental health is an indicator of the emotional, psychological and social well-being of an individual. It determines how an individual thinks, feels and handles situations. Positive mental health helps one to work productively and realize their full potential. Mental health is important at every stage of life, from childhood and adolescence through adulthood. Many factors contribute to mental health problems which lead to mental illness like stress, social anxiety, depression, obsessive compulsive disorder, drug addiction, and personality disorders. It is becoming increasingly important to determine the onset of mental illness to maintain proper life balance. The nature of machine learning algorithms and Artificial Intelligence (AI) can be fully harnessed for predicting the onset of mental illness. Such applications, when implemented in real time, will benefit society by serving as a monitoring tool for individuals with deviant behavior. This research work proposes to apply various machine learning algorithms such as support vector machines, decision trees, naïve Bayes classifier, K-nearest neighbor classifier and logistic regression to identify the state of mental health in a target group. The responses obtained from the target group for the designed questionnaire were first subjected to unsupervised learning techniques. The labels obtained as a result of clustering were validated by computing the Mean Opinion Score. These cluster labels were then used to build classifiers to predict the mental health of an individual. Populations from various groups like high school students, college students and working professionals were considered as target groups. The research presents an analysis of applying the aforementioned machine learning algorithms on the target groups and also suggests directions for future work.

  10. Particle Communication and Domain Neighbor Coupling: Scalable Domain Decomposed Algorithms for Monte Carlo Particle Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, M. J.; Brantley, P. S.

    2015-01-20

    In order to run Monte Carlo particle transport calculations on new supercomputers with hundreds of thousands or millions of processors, care must be taken to implement scalable algorithms. This means that the algorithms must continue to perform well as the processor count increases. In this paper, we examine the scalability of: (1) globally resolving the particle locations on the correct processor, (2) deciding that particle streaming communication has finished, and (3) efficiently coupling neighbor domains together with different replication levels. We have run domain decomposed Monte Carlo particle transport on up to 2^21 = 2,097,152 MPI processes on the IBM BG/Q Sequoia supercomputer and observed scalable results that agree with our theoretical predictions. These calculations were carefully constructed to have the same amount of work on every processor, i.e. the calculation is already load balanced. We also examine load imbalanced calculations where each domain’s replication level is proportional to its particle workload. In this case we show how to efficiently couple together adjacent domains to maintain within workgroup load balance and minimize memory usage.

  11. A novel algorithm of super-resolution image reconstruction based on multi-class dictionaries for natural scene

    NASA Astrophysics Data System (ADS)

    Wu, Wei; Zhao, Dewei; Zhang, Huan

    2015-12-01

    Super-resolution image reconstruction is an effective method for improving image quality and has important research significance in the field of image processing. However, the choice of dictionary directly affects the efficiency of image reconstruction. Sparse representation theory is introduced into the problem of nearest neighbor selection. Building on sparse-representation-based super-resolution image reconstruction, a super-resolution image reconstruction algorithm based on multi-class dictionaries is analyzed. This method avoids the redundancy problem of training only a single hyper-complete dictionary and makes the sub-dictionaries more representative; it then replaces the traditional Euclidean distance computation to improve the quality of the whole image reconstruction. In addition, non-local self-similarity regularization is introduced to address the ill-posed problem. Experimental results show that the algorithm achieves much better results than state-of-the-art algorithms in terms of both PSNR and visual perception.

  12. Pivot methods for global optimization

    NASA Astrophysics Data System (ADS)

    Stanton, Aaron Fletcher

    A new algorithm is presented for the location of the global minimum of a multiple minima problem. It begins with a series of randomly placed probes in phase space, and then uses an iterative redistribution of the worst probes into better regions of phase space until a chosen convergence criterion is fulfilled. The method quickly converges, does not require derivatives, and is resistant to becoming trapped in local minima. Comparison of this algorithm with others using a standard test suite demonstrates that the number of function calls has been decreased conservatively by a factor of about three with the same degrees of accuracy. Two major variations of the method are presented, differing primarily in the method of choosing the probes that act as the basis for the new probes. The first variation, termed the lowest energy pivot method, ranks all probes by their energy and keeps the best probes. The probes being discarded select from those being kept as the basis for the new cycle. In the second variation, the nearest neighbor pivot method, all probes are paired with their nearest neighbor. The member of each pair with the higher energy is relocated in the vicinity of its neighbor. Both methods are tested against a standard test suite of functions to determine their relative efficiency, and the nearest neighbor pivot method is found to be the more efficient. A series of Lennard-Jones clusters is optimized with the nearest neighbor method, and a scaling law is found for cpu time versus the number of particles in the system. The two methods are then compared more explicitly, and finally a study in the use of the pivot method for solving the Schroedinger equation is presented. The nearest neighbor method is found to be able to solve the ground state of the quantum harmonic oscillator from a pure random initialization of the wavefunction.
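    One iteration of the nearest neighbor pivot variant described in this abstract might look like the sketch below. The abstract does not specify how the relocated probe is placed "in the vicinity" of its neighbor, so the Gaussian perturbation, the spread parameter, the test function, and all names here are assumptions.

```python
import math
import random


def nearest_neighbor_pivot_step(probes, energy, rng, spread=0.1):
    """Relocate each probe that is worse than its nearest neighbor.

    probes: list of coordinate tuples (at least two probes)
    energy: function mapping a probe to the value being minimized
    rng:    random.Random instance used for the relocation
    """
    updated = []
    for i, probe in enumerate(probes):
        others = (k for k in range(len(probes)) if k != i)
        j = min(others, key=lambda k: math.dist(probe, probes[k]))
        pivot = probes[j]
        if energy(probe) > energy(pivot):
            # Assumed relocation rule: Gaussian scatter around the better neighbor.
            probe = tuple(c + rng.gauss(0.0, spread) for c in pivot)
        updated.append(probe)
    return updated


def sphere(point):
    """Simple test function with its global minimum at the origin."""
    return sum(c * c for c in point)


rng = random.Random(0)
probes = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(10)]
start_best = min(sphere(p) for p in probes)
for _ in range(20):
    probes = nearest_neighbor_pivot_step(probes, sphere, rng)
end_best = min(sphere(p) for p in probes)
```

    The current best probe is never relocated under this rule, so the best energy found cannot get worse from one iteration to the next, and no derivatives of the objective are needed.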

  13. Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine

    NASA Astrophysics Data System (ADS)

    Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry

    2017-10-01

    Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m³ ha⁻¹ and live biomass with an accuracy of about 2 t ha⁻¹ over the study area.
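    In k-NN estimation of a continuous attribute, a target location is assigned the average attribute value of its k most similar reference plots. The sketch below uses plain Euclidean distance rather than the MSN metric favored in the abstract, and the predictor values and volumes are invented for illustration.

```python
import math


def knn_estimate(reference, query, k=4):
    """Average the attribute of the k reference plots closest to the query.

    reference: list of (predictor_vector, attribute_value) pairs
    query:     predictor vector of the target pixel
    """
    nearest = sorted(reference, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical reference plots: (two spectral predictors) -> growing stock volume.
plots = [
    ((0.0, 0.0), 10.0),
    ((1.0, 0.0), 20.0),
    ((0.0, 1.0), 30.0),
    ((1.0, 1.0), 40.0),
    ((5.0, 5.0), 500.0),
]
estimated_volume = knn_estimate(plots, (0.5, 0.5), k=4)
```

    With k = 4 the distant outlier plot is ignored and the estimate is the mean of the four surrounding plots; swapping the distance function is where metrics such as Mahalanobis or MSN would plug in.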

  14. Corrigendum to "Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data"

    Treesearch

    Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; David E. Hall; Michael J. Falkowski

    2009-01-01

    The authors regret that an error was discovered in the code within the R software package, yaImpute (Crookston & Finley, 2008), which led to incorrect results reported in the above article. The Most Similar Neighbor (MSN) method computes the distance between reference observations and target observations in a projected space defined using canonical correlation...

  15. Imputed forest structure uncertainty varies across elevational and longitudinal gradients in the western Cascade mountains, Oregon, USA

    Treesearch

    David M. Bell; Matthew J. Gregory; Janet L. Ohmann

    2015-01-01

    Imputation provides a useful method for mapping forest attributes across broad geographic areas based on field plot measurements and Landsat multi-spectral data, but the resulting map products may be of limited use without corresponding analyses of uncertainties in predictions. In the case of k-nearest neighbor (kNN) imputation with k = 1, such as the Gradient Nearest...

  16. The most detailed high-energy picture of Proxima Centauri, our nearest extrasolar neighbor

    NASA Astrophysics Data System (ADS)

    Schneider, Christian

    2016-10-01

    Proxima Centauri b is the nearest exoplanet to the Sun. It orbits an M5.5 dwarf and is potentially habitable. The latter statement, however, depends sensitively on the high-energy irradiation on the planet. Ribas et al. (2016) estimated the high-energy flux of the host star by collecting archival data from the X-ray to the FUV regime, but explicitly state that one unavoidable complication of estimating XUV fluxes is [...] intrinsic [stellar] variability. Here, we propose to greatly improve upon this unavoidable complication by obtaining simultaneous X-ray and UV observations to measure a high-resolution irradiation spectrum and, thus, to assess the habitability of Proxima b. Our upcoming, very deep Chandra grating observation of Proxima Cen (175 ks, LETGS, PI: P. Predehl) provides a great opportunity to obtain simultaneous coverage at X-ray and UV wavelengths, i.e., to measure most of the stellar high-energy flux in a coherent way. The reason for proposing a HST DDT is that the Chandra observation is a GTO and, thus, could not be augmented by simultaneous HST observations directly as we would have proposed for in a regular GO. Combining Chandra X-ray and HST UV data allows us to reconstruct a high-resolution spectral energy distribution (SED) including the EUV regime and, thus, a reference irradiation spectrum using the methods developed by us for the MUSCLES project.

  17. Comprehensive thermodynamic analysis of 3′ double-nucleotide overhangs neighboring Watson–Crick terminal base pairs

    PubMed Central

    O'Toole, Amanda S.; Miller, Stacy; Haines, Nathan; Zink, M. Coleen; Serra, Martin J.

    2006-01-01

    Thermodynamic parameters are reported for duplex formation of 48 self-complementary RNA duplexes containing Watson–Crick terminal base pairs (GC, AU and UA) with all 16 possible 3′ double-nucleotide overhangs; mimicking the structures of short interfering RNAs (siRNA) and microRNAs (miRNA). Based on nearest-neighbor analysis, the addition of a second dangling nucleotide to a single 3′ dangling nucleotide increases stability of duplex formation up to 0.8 kcal/mol in a sequence dependent manner. Results from this study in conjunction with data from a previous study [A. S. O'Toole, S. Miller and M. J. Serra (2005) RNA, 11, 512.] allows for the development of a refined nearest-neighbor model to predict the influence of 3′ double-nucleotide overhangs on the stability of duplex formation. The model improves the prediction of free energy and melting temperature when tested against five oligomers with various core duplex sequences. Phylogenetic analysis of naturally occurring miRNAs was performed to support our results. Selection of the effector miR strand of the mature miRNA duplex appears to be dependent upon the identity of the 3′ double-nucleotide overhang. Thermodynamic parameters for 3′ single terminal overhangs adjacent to a UA pair are also presented. PMID:16820533

  18. Long-term surface EMG monitoring using K-means clustering and compressive sensing

    NASA Astrophysics Data System (ADS)

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2015-05-01

    In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory in combination with the K-Singular Value Decomposition (K-SVD) method for clustering long-term recordings of surface electromyography (sEMG) signals. Long-term monitoring of sEMG signals aims at recording the electrical activity produced by muscles, which is a very useful procedure for treatment and diagnostic purposes as well as for the detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals: a healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathy). The proposed algorithm can easily scan large datasets of long-term sEMG recordings. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calculate the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. It reduces the Average Classification Error (ACE) by 17%, the Training Error (TE) by 9%, and the Root Mean Square Error (RMSE) by 18%. The proposed algorithm also reduces clustering energy consumption by 14% compared to the existing K-means clustering algorithm.

  19. Phase transition in the spin-3/2 Blume-Emery-Griffiths model with antiferromagnetic second neighbor interactions

    NASA Astrophysics Data System (ADS)

    Yezli, M.; Bekhechi, S.; Hontinfinde, F.; EZ-Zahraouy, H.

    2016-04-01

    Two nonperturbative methods, Monte-Carlo simulation (MC) and Transfer-Matrix Finite-Size-Scaling calculations (TMFSS), have been used to study the phase transition of the spin-3/2 Blume-Emery-Griffiths model (BEG) with quadrupolar and antiferromagnetic next-nearest-neighbor exchange interactions. Ground state and finite temperature phase diagrams are obtained by means of these two methods. New degenerate phases are found and only second order phase transitions occur for all values of the parameter interactions. No sign of the intermediate phase is found from either method. Critical exponents are also obtained from TMFSS calculations. Ising criticality and nonuniversal behaviors are observed depending on the strength of the second neighbor interaction.

  20. Novel Hyperspectral Anomaly Detection Methods Based on Unsupervised Nearest Regularized Subspace

    NASA Astrophysics Data System (ADS)

    Hou, Z.; Chen, Y.; Tan, K.; Du, P.

    2018-04-01

    Anomaly detection has been of great interest in hyperspectral imagery analysis. Most conventional anomaly detectors merely take advantage of spectral and spatial information within neighboring pixels. In this paper, two methods, the Unsupervised Nearest Regularized Subspace-based with Outlier Removal Anomaly Detector (UNRSORAD) and Local Summation UNRSORAD (LSUNRSORAD), are proposed. Both are based on the concept that each background pixel can be approximately represented by its spatial neighborhood, while anomalies cannot. Using a dual window, each testing pixel is approximated by a linear combination of the surrounding data. The existence of outliers in the dual window will affect detection accuracy, so the proposed detectors remove outlier pixels that are significantly different from the majority of pixels. In order to make full use of the various local spatial distributions of the pixels neighboring the pixel under test, we adopt a local summation dual-window sliding strategy. The residual image is formed by subtracting the predicted background from the original hyperspectral imagery, and anomalies can be detected in the residual image. Experimental results show that the proposed methods greatly improve detection accuracy compared with other traditional detection methods.

  1. Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors

    PubMed Central

    Guo, Maozu; Guo, Yahong; Li, Jinbao; Ding, Jian; Liu, Yong; Dai, Qiguo; Li, Jin; Teng, Zhixia; Huang, Yufei

    2013-01-01

    Background The identification of human disease-related microRNAs (disease miRNAs) is important for further investigating their involvement in the pathogenesis of diseases. More experimentally validated miRNA-disease associations have been accumulated recently. On the basis of these associations, it is essential to predict disease miRNAs for various human diseases. It is useful in providing reliable disease miRNA candidates for subsequent experimental studies. Methodology/Principal Findings It is known that miRNAs with similar functions are often associated with similar diseases and vice versa. Therefore, the functional similarity of two miRNAs has been successfully estimated by measuring the semantic similarity of their associated diseases. To effectively predict disease miRNAs, we calculated the functional similarity by incorporating the information content of disease terms and phenotype similarity between diseases. Furthermore, the members of miRNA family or cluster are assigned higher weight since they are more probably associated with similar diseases. A new prediction method, HDMP, based on weighted k most similar neighbors is presented for predicting disease miRNAs. Experiments validated that HDMP achieved significantly higher prediction performance than existing methods. In addition, the case studies examining prostatic neoplasms, breast neoplasms, and lung neoplasms, showed that HDMP can uncover potential disease miRNA candidates. Conclusions The superior performance of HDMP can be attributed to the accurate measurement of miRNA functional similarity, the weight assignment based on miRNA family or cluster, and the effective prediction based on weighted k most similar neighbors. The online prediction and analysis tool is freely available at http://nclab.hit.edu.cn/hdmpred. PMID:23950912

  2. A Partitioning Algorithm for Block-Diagonal Matrices With Overlap

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guy Antoine Atenekeng Kahou; Laura Grigori; Masha Sosonkina

    2008-02-02

    We present a graph partitioning algorithm that aims at partitioning a sparse matrix into a block-diagonal form, such that any two consecutive blocks overlap. We denote this form of the matrix as the overlapped block-diagonal matrix. The partitioned matrix is suitable for applying the explicit formulation of Multiplicative Schwarz preconditioner (EFMS) described in [3]. The graph partitioning algorithm partitions the graph of the input matrix into K partitions, such that every partition Ω_i has at most two neighbors, Ω_{i-1} and Ω_{i+1}. First, an ordering algorithm that reduces the matrix profile, such as the reverse Cuthill-McKee algorithm, is performed. An initial overlapped block-diagonal partition is obtained from the profile of the matrix. An iterative strategy is then used to further refine the partitioning by allowing nodes to be transferred between neighboring partitions. Experiments are performed on matrices arising from real-world applications to show the feasibility and usefulness of this approach.

  3. Identification of Disease Critical Genes Using Collective Meta-heuristic Approaches: An Application to Preeclampsia.

    PubMed

    Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar

    2017-12-01

    Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.

  4. General formulation of long-range degree correlations in complex networks

    NASA Astrophysics Data System (ADS)

    Fujiki, Yuka; Takaguchi, Taro; Yakubo, Kousuke

    2018-06-01

    We provide a general framework for analyzing degree correlations between nodes separated by more than one step (i.e., beyond nearest neighbors) in complex networks. One joint and four conditional probability distributions are introduced to fully describe long-range degree correlations with respect to degrees k and k' of two nodes and shortest path length l between them. We present general relations among these probability distributions and clarify the relevance to nearest-neighbor degree correlations. Unlike nearest-neighbor correlations, some of these probability distributions are meaningful only in finite-size networks. Furthermore, as a baseline to determine the existence of intrinsic long-range degree correlations in a network other than inevitable correlations caused by the finite-size effect, the functional forms of these probability distributions for random networks are analytically evaluated within a mean-field approximation. The utility of our argument is demonstrated by applying it to real-world networks.

  5. Advanced Algorithms for Local Routing Strategy on Complex Networks

    PubMed Central

    Lin, Benchuan; Chen, Bokui; Gao, Yachun; Tse, Chi K.; Dong, Chuanfei; Miao, Lixin; Wang, Binghong

    2016-01-01

    Despite the significant improvement on network performance provided by global routing strategies, their applications are still limited to small-scale networks, due to the need for acquiring global information of the network which grows and changes rapidly with time. Local routing strategies, however, need much less local information, though their transmission efficiency and network capacity are much lower than that of global routing strategies. In view of this, three algorithms are proposed and a thorough investigation is conducted in this paper. These algorithms include a node duplication avoidance algorithm, a next-nearest-neighbor algorithm and a restrictive queue length algorithm. After applying them to typical local routing strategies, the critical generation rate of information packets Rc increases by over ten-fold and the average transmission time 〈T〉 decreases by 70–90 percent, both of which are key physical quantities to assess the efficiency of routing strategies on complex networks. More importantly, in comparison with global routing strategies, the improved local routing strategies can yield better network performance under certain circumstances. This is a revolutionary leap for communication networks, because local routing strategy enjoys great superiority over global routing strategy not only in terms of the reduction of computational expense, but also in terms of the flexibility of implementation, especially for large-scale networks. PMID:27434502

  6. Prediction of Drug-Plasma Protein Binding Using Artificial Intelligence Based Algorithms.

    PubMed

    Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar

    2018-01-01

    Plasma protein binding (PPB) has vital importance in the characterization of drug distribution in the systemic circulation. Unfavorable PPB can pose a negative effect on clinical development of promising drug candidates. The drug distribution properties should be considered at the initial phases of the drug design and development. Therefore, PPB prediction models are receiving increased attention. In the current study, we present a systematic approach using Support vector machine, Artificial neural network, k-nearest neighbor, Probabilistic neural network, Partial least square and Linear discriminant analysis to relate various in vitro and in silico molecular descriptors to a diverse dataset of 736 drugs/drug-like compounds. The overall accuracy of Support vector machine with Radial basis function kernel came out to be comparatively better than the rest of the applied algorithms. The training set accuracy, validation set accuracy, precision, sensitivity, specificity and F1 score for the Support vector machine were found to be 89.73%, 89.97%, 92.56%, 87.26%, 91.97% and 0.898, respectively. This model can potentially be useful in screening of relevant drug candidates at the preliminary stages of drug design and development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  7. Modeling occupancy distribution in large spaces with multi-feature classification algorithm

    DOE PAGES

    Wang, Wei; Chen, Jiayu; Hong, Tianzhen

    2018-04-07

    We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolution occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.
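
    The City Block Distance mentioned in this record is the L1 (Manhattan) distance between two vectors. The sketch below is illustrative only and is not taken from the paper: it assumes that an occupancy distribution is a list of per-zone head counts and that "accuracy when CBD = t" means the share of samples whose distance from ground truth is at most t. The function names and the sample data are hypothetical.

```python
def city_block_distance(detected, truth):
    """L1 distance between two equal-length per-zone occupancy vectors."""
    if len(detected) != len(truth):
        raise ValueError("vectors must cover the same zones")
    return sum(abs(d - t) for d, t in zip(detected, truth))


def accuracy_within(samples, tolerance):
    """Share of (detected, truth) pairs whose CBD is at most `tolerance`.

    Assumption: this tolerance reading is one plausible interpretation of
    the abstract, not a definition confirmed by the source.
    """
    hits = sum(
        1 for detected, truth in samples
        if city_block_distance(detected, truth) <= tolerance
    )
    return hits / len(samples)


# Hypothetical data: three thermal zones, four observation times.
samples = [
    ([2, 1, 0], [2, 1, 0]),  # exact match, CBD = 0
    ([1, 2, 0], [2, 1, 0]),  # one occupant in the wrong zone, CBD = 2
    ([3, 0, 0], [2, 1, 0]),  # CBD = 2
    ([0, 0, 3], [2, 1, 0]),  # CBD = 6
]
```

    Under this reading, a misplaced occupant costs 2 (one too many in one zone, one too few in another), which is why loosening the tolerance from 1 to 2 can raise the reported accuracy noticeably.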

  8. Modeling occupancy distribution in large spaces with multi-feature classification algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Wei; Chen, Jiayu; Hong, Tianzhen

    We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolution occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.

  9. Band nesting, massive Dirac fermions, and valley Landé and Zeeman effects in transition metal dichalcogenides: A tight-binding model

    NASA Astrophysics Data System (ADS)

    Bieniek, Maciej; Korkusiński, Marek; Szulakowska, Ludmiła; Potasz, Paweł; Ozfidan, Isil; Hawrylak, Paweł

    2018-02-01

    We present here the minimal tight-binding model for a single layer of transition metal dichalcogenides (TMDCs) MX2 (M, metal; X, chalcogen) which illuminates the physics and captures band nesting, massive Dirac fermions, and valley Landé and Zeeman magnetic field effects. TMDCs share the hexagonal lattice with graphene but their electronic bands require much more complex atomic orbitals. Using symmetry arguments, a minimal basis consisting of three metal d orbitals and three chalcogen dimer p orbitals is constructed. The tunneling matrix elements between nearest-neighbor metal and chalcogen orbitals are explicitly derived at the K, -K, and Γ points of the Brillouin zone. The nearest-neighbor tunneling matrix elements connect specific metal and sulfur orbitals, yielding an effective 6×6 Hamiltonian giving the correct composition of metal and chalcogen orbitals but not the direct gap at the K points. The direct gap at K, correct masses, and conduction band minima at the Q points responsible for band nesting are obtained by inclusion of next-neighbor Mo-Mo tunneling. The parameters of the next-nearest-neighbor model are successfully fitted to MX2 (M = Mo; X = S) density functional ab initio calculations of the highest valence and lowest conduction band dispersion along the K-Γ line in the Brillouin zone. The effective two-band massive Dirac Hamiltonian for MoS2, Landé g factors, and valley Zeeman splitting are obtained.

  10. Curve Set Feature-Based Robust and Fast Pose Estimation Algorithm

    PubMed Central

    Hashimoto, Koichi

    2017-01-01

    Bin picking refers to picking randomly piled objects from a bin for industrial production purposes, and robotic bin picking is widely used in automated assembly lines. In order to achieve higher productivity, a fast and robust pose estimation algorithm is necessary to recognize and localize the randomly piled parts. This paper proposes a pose estimation algorithm for bin picking tasks using point cloud data. A novel descriptor, Curve Set Feature (CSF), is proposed to describe a point by the surface fluctuation around this point and is also capable of evaluating poses. The Rotation Match Feature (RMF) is proposed to match CSF efficiently. The matching process combines the idea of matching in 2D space from the original Point Pair Feature (PPF) algorithm with nearest neighbor search. A voxel-based pose verification method is introduced to evaluate the poses and proved to be more than 30 times faster than the kd-tree-based verification method. Our algorithm is evaluated against a large number of synthetic and real scenes and proven to be robust to noise, able to detect metal parts, and more accurate and more than 10 times faster than PPF and Oriented, Unique and Repeatable (OUR)-Clustered Viewpoint Feature Histogram (CVFH). PMID:28771216

  11. Development of an indoor location based service test bed and geographic information system with a wireless sensor network.

    PubMed

    Jan, Shau-Shiun; Hsu, Li-Ta; Tsai, Wen-Ming

    2010-01-01

    In order to provide seamless navigation and positioning services for indoor environments, an indoor location based service (LBS) test bed is developed to integrate the indoor positioning system and the indoor three-dimensional (3D) geographic information system (GIS). A wireless sensor network (WSN) is used in the developed indoor positioning system. Considering the power consumption, in this paper the ZigBee radio is used as the wireless protocol, and the received signal strength (RSS) fingerprinting positioning method is applied as the primary indoor positioning algorithm. The matching processes of the user location include the nearest neighbor (NN) algorithm, the K-weighted nearest neighbors (KWNN) algorithm, and the probabilistic approach. To enhance the positioning accuracy for the dynamic user, the particle filter is used to improve the positioning performance. As part of this research, a 3D indoor GIS is developed to be used with the indoor positioning system. This involved using computer-aided design (CAD) software and the virtual reality markup language (VRML) to implement a prototype indoor LBS test bed. Thus, a rapid and practical procedure for constructing a 3D indoor GIS is proposed, and this GIS is easy for users to update and maintain. The building of the Department of Aeronautics and Astronautics at National Cheng Kung University in Taiwan is used as an example to assess the performance of various algorithms for the indoor positioning system.
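
    The RSS fingerprinting step with weighted nearest neighbors described in this record can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the inverse-distance weighting, the function name `kwnn_position`, and the survey data layout are all assumptions made for the example.

```python
import math


def kwnn_position(fingerprints, observed_rss, k=3, eps=1e-9):
    """Estimate (x, y) from the k reference points closest in signal space.

    fingerprints: list of ((x, y), [rss, ...]) pairs from an offline survey
                  (hypothetical layout, one RSS value per radio node).
    observed_rss: RSS readings in the same node order as the survey.
    """
    # Rank reference points by Euclidean distance in signal space.
    scored = sorted(
        (math.dist(rss, observed_rss), pos) for pos, rss in fingerprints
    )
    nearest = scored[:k]
    # Assumed weighting: inverse signal distance, so closer matches count more.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical survey: four reference points, two radio nodes (RSS in dBm).
survey = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-70.0, -40.0]),
    ((0.0, 10.0), [-60.0, -60.0]),
    ((10.0, 10.0), [-80.0, -80.0]),
]
estimate = kwnn_position(survey, [-42.0, -68.0], k=3)
```

    Setting k=1 reduces this to the plain nearest neighbor (NN) match, which simply returns the coordinates of the best-matching reference point.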

  12. Development of an Indoor Location Based Service Test Bed and Geographic Information System with a Wireless Sensor Network

    PubMed Central

    Jan, Shau-Shiun; Hsu, Li-Ta; Tsai, Wen-Ming

    2010-01-01

    In order to provide seamless navigation and positioning services for indoor environments, an indoor location based service (LBS) test bed is developed to integrate the indoor positioning system and the indoor three-dimensional (3D) geographic information system (GIS). A wireless sensor network (WSN) is used in the developed indoor positioning system. Considering the power consumption, in this paper the ZigBee radio is used as the wireless protocol, and the received signal strength (RSS) fingerprinting positioning method is applied as the primary indoor positioning algorithm. The matching processes of the user location include the nearest neighbor (NN) algorithm, the K-weighted nearest neighbors (KWNN) algorithm, and the probabilistic approach. To enhance the positioning accuracy for the dynamic user, the particle filter is used to improve the positioning performance. As part of this research, a 3D indoor GIS is developed to be used with the indoor positioning system. This involved using computer-aided design (CAD) software and the virtual reality markup language (VRML) to implement a prototype indoor LBS test bed. Thus, a rapid and practical procedure for constructing a 3D indoor GIS is proposed, and this GIS is easy for users to update and maintain. The building of the Department of Aeronautics and Astronautics at National Cheng Kung University in Taiwan is used as an example to assess the performance of various algorithms for the indoor positioning system. PMID:22319282

  13. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study.

    PubMed

    Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow

    2017-01-01

    Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. Most of the predictive models produced similar results, and demonstrated the feasibility of identifying individuals with the highest probability of having undiagnosed diabetes through easily obtained clinical data.

  14. The application of k-Nearest Neighbour in the identification of high potential archers based on relative psychological coping skills variables

    NASA Astrophysics Data System (ADS)

    Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Muaz Alim, Muhammad; Nasir, Ahmad Fakhri Ab

    2018-04-01

    The present study aims at classifying and predicting high and low potential archers from a collection of psychological coping skills variables trained on different k-Nearest Neighbour (k-NN) kernels. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. A psychological coping skills inventory, which evaluates the archers' level of related coping skills, was filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on the variables assessed. k-NN models, i.e. fine, medium, coarse, cosine, cubic and weighted kernel functions, were trained on the psychological variables. The k-means analysis clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the cosine k-NN model exhibited good accuracy and precision throughout the exercise, with an accuracy of 94% and a considerably lower error rate for the prediction of the HPPA and the LPPA as compared to the rest of the models. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected psychological coping skills variables examined, which would consequently save time and energy during talent identification and development programmes.

  15. Multidimensional Optimization of Signal Space Distance Parameters in WLAN Positioning

    PubMed Central

    Brković, Milenko; Simić, Mirjana

    2014-01-01

    Accurate indoor localization of mobile users is one of the challenging problems of the last decade. Besides delivering high speed Internet, Wireless Local Area Network (WLAN) can be used as an effective indoor positioning system, being competitive both in terms of accuracy and cost. Among the localization algorithms, nearest neighbor fingerprinting algorithms based on Received Signal Strength (RSS) parameter have been extensively studied as an inexpensive solution for delivering indoor Location Based Services (LBS). In this paper, we propose the optimization of the signal space distance parameters in order to improve precision of WLAN indoor positioning, based on nearest neighbor fingerprinting algorithms. Experiments in a real WLAN environment indicate that proposed optimization leads to substantial improvements of the localization accuracy. Our approach is conceptually simple, is easy to implement, and does not require any additional hardware. PMID:24757443

  16. Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing

    DTIC Science & Technology

    2012-12-01

    FLD Fisher’s Linear Discriminant ... kNN K-Nearest Neighbor ... Neighbor (kNN), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in ... Expansion and the Discrete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech, and Signal Processing (ICASSP00

  17. Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means

    NASA Astrophysics Data System (ADS)

    Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.

    2018-04-01

    This research was initially driven by the lack of clustering algorithms that specifically focus on binary data. To close this gap, a promising technique for analysing this type of data, namely Genetic Algorithms (GA), became the main subject of this research. For the purpose of this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams. In GAIKM, the objective function was based on a few sufficient statistics that may be easily and quickly calculated on binary numbers. The implementation of IKM gives an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to other clustering algorithms and to IKM itself. In conclusion, GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering, and paves the way for future research involving missing data and outliers.

  18. A Gross Error Elimination Method for Point Cloud Data Based on Kd-Tree

    NASA Astrophysics Data System (ADS)

    Kang, Q.; Huang, G.; Yang, S.

    2018-04-01

    Point cloud data has become a widely used data source in the field of remote sensing. The key steps of point cloud pre-processing are gross error elimination and quality control. Owing to the volume of point cloud data, existing gross error elimination methods are costly in both space and time. This paper employs a new method that uses a Kd-tree to organize the points, the k-nearest neighbor algorithm to search them, and an appropriate threshold to judge whether a target point is an outlier. Experimental results show that the proposed algorithm helps to remove gross errors from point cloud data, decreases memory consumption, and improves efficiency.
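
    The pipeline this record outlines (build a Kd-tree, search k nearest neighbors, apply a threshold) can be sketched as below. This is a rough illustration under stated assumptions, not the paper's method: the outlier score (mean distance to the k nearest neighbors), the fixed `max_mean_dist` threshold, and all function names are choices made for the example, since the abstract does not specify them.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn(node, query, k, depth=0, heap=None):
    """Collect the k nearest points as a max-heap of (-distance, point)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, left, right = node
    d = math.dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, depth + 1, heap)
    # Search the far side only if the splitting plane is closer than
    # the current k-th neighbor (or the heap is not yet full).
    if len(heap) < k or abs(diff) < -heap[0][0]:
        knn(far, query, k, depth + 1, heap)
    return heap


def remove_gross_errors(points, k=3, max_mean_dist=2.0):
    """Keep points whose mean distance to their k nearest neighbors is small.

    Assumes len(points) > k. The threshold is a hypothetical, user-chosen value.
    """
    tree = build_kdtree(points)
    kept = []
    for p in points:
        # Ask for k + 1 hits because the closest hit is the point itself.
        hits = knn(tree, p, k + 1)
        mean_dist = sum(-neg for neg, _ in hits) / k
        if mean_dist <= max_mean_dist:
            kept.append(p)
    return kept


# Hypothetical cloud: a flat 3x3 grid plus one point far above it.
cloud = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
cloud.append((1.0, 1.0, 50.0))
cleaned = remove_gross_errors(cloud, k=3, max_mean_dist=2.0)
```

    A data-driven threshold, such as a multiple of the standard deviation of the per-point scores, is a common alternative to a fixed value, though whether the paper uses one is not stated in the abstract.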

  19. Towards enhancement of performance of K-means clustering using nature-inspired optimization algorithms.

    PubMed

    Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan

    2014-01-01

    Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.

  20. McTwo: a two-step feature selection algorithm based on maximal information coefficient.

    PubMed

    Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng

    2016-03-23

    High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.

  1. Clustering performance comparison using K-means and expectation maximization algorithms.

    PubMed

    Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

    2014-11-14

    Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.

  2. A novel harmony search-K means hybrid algorithm for clustering gene expression data

    PubMed Central

    Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu

    2013-01-01

    Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351

  3. A novel harmony search-K means hybrid algorithm for clustering gene expression data.

    PubMed

    Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu

    2013-01-01

    Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.

  4. The advantages of the surface Laplacian in brain-computer interface research.

    PubMed

    McFarland, Dennis J

    2015-09-01

    Brain-computer interface (BCI) systems frequently use signal processing methods, such as spatial filtering, to enhance performance. The surface Laplacian can reduce spatial noise and aid in identification of sources. In BCI research, these two functions of the surface Laplacian correspond to prediction accuracy and signal orthogonality. In the present study, an off-line analysis of data from a sensorimotor rhythm-based BCI task dissociated these functions of the surface Laplacian by comparing nearest-neighbor and next-nearest neighbor Laplacian algorithms. The nearest-neighbor Laplacian produced signals that were more orthogonal while the next-nearest Laplacian produced signals that resulted in better accuracy. Both prediction and signal identification are important for BCI research. Better prediction of user's intent produces increased speed and accuracy of communication and control. Signal identification is important for ruling out the possibility of control by artifacts. Identifying the nature of the control signal is relevant both to understanding exactly what is being studied and in terms of usability for individuals with limited motor control. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Towards Enhancement of Performance of K-Means Clustering Using Nature-Inspired Optimization Algorithms

    PubMed Central

    Deb, Suash; Yang, Xin-She

    2014-01-01

    Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730

  6. Computer Simulation of Energy Parameters and Magnetic Effects in Fe-Si-C Ternary Alloys

    NASA Astrophysics Data System (ADS)

    Ridnyi, Ya. M.; Mirzoev, A. A.; Mirzaev, D. A.

    2018-06-01

    The paper presents ab initio simulation with the WIEN2k software package of the equilibrium structure and properties of silicon and carbon atoms dissolved in iron with the body-centered cubic crystal system of the lattice. Silicon and carbon atoms manifest a repulsive interaction in the first two nearest neighbors, in the second neighbor the repulsion being stronger than in the first. In the third and next-nearest neighbors a very weak repulsive interaction occurs and tends to zero with increasing distance between atoms. Silicon and carbon dissolution reduces the magnetic moment of iron atoms.

  7. Learning Instance-Specific Predictive Models

    PubMed Central

    Visweswaran, Shyam; Cooper, Gregory F.

    2013-01-01

    This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325

  8. SibRank: Signed bipartite network analysis for neighbor-based collaborative ranking

    NASA Astrophysics Data System (ADS)

    Shams, Bita; Haratizadeh, Saman

    2016-09-01

    Collaborative ranking is an emerging field of recommender systems that utilizes users' preference data rather than rating values. Unfortunately, neighbor-based collaborative ranking has gained little attention despite its greater flexibility and justifiability. This paper proposes a novel framework, called SibRank, that seeks to improve state-of-the-art neighbor-based collaborative ranking methods. SibRank represents users' preferences as a signed bipartite network and finds similar users through a novel personalized ranking algorithm in signed networks.

  9. Lip reading using neural networks

    NASA Astrophysics Data System (ADS)

    Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant

    2011-10-01

    Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person to written text. It has several applications, such as teaching deaf and mute people to speak and communicate effectively with other people, as well as crime-fighting potential and invariance to the acoustic environment. We convert the video of the subject speaking vowels into images, and the images are then selected manually for processing. However, several factors, such as fast speech, bad pronunciation, poor illumination, movement of the face, moustaches and beards, make lip reading difficult. Contour tracking methods and template matching are used to extract the lips from the face. The K-Nearest Neighbor algorithm is then used to classify the 'speaking' images and the 'silent' images. The sequence of images is then transformed into segments of utterances. A feature vector is calculated on each frame for all the segments and is stored in the database with a properly labeled class. Character recognition is performed using a modified KNN algorithm that assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms.
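
    A KNN variant that gives nearer neighbors more weight, as mentioned in this abstract, might look like the sketch below. The inverse-distance weighting, the function name weighted_knn_predict, and the one-dimensional toy features are assumptions for illustration; the abstract does not say which weighting scheme the authors used.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest samples.

    train: list of (feature_vector, label) pairs.
    Each neighbour votes with weight 1 / (distance + eps), an assumed scheme.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)

# One close sample labelled "a" and two distant samples labelled "e".
train = [((0.0,), "a"), ((1.0,), "e"), ((1.1,), "e")]
prediction = weighted_knn_predict(train, (0.1,), k=3)
print(prediction)
```

    With k = 3 an unweighted majority vote would choose "e" (two votes to one), whereas the weighted vote favors the single much closer "a" sample.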

  10. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis

    PubMed Central

    Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.

    2017-01-01

    Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571

  11. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis.

    PubMed

    Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L

    2017-02-14

    Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.

  12. Infinite projected entangled-pair state algorithm for ruby and triangle-honeycomb lattices

    NASA Astrophysics Data System (ADS)

    Jahromi, Saeed S.; Orús, Román; Kargarian, Mehdi; Langari, Abdollah

    2018-03-01

    The infinite projected entangled-pair state (iPEPS) algorithm is one of the most efficient techniques for studying the ground-state properties of two-dimensional quantum lattice Hamiltonians in the thermodynamic limit. Here, we show how the algorithm can be adapted to explore nearest-neighbor local Hamiltonians on the ruby and triangle-honeycomb lattices, using the corner transfer matrix (CTM) renormalization group for 2D tensor network contraction. Additionally, we show how the CTM method can be used to calculate the ground-state fidelity per lattice site and the boundary density operator and entanglement entropy (EE) on an infinite cylinder. As a benchmark, we apply the iPEPS method to the ruby model with anisotropic interactions and explore the ground-state properties of the system. We further extract the phase diagram of the model in different regimes of the couplings by measuring two-point correlators, ground-state fidelity, and EE on an infinite cylinder. Our phase diagram is in agreement with previous studies of the model by exact diagonalization.

  13. Absence of long-range order in the frustrated magnet SrDy2O4 due to trapped defects from a dimensionality crossover

    NASA Astrophysics Data System (ADS)

    Gauthier, N.; Fennell, A.; Prévost, B.; Uldry, A.-C.; Delley, B.; Sibille, R.; Désilets-Benoit, A.; Dabkowska, H. A.; Nilsen, G. J.; Regnault, L.-P.; White, J. S.; Niedermayer, C.; Pomjakushin, V.; Bianchi, A. D.; Kenzelmann, M.

    2017-04-01

    Magnetic frustration and low dimensionality can prevent long-range magnetic order and lead to exotic correlated ground states. SrDy2O4 consists of magnetic Dy3+ ions forming magnetically frustrated zigzag chains along the c axis and shows no long-range order to temperatures as low as T = 60 mK. We carried out neutron scattering and ac magnetic susceptibility measurements using powder and single crystals of SrDy2O4. Diffuse neutron scattering indicates strong one-dimensional (1D) magnetic correlations along the chain direction that can be qualitatively accounted for by the axial next-nearest-neighbor Ising model with nearest-neighbor and next-nearest-neighbor exchange J1 = 0.3 meV and J2 = 0.2 meV, respectively. Three-dimensional (3D) correlations become important below T* ≈ 0.7 K. At T = 60 mK, the short-range correlations are characterized by a putative propagation vector k1/2 = (0, 1/2, 1/2). We argue that the absence of long-range order arises from the presence of slowly decaying 1D domain walls that are trapped due to 3D correlations. This stabilizes a low-temperature phase without long-range magnetic order, but with well-ordered chain segments separated by slowly moving domain walls.

  14. Quantum Algorithms to Simulate Many-Body Physics of Correlated Fermions

    NASA Astrophysics Data System (ADS)

    Jiang, Zhang; Sung, Kevin J.; Kechedzhi, Kostyantyn; Smelyanskiy, Vadim N.; Boixo, Sergio

    2018-04-01

    Simulating strongly correlated fermionic systems is notoriously hard on classical computers. An alternative approach, as proposed by Feynman, is to use a quantum computer. We discuss simulating strongly correlated fermionic systems using near-term quantum devices. We focus specifically on two-dimensional (2D) or linear geometry with nearest-neighbor qubit-qubit couplings, typical for superconducting transmon qubit arrays. We improve an existing algorithm to prepare an arbitrary Slater determinant by exploiting a unitary symmetry. We also present a quantum algorithm to prepare an arbitrary fermionic Gaussian state with O(N^2) gates and O(N) circuit depth. Both algorithms are optimal in the sense that the numbers of parameters in the quantum circuits are equal to those describing the quantum states. Furthermore, we propose an algorithm to implement the 2D fermionic Fourier transformation on a 2D qubit array with only O(N^1.5) gates and O(√N) circuit depth, which is the minimum depth required for quantum information to travel across the qubit array. We also present methods to simulate each time step in the evolution of the 2D Fermi-Hubbard model, again on a 2D qubit array, with O(N) gates and O(√N) circuit depth. Finally, we discuss how these algorithms can be used to determine the ground-state properties and phase diagrams of strongly correlated quantum systems using the Hubbard model as an example.

  15. A Locally Optimal Algorithm for Estimating a Generating Partition from an Observed Time Series and Its Application to Anomaly Detection.

    PubMed

    Ghalyan, Najah F; Miller, David J; Ray, Asok

    2018-06-12

    Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet may uniquely specify the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to identify or characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. The seminal work of Hirata, Judd, and Kilminster (2004) derives a novel objective function, akin to a clustering objective, that measures the discrepancy between a set of reconstruction values and the points from the time series. They cast estimation of a generating partition via the minimization of their objective function. Unfortunately, their proposed algorithm is nonconvergent, with no guarantee of finding even locally optimal solutions with respect to their objective. The difficulty is a heuristic-nearest neighbor symbol assignment step. Alternatively, we develop a novel, locally optimal algorithm for their objective. We apply iterative nearest-neighbor symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the entire time series is achieved. While most previous approaches frame generating partition estimation as a state-space partitioning problem, we recognize that minimizing the Hirata et al. (2004) objective function does not induce an explicit partitioning of the state space, but rather the space consisting of the entire time series (effectively, clustering in a (countably) infinite-dimensional space). Our approach also amounts to a novel type of sliding block lossy source coding. Improvement, with respect to several measures, is demonstrated over popular methods for symbolizing chaotic maps. We also apply our approach to time-series anomaly detection, considering both chaotic maps and failure application in a polycrystalline alloy material.

  16. Carbon-hydrogen defects with a neighboring oxygen atom in n-type Si

    NASA Astrophysics Data System (ADS)

    Gwozdz, K.; Stübner, R.; Kolkovsky, Vl.; Weber, J.

    2017-07-01

    We report on the electrical activation of neutral carbon-oxygen complexes in Si by wet-chemical etching at room temperature. Two deep levels, E65 and E75, are observed by deep level transient spectroscopy in n-type Czochralski Si. The activation enthalpies of E65 and E75 are obtained as EC-0.11 eV (E65) and EC-0.13 eV (E75). The electric field dependence of their emission rates relates both levels to single acceptor states. From the analysis of the depth profiles, we conclude that the levels belong to two different defects, which contain only one hydrogen atom. A configuration is proposed, where the CH1BC defect, with hydrogen in the bond-centered position between neighboring C and Si atoms, is disturbed by interstitial oxygen in the second nearest neighbor position to substitutional carbon. The significant reduction of the CH1BC concentration in samples with high oxygen concentrations limits the use of this defect for the determination of low concentrations of substitutional carbon in Si samples.

  17. A computational approach for hypersonic nonequilibrium radiation utilizing space partition algorithm and Gauss quadrature

    NASA Astrophysics Data System (ADS)

    Shang, J. S.; Andrienko, D. A.; Huang, P. G.; Surzhikov, S. T.

    2014-06-01

    An efficient computational capability for nonequilibrium radiation simulation via the ray tracing technique has been accomplished. The radiative rate equation is iteratively coupled with the aerodynamic conservation laws including nonequilibrium chemical and chemical-physical kinetic models. The spectral properties along tracing rays are determined by a space partition algorithm of the nearest neighbor search process, and the numerical accuracy is further enhanced by a local resolution refinement using the Gauss-Lobatto polynomial. The interdisciplinary governing equations are solved by an implicit delta formulation through the diminishing residual approach. The axisymmetric radiating flow fields over the reentry RAM-CII probe have been simulated and verified with flight data and previous solutions by traditional methods. A computational efficiency gain of nearly forty times is realized over that of the existing simulation procedures.

  18. Membership-degree preserving discriminant analysis with applications to face recognition.

    PubMed

    Yang, Zhangjing; Liu, Chuancai; Huang, Pu; Qian, Jianjun

    2013-01-01

    In pattern recognition, feature extraction techniques have been widely employed to reduce the dimensionality of high-dimensional data. In this paper, we propose a novel feature extraction algorithm called membership-degree preserving discriminant analysis (MPDA) based on the fisher criterion and fuzzy set theory for face recognition. In the proposed algorithm, the membership degree of each sample to particular classes is firstly calculated by the fuzzy k-nearest neighbor (FKNN) algorithm to characterize the similarity between each sample and class centers, and then the membership degree is incorporated into the definition of the between-class scatter and the within-class scatter. The feature extraction criterion via maximizing the ratio of the between-class scatter to the within-class scatter is applied. Experimental results on the ORL, Yale, and FERET face databases demonstrate the effectiveness of the proposed algorithm.

  19. Handling Neighbor Discovery and Rendezvous Consistency with Weighted Quorum-Based Approach

    PubMed Central

    Own, Chung-Ming; Meng, Zhaopeng; Liu, Kehan

    2015-01-01

    Neighbor discovery and the power of sensors play an important role in the formation of Wireless Sensor Networks (WSNs) and mobile networks. Many asynchronous protocols based on wake-up time scheduling have been proposed to enable neighbor discovery among neighboring nodes while saving energy, especially when clock synchronization is difficult. Existing neighbor-discovery methods fall into two groups: quorum-based protocols and co-primality-based protocols. They differ in how time slots are arranged: the former uses quorums in a matrix, while the latter adopts numerical analysis. In our study, we propose the weighted heuristic quorum system (WQS), which is based on the quorum algorithm to eliminate redundant paths of active slots. We demonstrate the properties of our system: fewer active slots are required, the referring rate is balanced, and remaining power is considered, particularly when a device maintains rendezvous with discovered neighbors. The evaluation results show that our proposed method can effectively reschedule the active slots and save the computing time of the network system. PMID:26404297

  20. The Research on Denoising of SAR Image Based on Improved K-SVD Algorithm

    NASA Astrophysics Data System (ADS)

    Tan, Linglong; Li, Changkai; Wang, Yueqin

    2018-04-01

    SAR images are often corrupted by noise during acquisition and transmission, which can greatly reduce image quality and cause great difficulties for image processing. The existing complete DCT dictionary algorithm is fast, but its denoising effect is poor. In this paper, to address the problem of poor denoising, the K-SVD (K-means and singular value decomposition) algorithm is applied to image noise suppression. First, the sparse dictionary structure is introduced in detail. The dictionary has a compact representation and can effectively train the image signal. Then, the sparse dictionary is trained by the K-SVD algorithm according to the sparse representation of the dictionary. The algorithm has more advantages in high-dimensional data processing. Experimental results show that the proposed algorithm removes speckle noise more effectively than the complete DCT dictionary and retains edge details better.

  1. Fast Exact Search in Hamming Space With Multi-Index Hashing.

    PubMed

    Norouzi, Mohammad; Punjani, Ali; Fleet, David J

    2014-06-01

    There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straight-forward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.
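
    The idea of indexing binary code substrings can be sketched as below. This is a simplified illustration rather than the authors' implementation: codes are Python integers, the code length is assumed to be divisible by the number of substrings m, and all function names are hypothetical. It relies on the pigeonhole fact that two codes within Hamming distance r must agree to within floor(r / m) bits on at least one of the m substrings.

```python
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")

def split(code, bits, m):
    """Split a `bits`-bit code into m equal-width substrings (bits % m == 0 assumed)."""
    width = bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_tables(codes, bits, m):
    """One hash table per substring position, mapping substring -> code indices."""
    tables = [dict() for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split(code, bits, m)):
            table.setdefault(sub, []).append(idx)
    return tables

def within_radius(value, width, radius):
    """Yield every `width`-bit value at Hamming distance <= radius from `value`."""
    for r in range(radius + 1):
        for flips in combinations(range(width), r):
            probe = value
            for bit in flips:
                probe ^= 1 << bit
            yield probe

def r_neighbours(query, codes, tables, bits, m, r):
    """Indices of all codes within Hamming distance r of the query (exact)."""
    width = bits // m
    candidates = set()
    for table, sub in zip(tables, split(query, bits, m)):
        for probe in within_radius(sub, width, r // m):
            candidates.update(table.get(probe, ()))
    # Candidates are verified against the full code before being returned.
    return sorted(i for i in candidates if hamming(codes[i], query) <= r)

def knn(query, codes, tables, bits, m, k):
    """Exact k-NN by growing the search radius until k codes are found."""
    found = []
    for r in range(bits + 1):
        found = r_neighbours(query, codes, tables, bits, m, r)
        if len(found) >= k:
            break
    return sorted(found, key=lambda i: hamming(codes[i], query))[:k]

codes = [0b00000000, 0b00000001, 0b00001111, 0b11111111, 0b11110000]
tables = build_tables(codes, bits=8, m=2)
query = 0b00000011
nearest = knn(query, codes, tables, bits=8, m=2, k=2)
print(nearest)
```

    The result can be checked against a linear scan over all codes. The benefit of the substring tables only shows on large collections, where most codes are never touched because none of their substrings fall within the probed radius.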

  2. Line Thinning Algorithm

    NASA Astrophysics Data System (ADS)

    Feigin, G.; Ben-Yosef, N.

    1983-10-01

    A thinning algorithm, of the banana-peel type, is presented. In each iteration pixels are attacked from all directions (there are no sub-iterations), and the deletion criteria depend on the 24 nearest neighbours.

  3. Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

    PubMed Central

    Mitra, Rajib; Jordan, Michael I.; Dunbrack, Roland L.

    2010-01-01

    Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp. PMID:20442867

  4. Feature weight estimation for gene selection: a local hyperlinear learning approach

    PubMed Central

    2014-01-01

    Background: Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results: We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion: Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability with respect to various classification algorithms. PMID:24625071
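
    For orientation, the classic RELIEF weighting that LHR builds on can be sketched as follows. This is a minimal illustration of plain two-class RELIEF, not the LHR algorithm itself; the function name `relief_weights`, its parameters, and the assumption that features are numeric and scaled to [0, 1] are all hypothetical choices made for this sketch.

```python
import random

def relief_weights(X, y, n_iter=None, seed=0):
    """Plain two-class RELIEF (illustrative baseline, not LHR).

    X: list of feature vectors, assumed scaled to [0, 1].
    y: list of class labels.
    Returns one weight per feature; larger means more relevant.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_iter = n_iter or n
    w = [0.0] * d

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        # Nearest neighbor of the same class (hit) and of the other class (miss).
        h = min(hits, key=lambda j: sq_dist(X[i], X[j]))
        m = min(misses, key=lambda j: sq_dist(X[i], X[j]))
        for f in range(d):
            # Reward features that separate the miss, penalize those that separate the hit.
            w[f] += (abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])) / n_iter
    return w

# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.05, 0.5], [0.9, 0.4], [1.0, 0.8], [0.95, 0.1]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

    On this toy data the weight of feature 0 comes out well above that of feature 1. Because each update depends on a single nearest hit and miss, a noisy sample can swing the weights, which is the instability the abstract refers to.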

  5. A Comparison of Rule-Based, K-Nearest Neighbor, and Neural Net Classifiers for Automated

    Treesearch

    Tai-Hoon Cho; Richard W. Conners; Philip A. Araman

    1991-01-01

    Over the last few years the authors have been involved in research aimed at developing a machine vision system for locating and identifying surface defects on materials. The particular problem being studied involves locating surface defects on hardwood lumber in a species independent manner. Obviously, the accurate location and identification of defects is of paramount...

  6. Stratified estimates of forest area using the k-nearest neighbors technique and satellite imagery

    Treesearch

    Ronald E. McRoberts; Mark D. Nelson; Daniel Wendt

    2002-01-01

    For two study areas in Minnesota, stratified estimation using Landsat Thematic Mapper satellite imagery as the basis for stratification was used to estimate forest area. Measurements of forest inventory plots obtained for a 12-month period in 1998 and 1999 were used as the source of data for within-strata estimates. These measurements further served as calibration data...

  7. Developing robust arsenic awareness prediction models using machine learning algorithms.

    PubMed

    Singh, Sushant K; Taylor, Robert W; Rahman, Mohammad Mahmudur; Pradhan, Biswajeet

    2018-04-01

    Arsenic awareness plays a vital role in ensuring the sustainability of arsenic mitigation technologies. Thus far, however, few studies have dealt with the sustainability of such technologies and its associated socioeconomic dimensions. As a result, arsenic awareness prediction has not yet been fully conceptualized. Accordingly, this study evaluated arsenic awareness among arsenic-affected communities in rural India, using a structured questionnaire to record socioeconomic, demographic, and other sociobehavioral factors with an eye to assessing their association with and influence on arsenic awareness. First a logistic regression model was applied and its results compared with those produced by six state-of-the-art machine-learning algorithms (Support Vector Machine [SVM], Kernel-SVM, Decision Tree [DT], k-Nearest Neighbor [k-NN], Naïve Bayes [NB], and Random Forests [RF]) as measured by their accuracy at predicting arsenic awareness. Most (63%) of the surveyed population was found to be arsenic-aware. Significant arsenic awareness predictors were divided into three types: (1) socioeconomic factors: caste, education level, and occupation; (2) water and sanitation behavior factors: number of family members involved in water collection, distance traveled and time spent for water collection, places for defecation, and materials used for handwashing after defecation; and (3) social capital and trust factors: presence of anganwadi and people's trust in other community members, NGOs, and private agencies. Moreover, individuals having a higher social network positively contributed to arsenic awareness in the communities. Results indicated that both the SVM and the RF algorithms outperformed the other models at overall prediction of arsenic awareness, a nonlinear classification problem. Lower-caste, less educated, and unemployed members of the population were found to be the most vulnerable, requiring immediate arsenic mitigation. To this end, local social institutions and NGOs could play a

  8. A Fast Implementation of the ISOCLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2003-01-01

    Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. For technical reasons, which are explained later, it is necessary to make a minor

  9. Quantitative diagnosis of bladder cancer by morphometric analysis of HE images

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Nebylitsa, Samantha V.; Mukherjee, Sushmita; Jain, Manu

    2015-02-01

    In clinical practice, histopathological analysis of biopsied tissue is the main method for bladder cancer diagnosis and prognosis. The diagnosis is performed by a pathologist based on the morphological features in the image of a hematoxylin and eosin (HE) stained tissue sample. This manuscript proposes algorithms to perform morphometric analysis on the HE images, quantify the features in the images, and discriminate bladder cancers with different grades, i.e. high grade and low grade. The nuclei are separated from the background and other types of cells such as red blood cells (RBCs) and immune cells using manual outlining, color deconvolution and image segmentation. A mask of nuclei is generated for each image for quantitative morphometric analysis. The features of the nuclei in the mask image including size, shape, orientation, and their spatial distributions are measured. To quantify local clustering and alignment of nuclei, we propose a 1-nearest-neighbor (1-NN) algorithm which measures nearest neighbor distance and nearest neighbor parallelism. The global distributions of the features are measured using statistics of the proposed parameters. A linear support vector machine (SVM) algorithm is used to classify the high grade and low grade bladder cancers. The results show using a particular group of nuclei such as large ones, and combining multiple parameters can achieve better discrimination. This study shows the proposed approach can potentially help expedite pathological diagnosis by triaging potentially suspicious biopsies.

  10. Applying network analysis and Nebula (neighbor-edges based and unbiased leverage algorithm) to ToxCast data.

    PubMed

    Ye, Hao; Luo, Heng; Ng, Hui Wen; Meehan, Joe; Ge, Weigong; Tong, Weida; Hong, Huixiao

    2016-01-01

    ToxCast data have been used to develop models for predicting in vivo toxicity. To predict the in vivo toxicity of a new chemical using a ToxCast data based model, its ToxCast bioactivity data are needed but not normally available. The capability of predicting ToxCast bioactivity data is necessary to fully utilize ToxCast data in the risk assessment of chemicals. We aimed to understand and elucidate the relationships between the chemicals and bioactivity data of the assays in ToxCast and to develop a network analysis based method for predicting ToxCast bioactivity data. We conducted modularity analysis on a quantitative network constructed from ToxCast data to explore the relationships between the assays and chemicals. We further developed Nebula (neighbor-edges based and unbiased leverage algorithm) for predicting ToxCast bioactivity data. Modularity analysis on the network constructed from ToxCast data yielded seven modules. Assays and chemicals in the seven modules were distinct. Leave-one-out cross-validation yielded a Q(2) of 0.5416, indicating ToxCast bioactivity data can be predicted by Nebula. Prediction domain analysis showed some types of ToxCast assay data could be more reliably predicted by Nebula than others. Network analysis is a promising approach to understand ToxCast data. Nebula is an effective algorithm for predicting ToxCast bioactivity data, helping fully utilize ToxCast data in the risk assessment of chemicals. Published by Elsevier Ltd.

  11. Adaptive phase k-means algorithm for waveform classification

    NASA Astrophysics Data System (ADS)

    Song, Chengyun; Liu, Zhining; Wang, Yaojun; Xu, Feng; Li, Xingming; Hu, Guangmin

    2018-01-01

    Waveform classification is a powerful technique for seismic facies analysis that describes the heterogeneity and compartments within a reservoir. Horizon interpretation is a critical step in waveform classification. However, the horizon often produces inconsistent waveform phase, and thus results in an unsatisfactory classification. To alleviate this problem, an adaptive phase waveform classification method called the adaptive phase k-means is introduced in this paper. Our method improves the traditional k-means algorithm using an adaptive phase distance for waveform similarity measure. The proposed distance is a measure with variable phases as it moves from sample to sample along the traces. Model traces are also updated with the best phase interference in the iterative process. Therefore, our method is robust to phase variations caused by the interpretation horizon. We tested the effectiveness of our algorithm by applying it to synthetic and real data. The satisfactory results reveal that the proposed method tolerates certain waveform phase variation and is a good tool for seismic facies analysis.

  12. Masked Priming with Orthographic Neighbors: A Test of the Lexical Competition Assumption

    ERIC Educational Resources Information Center

    Nakayama, Mariko; Sears, Christopher R.; Lupker, Stephen J.

    2008-01-01

    In models of visual word identification that incorporate inhibitory competition among activated lexical units, a word's higher frequency neighbors will be the word's strongest competitors. Preactivation of these neighbors by a prime is predicted to delay the word's identification. Using the masked priming paradigm (K. I. Forster & C. Davis, 1984,…

  13. Algorithms for Autonomous Plume Detection on Outer Planet Satellites

    NASA Astrophysics Data System (ADS)

    Lin, Y.; Bunte, M. K.; Saripalli, S.; Greeley, R.

    2011-12-01

    We investigate techniques for automated detection of geophysical events (i.e., volcanic plumes) from spacecraft images. The algorithms presented here have not been previously applied to detection of transient events on outer planet satellites. We apply Scale Invariant Feature Transform (SIFT) to raw images of Io and Enceladus from the Voyager, Galileo, Cassini, and New Horizons missions. SIFT produces distinct interest points in every image; feature descriptors are reasonably invariant to changes in illumination, image noise, rotation, scaling, and small changes in viewpoint. We classified these descriptors as plumes using the k-nearest neighbor (KNN) algorithm. In KNN, an object is classified by its similarity to examples in a training set of images based on user defined thresholds. Using the complete database of Io images and a selection of Enceladus images where 1-3 plumes were manually detected in each image, we successfully detected 74% of plumes in Galileo and New Horizons images, 95% in Voyager images, and 93% in Cassini images. Preliminary tests yielded some false positive detections; further iterations will improve performance. In images where detections fail, plumes are less than 9 pixels in size or are lost in image glare. We compared the appearance of plumes and illuminated mountain slopes to determine the potential for feature classification. We successfully differentiated features. An advantage over other methods is the ability to detect plumes in non-limb views where they appear in the shadowed part of the surface; improvements will enable detection against the illuminated background surface where gradient changes would otherwise preclude detection. This detection method has potential applications to future outer planet missions for sustained plume monitoring campaigns and onboard automated prioritization of all spacecraft data. The complementary nature of this method is such that it could be used in conjunction with edge detection algorithms to

  14. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    Images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version. A data parallelism model is used to transform the algorithm. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduce running time, and is easy to port to multi-core platforms.

  15. iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.

    PubMed

    Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades

    2014-01-01

    Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.

  16. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm.

    PubMed

    Raykov, Yordan P; Boukouvalas, Alexis; Baig, Fahd; Little, Max A

    The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

  17. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm

    PubMed Central

    Baig, Fahd; Little, Max A.

    2016-01-01

    The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism. PMID:27669525

  18. Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.

    PubMed

    Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B

    2011-02-01

    This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance with mean sorting accuracy of 73% and sorting error of 10% compared to PCA which combined with k-means had a sorting accuracy of 58% and sorting error of 10%. A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.

  19. Critical Temperature of Randomly Diluted Two-Dimensional Heisenberg Ferromagnet, K2CuxZn(1-x)F4

    NASA Astrophysics Data System (ADS)

    Okuda, Yuichi; Tohi, Yasuto; Yamada, Isao; Haseda, Taiichiro

    1980-09-01

    The susceptibility of randomly diluted two-dimensional Heisenberg-like ferromagnet K2CuxZn(1-x)F4 was measured down to 50 mK, using the 3He-4He dilution refrigerator and a SQUID magnetometer. The ferromagnetic critical temperature Tc(x) was obtained for x = 0.98, 0.94, 0.85, 0.82, 0.68, 0.60, 0.54, 0.50 and 0.42. The value of [1/Tc(1)][(d/dx)Tc(x)] at x = 1 was approximately 3.0. The critical temperature versus x curve exhibits a noticeable tail near the critical concentration, which may stem from the second nearest-neighbor interaction. The critical concentration xc, below which concentration there is no long range order down to T = 0 K, was estimated to be 0.45 to 0.50. The susceptibility of sample with x = 0.42 behaves as if it obeys the Curie law down to 50 mK.

  20. [Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

    PubMed

    Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

    2016-10-01

    Sleep stage scoring is a hotspot in the field of medicine and neuroscience. Visual inspection of sleep is laborious and the results may be subjective and differ between clinicians. An automatic sleep stage classification algorithm can be used to reduce the manual workload. However, there are still limitations when it encounters complicated and changeable clinical cases. The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data. In the proposed improved K-means clustering algorithm, points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm. Meanwhile, the cluster centers were updated according to the 'Three-Sigma Rule' during the iteration to abate the influence of the outliers. The proposed method was tested and analyzed on the overnight sleep data of healthy persons and patients with sleep disorders after continuous positive airway pressure (CPAP) treatment. The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%. With the analysis of morphological diversity of sleep data, it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.
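
    One way a three-sigma rule could enter the center update is sketched below. The abstract does not give the exact procedure, so this is an assumption: points whose distance to the current center exceeds the mean distance plus three standard deviations are ignored when the center is recomputed. The function name `three_sigma_center` is hypothetical.

```python
import math
import statistics

def three_sigma_center(points, center):
    """Recompute a cluster center while ignoring distance outliers.

    Assumed rule (not taken from the paper): drop points whose distance
    to the current center is above mean + 3 * standard deviation.
    """
    dists = [math.dist(p, center) for p in points]
    mu = statistics.fmean(dists)
    sigma = statistics.pstdev(dists)
    kept = [p for p, d in zip(points, dists) if d <= mu + 3 * sigma]
    # The closest point always satisfies d <= mu, so `kept` is never empty.
    return tuple(statistics.fmean(p[k] for p in kept) for k in range(len(center)))

# Thirty points near the origin plus one far outlier assigned to the same cluster.
cluster = [(0.1 * (i % 3), 0.1 * (i % 2)) for i in range(30)] + [(100.0, 100.0)]
new_center = three_sigma_center(cluster, (0.0, 0.0))
```

    With the outlier filtered out, the updated center stays near the bulk of the points instead of being pulled toward (100, 100), as a plain mean would be. Note that a three-sigma cut only takes effect when a cluster has enough members; with very few points no distance can exceed the threshold.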

  1. Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm

    NASA Astrophysics Data System (ADS)

    Frisca; Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Clustering is a data analysis method that aims to place data with similar characteristics in the same group. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, the spectral clustering method emerged from the concepts of spectral graph theory. Spectral clustering needs a partitioning algorithm. There are several partitioning methods, including PAM, SOM, fuzzy c-means, and k-means. Based on the research done by Capital and Choudhury in 2013, when using Euclidean distance the k-means algorithm provides better accuracy than the PAM algorithm. So in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering is in reducing data dimension, especially in this case to reduce the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands and even tens of thousands of kinds of genes in DNA fragments derived from doubling cDNA. Microarray data are widely used to detect cancer, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to place data that have high similarity in the same group and data that have low similarity in other groups. In this research, the carcinoma microarray data comprise 7457 genes. The result of partitioning using the k-means algorithm is two clusters.

  2. Water quality assessment with emphasis in parameter optimisation using pattern recognition methods and genetic algorithm.

    PubMed

    Sotomayor, Gonzalo; Hampel, Henrietta; Vázquez, Raúl F

    2018-03-01

    A non-supervised (k-means) and a supervised (k-Nearest Neighbour in combination with genetic algorithm optimisation, k-NN/GA) pattern recognition algorithms were applied for evaluating and interpreting a large complex matrix of water quality (WQ) data collected during five years (2008, 2010-2013) in the Paute river basin (southern Ecuador). 21 physical, chemical and microbiological parameters collected at 80 different WQ sampling stations were examined. At first, the k-means algorithm was carried out to identify classes of sampling stations regarding their associated WQ status by considering three internal validation indexes, i.e., Silhouette coefficient, Davies-Bouldin and Caliński-Harabasz. As a result, two WQ classes were identified, representing low (C1) and high (C2) pollution. The k-NN/GA algorithm was applied on the available data to construct a classification model with the two WQ classes, previously defined by the k-means algorithm, as the dependent variables and the 21 physical, chemical and microbiological parameters being the independent ones. This algorithm led to a significant reduction of the multidimensional space of independent variables to only nine, which are likely to explain most of the structure of the two identified WQ classes. These parameters are, namely, electric conductivity, faecal coliforms, dissolved oxygen, chlorides, total hardness, nitrate, total alkalinity, biochemical oxygen demand and turbidity. Further, the land use cover of the study basin revealed a very good agreement with the WQ spatial distribution suggested by the k-means algorithm, confirming the credibility of the main results of the used WQ data mining approach. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Chirality dependence of dipole matrix element of carbon nanotubes in axial magnetic field: A third neighbor tight binding approach

    NASA Astrophysics Data System (ADS)

    Chegel, Raad; Behzad, Somayeh

    2014-02-01

    We have studied the electronic structure and dipole matrix element, D, of carbon nanotubes (CNTs) under magnetic field, using the third nearest neighbor tight binding model. It is shown that the 1NN and 3NN-TB band structures show differences such as the spacing and mixing of neighbor subbands. Applying the magnetic field leads to breaking the degeneracy behavior in the D transitions and creates new allowed transitions corresponding to the band modifications. It is found that |D| is proportional to the inverse tube radius and chiral angle. Our numerical results show that the amount of field-induced splitting for the first optical peak is proportional to the magnetic field by the splitting rate ν11. It is shown that ν11 changes linearly and parabolically with the chiral angle and radius, respectively.

  4. Nonexposure Accurate Location K-Anonymity Algorithm in LBS

    PubMed Central

    2014-01-01

    This paper tackles location privacy protection in current location-based services (LBS) where mobile users have to report their exact location information to an LBS provider in order to obtain their desired services. Location cloaking has been proposed and well studied to protect user privacy. It blurs the user's accurate coordinate and replaces it with a well-shaped cloaked region. However, to obtain such an anonymous spatial region (ASR), nearly all existing cloaking algorithms require knowing the accurate locations of all users. Therefore, location cloaking without exposing the user's accurate location to any party is urgently needed. In this paper, we present two such nonexposure accurate location cloaking algorithms. They are designed for K-anonymity, and cloaking is performed based on the identifications (IDs) of the grid areas which were reported by all the users, instead of directly on their accurate coordinates. Experimental results show that our algorithms are more secure than the existing cloaking algorithms, do not need all the users to report their locations all the time, and can generate smaller ASRs. PMID:24605060
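
    The general idea of cloaking from reported grid-cell IDs rather than exact coordinates can be illustrated with a small sketch. This is not either of the paper's two algorithms, whose details the abstract does not give; it is a hypothetical square-window expansion, and the name `cloak_region`, the (row, column) cell IDs, and the square grid are all assumptions.

```python
from collections import Counter

def cloak_region(reported_cells, my_cell, k, grid_size):
    """Grow a square window of grid cells around my_cell until it holds >= k users.

    reported_cells: (row, col) cell IDs reported by users (no exact coordinates).
    Returns (row_min, col_min, row_max, col_max), or None if k cannot be met.
    """
    counts = Counter(reported_cells)
    r0, c0 = my_cell
    for radius in range(grid_size):
        rmin, rmax = max(0, r0 - radius), min(grid_size - 1, r0 + radius)
        cmin, cmax = max(0, c0 - radius), min(grid_size - 1, c0 + radius)
        total = sum(counts[(r, c)]
                    for r in range(rmin, rmax + 1)
                    for c in range(cmin, cmax + 1))
        if total >= k:
            return (rmin, cmin, rmax, cmax)
    return None  # fewer than k users anywhere in the grid

# Five users reporting cell IDs on a 5 x 5 grid; the requester sits in cell (2, 2).
cells = [(2, 2), (2, 3), (0, 0), (4, 4), (3, 2)]
region = cloak_region(cells, (2, 2), k=3, grid_size=5)
```

    Here the window grows from the single cell (2, 2) to the 3 x 3 block spanning rows 1 to 3 and columns 1 to 3, which contains three reported users. Only cell IDs are ever inspected, which mirrors the nonexposure requirement described in the abstract.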

  5. X-ray K-edge absorption spectra of Fe minerals and model compounds: II. EXAFS

    NASA Astrophysics Data System (ADS)

    Waychunas, Glenn A.; Brown, Gordon E.; Apted, Michael J.

    1986-01-01

    K-edge extended X-ray absorption fine structure (EXAFS) spectra of Fe in varying environments in a suite of well-characterized silicate and oxide minerals were collected using synchrotron radiation and analyzed using single scattering approximation theory to yield nearest neighbor Fe-O distances and coordination numbers. The partial inverse character of synthetic hercynite spinel was verified in this way. Comparison of the results from all samples with structural data from X-ray diffraction crystal structure refinements indicates that EXAFS-derived first neighbor distances are generally accurate to ±0.02 Å using only theoretically generated phase information, and may be improved over this if similar model compounds are used to determine EXAFS phase functions. Coordination numbers are accurate to ±20 percent and can be similarly improved using model compound EXAFS amplitude information. However, in particular cases the EXAFS-derived distances may be shortened, and the coordination number reduced, by the effects of static and thermal disorder or by partial overlap of the longer Fe-O first neighbor distances with second neighbor distances in the EXAFS structure function. In the former case the total information available in the EXAFS is limited by the disorder, while in the latter case more accurate results can in principle be obtained by multiple neighbor EXAFS analysis. The EXAFS and XANES spectra of Fe in Nain, Labrador osumulite and Lakeview, Oregon plagioclase are also analyzed as an example of the application of X-ray absorption spectroscopy to metal ion site occupation determination in minerals.

  6. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification

    PubMed Central

    Wen, Tingxi; Zhang, Zhongnan

    2017-01-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789

  7. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan

    2017-05-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.

  8. The nearest relative in mental health law.

    PubMed

    Andoh, Benjamin; Gogo, Emmanuel

    2004-04-01

    This article considers the concept of the 'nearest relative' in mental health law in England and Wales and argues, inter alia, for its retention in a way that avoids violation of the European Convention on Human Rights and the Human Rights Act 1998. It looks, first, at the meaning of nearest relative and then focuses on his/her role today, including its link with advance directives for mental health care, and on the tension between nearest relatives and approved social workers and the law. The problem exposed by JT v. United Kingdom in relation to the Human Rights Act 1998 and its implications for the future are considered. The impact of the Mental Health Bill (2002) on the nearest relative is discussed and recommendations to improve the present law are then suggested.

  9. GPU Accelerated Chemical Similarity Calculation for Compound Library Comparison

    PubMed Central

    Ma, Chao; Wang, Lirong; Xie, Xiang-Qun

    2012-01-01

    Chemical similarity calculation plays an important role in compound library design, virtual screening, and “lead” optimization. In this manuscript, we present a novel GPU-accelerated algorithm for all-vs-all Tanimoto matrix calculation and nearest neighbor search. By taking advantage of multi-core GPU architecture and CUDA parallel programming technology, the algorithm is up to 39 times superior to the existing commercial software that runs on CPUs. Because of the utilization of intrinsic GPU instructions, this approach is nearly 10 times faster than existing GPU-accelerated sparse vector algorithm, when Unity fingerprints are used for Tanimoto calculation. The GPU program that implements this new method takes about 20 minutes to complete the calculation of Tanimoto coefficients between 32M PubChem compounds and 10K Active Probes compounds, i.e., 324G Tanimoto coefficients, on a 128-CUDA-core GPU. PMID:21692447
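
    The Tanimoto calculation and nearest neighbor search described in this record can be illustrated with a minimal CPU-only sketch. It is not the authors' GPU algorithm: fingerprints are assumed to be stored as Python integers used as bitsets, and the function names and toy fingerprints are hypothetical.

```python
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    common = bin(a & b).count("1")  # bits set in both fingerprints
    total = bin(a | b).count("1")   # bits set in at least one fingerprint
    return common / total if total else 0.0


def nearest_neighbors(queries: List[int], library: List[int]) -> List[Tuple[int, float]]:
    """For each query, return (index, similarity) of the most similar library entry."""
    found = []
    for q in queries:
        scores = [tanimoto(q, ref) for ref in library]
        # On ties, max() keeps the first (lowest-index) library entry.
        best = max(range(len(library)), key=scores.__getitem__)
        found.append((best, scores[best]))
    return found


# Hypothetical 6-bit fingerprints, for illustration only.
library = [0b101100, 0b111000, 0b000111]
queries = [0b101000, 0b000011]
results = nearest_neighbors(queries, library)
print(results)
```

    The all-vs-all loop is the part a GPU implementation would parallelize; here it is left as a plain nested iteration for readability.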

  10. An Efficient Statistical Computation Technique for Health Care Big Data using R

    NASA Astrophysics Data System (ADS)

    Sushma Rani, N.; Srinivasa Rao, P., Dr; Parimala, P.

    2017-08-01

    Due to changes in living conditions and other factors, many critical health-related problems are arising. Diagnosing a problem at an earlier stage increases the chances of survival and fast recovery, and reduces the recovery time and the cost of treatment. One such medical issue is cancer, and breast cancer has been identified as the second leading cause of cancer death. If detected at an early stage, it can be cured. Once a patient is found to have a breast tumor, the tumor should be classified as cancerous or non-cancerous. The paper therefore uses the k-nearest neighbors (KNN) algorithm, which is one of the simplest machine learning algorithms and an instance-based learning algorithm, to classify the data. New records are added day to day, which leads to an increase in the data to be classified, and this tends toward a big data problem. The algorithm is implemented in R, which is the most popular platform applied to machine learning algorithms for statistical computing. Experimentation is conducted using various classification evaluation metrics on various values of k. The results show that the KNN algorithm performs better than existing models.
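
    As an illustration of KNN classification evaluated over several values of k, the following is a minimal sketch in Python rather than the R implementation the record refers to. The two-feature toy samples, the labels, and the function names are assumptions made for the example, not data from the paper.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], x: Sequence[float], k: int) -> str:
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train: List[Sample], test: List[Sample], k: int) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical tumor measurements (two features each), for illustration only.
train = [
    ((1.0, 1.2), "benign"), ((0.8, 1.0), "benign"), ((1.3, 0.9), "benign"),
    ((5.0, 5.1), "malignant"), ((4.8, 5.3), "malignant"), ((5.2, 4.7), "malignant"),
]
test = [((1.1, 1.0), "benign"), ((4.9, 5.0), "malignant")]

for k in (1, 3, 5):
    print(k, accuracy(train, test, k))
```

    Odd values of k are used so that a two-class vote cannot tie.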

  11. An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.

    PubMed

    Zou, Hui; Zou, Zhihong; Wang, Xiaojing

    2015-11-12

    The increase and the complexity of data caused by the uncertain environment is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.

  12. Ensemble Clustering Classification Applied to Competing SVM and One-Class Classifiers Exemplified by Plant MicroRNAs Data.

    PubMed

    Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai

    2016-12-01

    The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
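
    One way to read "the distance between points in respect to how they co-cluster" is sketched below. It assumes the distance is the fraction of ensemble clusterings that separate two points; the abstract does not state the exact definition, so this is an assumption, and the labels, data, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def coclust_distance(ensemble: List[List[int]], i: int, j: int) -> float:
    """Fraction of clusterings that put points i and j in different clusters."""
    apart = sum(labels[i] != labels[j] for labels in ensemble)
    return apart / len(ensemble)


def ec_knn_predict(ensemble: List[List[int]], known: Dict[int, str], query: int, k: int) -> str:
    """Vote among the k labeled points closest to `query` under the ensemble distance."""
    ranked = sorted(known, key=lambda idx: coclust_distance(ensemble, query, idx))
    votes = Counter(known[idx] for idx in ranked[:k])
    return votes.most_common(1)[0][0]


# Three hypothetical clusterings of six points (one cluster id per point).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
# Points 0, 1, 4, 5 are labeled; points 2 and 3 are to be classified.
known = {0: "miRNA", 1: "miRNA", 4: "other", 5: "other"}

print(ec_knn_predict(ensemble, known, query=2, k=3))
print(ec_knn_predict(ensemble, known, query=3, k=3))
```

    In this form the query points must take part in the clusterings, since the distance is defined only through shared cluster membership.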

  13. Pre-Scheduled and Self Organized Sleep-Scheduling Algorithms for Efficient K-Coverage in Wireless Sensor Networks

    PubMed Central

    Hwang, I-Shyan

    2017-01-01

    The K-coverage configuration that guarantees coverage of each location by at least K sensors is highly popular and is extensively used to monitor diversified applications in wireless sensor networks. Long network lifetime and high detection quality are the essentials of such K-covered sleep-scheduling algorithms. However, the existing sleep-scheduling algorithms either cause high cost or cannot preserve the detection quality effectively. In this paper, the Pre-Scheduling-based K-coverage Group Scheduling (PSKGS) and Self-Organized K-coverage Scheduling (SKS) algorithms are proposed to settle the problems in the existing sleep-scheduling algorithms. Simulation results show that our pre-scheduled-based KGS approach enhances the detection quality and network lifetime, whereas the self-organized-based SKS algorithm minimizes the computation and communication cost of the nodes and thereby is energy efficient. Besides, SKS outperforms PSKGS in terms of network lifetime and detection quality as it is self-organized. PMID:29257078

  14. Probabilistic Neighborhood-Based Data Collection Algorithms for 3D Underwater Acoustic Sensor Networks

    PubMed Central

    Han, Guangjie; Li, Shanshan; Zhu, Chunsheng; Jiang, Jinfang; Zhang, Wenbo

    2017-01-01

    Marine environmental monitoring provides crucial information and support for the exploitation, utilization, and protection of marine resources. With the rapid development of information technology, the development of three-dimensional underwater acoustic sensor networks (3D UASNs) provides a novel strategy to acquire marine environment information conveniently, efficiently and accurately. However, the specific propagation effects of acoustic communication channel lead to decreased successful information delivery probability with increased distance. Therefore, we investigate two probabilistic neighborhood-based data collection algorithms for 3D UASNs which are based on a probabilistic acoustic communication model instead of the traditional deterministic acoustic communication model. An autonomous underwater vehicle (AUV) is employed to traverse along the designed path to collect data from neighborhoods. For 3D UASNs without prior deployment knowledge, partitioning the network into grids can allow the AUV to visit the central location of each grid for data collection. For 3D UASNs in which the deployment knowledge is known in advance, the AUV only needs to visit several selected locations by constructing a minimum probabilistic neighborhood covering set to reduce data latency. Otherwise, by increasing the transmission rounds, our proposed algorithms can provide a tradeoff between data collection latency and information gain. These algorithms are compared with basic Nearest-neighbor Heuristic algorithm via simulations. Simulation analyses show that our proposed algorithms can efficiently reduce the average data collection completion time, corresponding to a decrease of data latency. PMID:28208735
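
    The grid partition and the baseline nearest-neighbor heuristic mentioned in this record can be sketched as follows. This covers only the deterministic baseline, not the probabilistic algorithms the authors propose; the cubic deployment region, its size, and the function names are assumptions for illustration.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]


def grid_centers(side: float, cells: int) -> List[Point]:
    """Centers of a cells x cells x cells partition of a cube with edge `side`."""
    step = side / cells
    axis = [step * (i + 0.5) for i in range(cells)]
    return [(x, y, z) for x in axis for y in axis for z in axis]


def nearest_neighbor_tour(start: Point, stops: List[Point]) -> List[Point]:
    """Visit every stop by always moving to the closest unvisited one."""
    tour: List[Point] = []
    current = start
    remaining = list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: dist(current, p))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    return tour


# Hypothetical 100 m cube split into 2 x 2 x 2 grids; the AUV starts at a corner.
centers = grid_centers(100.0, 2)
tour = nearest_neighbor_tour((0.0, 0.0, 0.0), centers)
print(tour)
```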

  15. Probabilistic Neighborhood-Based Data Collection Algorithms for 3D Underwater Acoustic Sensor Networks.

    PubMed

    Han, Guangjie; Li, Shanshan; Zhu, Chunsheng; Jiang, Jinfang; Zhang, Wenbo

    2017-02-08

    Marine environmental monitoring provides crucial information and support for the exploitation, utilization, and protection of marine resources. With the rapid development of information technology, the development of three-dimensional underwater acoustic sensor networks (3D UASNs) provides a novel strategy to acquire marine environment information conveniently, efficiently and accurately. However, the specific propagation effects of acoustic communication channel lead to decreased successful information delivery probability with increased distance. Therefore, we investigate two probabilistic neighborhood-based data collection algorithms for 3D UASNs which are based on a probabilistic acoustic communication model instead of the traditional deterministic acoustic communication model. An autonomous underwater vehicle (AUV) is employed to traverse along the designed path to collect data from neighborhoods. For 3D UASNs without prior deployment knowledge, partitioning the network into grids can allow the AUV to visit the central location of each grid for data collection. For 3D UASNs in which the deployment knowledge is known in advance, the AUV only needs to visit several selected locations by constructing a minimum probabilistic neighborhood covering set to reduce data latency. Otherwise, by increasing the transmission rounds, our proposed algorithms can provide a tradeoff between data collection latency and information gain. These algorithms are compared with basic Nearest-neighbor Heuristic algorithm via simulations. Simulation analyses show that our proposed algorithms can efficiently reduce the average data collection completion time, corresponding to a decrease of data latency.

  16. Automated identification of Monogeneans using digital image processing and K-nearest neighbour approaches.

    PubMed

    Yousef Kalafi, Elham; Tan, Wooi Boon; Town, Christopher; Dhillon, Sarinder Kaur

    2016-12-22

    Monogeneans are flatworms (Platyhelminthes) that are primarily found on gills and skin of fishes. Monogenean parasites have attachment appendages at their haptoral regions that help them to move about the body surface and feed on skin and gill debris. Haptoral attachment organs consist of sclerotized hard parts such as hooks, anchors and marginal hooks. Monogenean species are differentiated based on their haptoral bars, anchors, marginal hooks, reproductive parts' (male and female copulatory organs) morphological characters and soft anatomical parts. The complex structure of these diagnostic organs and also their overlapping in microscopic digital images are impediments for developing fully automated identification system for monogeneans (LNCS 7666:256-263, 2012), (ISDA; 457-462, 2011), (J Zoolog Syst Evol Res 52(2): 95-99. 2013;). In this study images of hard parts of the haptoral organs such as bars and anchors are used to develop a fully automated identification technique for monogenean species identification by implementing image processing techniques and machine learning methods. Images of four monogenean species namely Sinodiplectanotrema malayanus, Trianchoratus pahangensis, Metahaliotrema mizellei and Metahaliotrema sp. (undescribed) were used to develop an automated technique for identification. K-nearest neighbour (KNN) was applied to classify the monogenean specimens based on the extracted features. 50% of the dataset was used for training and the other 50% was used as testing for system evaluation. Our approach demonstrated overall classification accuracy of 90%. In this study Leave One Out (LOO) cross validation is used for validation of our system and the accuracy is 91.25%. The methods presented in this study facilitate fast and accurate fully automated classification of monogeneans at the species level. In future studies more classes will be included in the model, the time to capture the monogenean images will be reduced and improvements in

  17. Ground State of Quasi-One Dimensional Competing Spin Chain Cs2Cu2Mo3O12 at zero and Finite Fields

    NASA Astrophysics Data System (ADS)

    Matsui, Kazuki; Goto, Takayuki; Angel, Julia; Watanabe, Isao; Sasaki, Takahiko; Hase, Masashi

    The ground state of competing-spin-chain Cs2Cu2Mo3O12 with the ferromagnetic exchange interaction J1 = -93 K on nearest-neighboring spins and the antiferromagnetic one J2 = +33 K on next-nearest-neighboring spins was investigated by ZF/LF-μSR and 133Cs-NMR in the 3He temperature range. The zero-field μSR relaxation rate λ shows a significant increase below 1.85 K, suggesting the existence of magnetic order, which is consistent with the recent report on the specific heat. However, LF decoupling data at the lowest temperature 0.3 K indicate that the spins fluctuate dynamically, suggesting that the system is in a quasi-static ordered state under zero field. This idea is further supported by the fact that the broadening in NMR spectra below TN is weakened at low field below 2 T.

  18. Ensemble Clustering Classification compete SVM and One-Class classifiers applied on plant microRNAs Data.

    PubMed

    Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai

    2016-12-22

    The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.

  19. An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China

    PubMed Central

    Zou, Hui; Zou, Zhihong; Wang, Xiaojing

    2015-01-01

    The increase and the complexity of data caused by the uncertain environment is today’s reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006–2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality. PMID:26569283

  20. Constructing a logical, regular axis topology from an irregular topology

    DOEpatents

    Faraj, Daniel A.

    2014-07-22

    Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.

  1. Constructing a logical, regular axis topology from an irregular topology

    DOEpatents

    Faraj, Daniel A.

    2014-07-01

    Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.

  2. A nearest-neighbour discretisation of the regularized stokeslet boundary integral equation

    NASA Astrophysics Data System (ADS)

    Smith, David J.

    2018-04-01

    The method of regularized stokeslets is extensively used in biological fluid dynamics due to its conceptual simplicity and meshlessness. This simplicity carries a degree of cost in computational expense and accuracy because the number of degrees of freedom used to discretise the unknown surface traction is generally significantly higher than that required by boundary element methods. We describe a meshless method based on nearest-neighbour interpolation that significantly reduces the number of degrees of freedom required to discretise the unknown traction, increasing the range of problems that can be practically solved, without excessively complicating the task of the modeller. The nearest-neighbour technique is tested against the classical problem of rigid body motion of a sphere immersed in very viscous fluid, then applied to the more complex biophysical problem of calculating the rotational diffusion timescales of a macromolecular structure modelled by three closely-spaced non-slender rods. A heuristic for finding the required density of force and quadrature points by numerical refinement is suggested. Matlab/GNU Octave code for the key steps of the algorithm is provided, which predominantly use basic linear algebra operations, with a full implementation being provided on github. Compared with the standard Nyström discretisation, more accurate and substantially more efficient results can be obtained by de-refining the force discretisation relative to the quadrature discretisation: a cost reduction of over 10 times with improved accuracy is observed. This improvement comes at minimal additional technical complexity. Future avenues to develop the algorithm are then discussed.

  3. Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition.

    PubMed

    Shen, Hong-Bin; Chou, Kuo-Chen

    2005-11-25

    The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For in-depth understanding of the biochemical process of the nucleus, the knowledge of localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desired to develop an automated method for fast annotating the subnuclear locations for numerous newly found nuclear protein sequences so as to be able to timely utilize them for basic research and drug discovery. In view of this, a novel approach is developed for predicting the protein subnuclear location. It is featured by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and jackknife cross-validation test are significantly higher than those by existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.

  4. Noise reduction and image enhancement using a hardware implementation of artificial neural networks

    NASA Astrophysics Data System (ADS)

    David, Robert; Williams, Erin; de Tremiolles, Ghislain; Tannhof, Pascal

    1999-03-01

    In this paper, we present a neural based solution developed for noise reduction and image enhancement using the ZISC, an IBM hardware processor which implements the Restricted Coulomb Energy algorithm and the K-Nearest Neighbor algorithm. Artificial neural networks present the advantages of processing time reduction in comparison with classical models, adaptability, and the weighted property of pattern learning. The goal of the developed application is image enhancement in order to restore old movies (noise reduction, focus correction, etc.), to improve digital television images, or to treat images which require adaptive processing (medical images, spatial images, special effects, etc.). Image results show a quantitative improvement over the noisy image as well as the efficiency of this system. Further enhancements are being examined to improve the output of the system.

  5. Live neighbor-joining.

    PubMed

    Telles, Guilherme P; Araújo, Graziela S; Walter, Maria E M T; Brigido, Marcelo M; Almeida, Nalvo F

    2018-05-16

    In phylogenetic reconstruction the result is a tree where all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, both ancestral and living taxa may coexist, leading to a tree where internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is largely used for phylogenetic reconstruction. We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We have investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descending taxa. We also applied Live Neighbor-Joining on a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their relationship in terms of content similarity is represented by means of a phylogeny. Our experiments have shown interesting alternative phylogenetic hypotheses for RNA virus genomes, bacterial genomes and alternative relationships among images and texts, illustrating a wide range of scenarios where Live Neighbor-Joining may be used.

  6. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    PubMed

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
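
    The key idea stated in this record, choosing initial centroids that lie in dense regions and are adequately separated, can be sketched as a greedy deterministic selection. This is not the authors' algorithm: the density measure (neighbor count within a radius), the `radius` and `min_sep` parameters, and the function name are all assumptions made for illustration.

```python
from math import dist
from typing import List, Sequence

Point = Sequence[float]


def initial_centroids(points: List[Point], k: int, radius: float, min_sep: float) -> List[Point]:
    """Greedy, deterministic pick of up to k dense and mutually separated points."""
    # Density of a point = number of other points within `radius` of it.
    density = [sum(dist(p, q) <= radius for q in points) - 1 for p in points]
    # Densest first; ties broken by original index so the order is reproducible.
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    chosen: List[Point] = []
    for i in order:
        # Accept a candidate only if it is far enough from every chosen centroid.
        if all(dist(points[i], c) >= min_sep for c in chosen):
            chosen.append(points[i])
        if len(chosen) == k:
            break
    return chosen  # may hold fewer than k points if min_sep is too strict


# Hypothetical 2-D data: a dense group, a smaller group, and an isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
centroids = initial_centroids(points, k=2, radius=0.5, min_sep=2.0)
print(centroids)
```

    Because no random numbers are involved, repeated runs on the same data return the same centroids, which is the property the record emphasizes.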

  7. Design of Quantum Algorithms Using Physics Tools

    DTIC Science & Technology

    2014-06-02

    invariant spin-1 chain that has a unique highly entangled ground state and exhibits some signatures of critical behavior. The entanglement entropy of one... entangled and found them hard to approximate using the MPS method. In follow on work Shor along with Sergey Bravyi, Libor Caha, Movassagh and Nagaj...They asked how entangled the ground state of a FF quantum spin-s chain with nearest-neighbor interactions can be for small values of s. While FF spin-1

  8. Charge exchange between two nearest neighbour ions immersed in a dense plasma

    NASA Astrophysics Data System (ADS)

    Sauvan, P.; Angelo, P.; Derfoul, H.; Leboucher-Dalimier, E.; Devdariani, A.; Calisti, A.; Talin, B.

    1999-04-01

    In dense plasmas the quasimolecular model is relevant to describe the radiative properties: two nearest neighbor ions remain close to each other during a time scale of the order of the emission time. Within the frame of a quasistatic approach it has been shown that hydrogen-like spectral line shapes can exhibit satellite-like features. In this work we present the effect on the line shapes of the dynamical collision between the two ions exchanging transiently their bound electron. This model is suitable for the description of the core, the wings and the red satellite-like features. It is post-processed to the self consistent code (IDEFIX) giving the adiabatic transition energies and the oscillator strengths for the transient molecule immersed in a dense free electron bath. It is shown that the positions of the satellites are insensitive to the dynamics of the ion-ion collision. Results for fluorine Lyβ are presented.

  9. Analysis of the Seismicity Preceding Large Earthquakes

    NASA Astrophysics Data System (ADS)

    Stallone, A.; Marzocchi, W.

    2016-12-01

    The most common earthquake forecasting models assume that the magnitude of the next earthquake is independent from the past. This feature is probably one of the most severe limitations of the capability to forecast large earthquakes. In this work, we investigate this aspect empirically, exploring whether spatial-temporal variations in seismicity encode some information on the magnitude of future earthquakes. For this purpose, and to verify the universality of the findings, we consider seismic catalogs covering quite different space-time-magnitude windows, such as the Alto Tiberina Near Fault Observatory (TABOO) catalogue and the California and Japanese seismic catalogs. Our method is inspired by the statistical methodology proposed by Zaliapin (2013) to distinguish triggered and background earthquakes, using nearest-neighbor clustering analysis in a two-dimensional plane defined by rescaled time and space. In particular, we generalize the metric based on the nearest neighbor to a metric based on k-nearest-neighbors clustering analysis, which allows us to consider the overall space-time-magnitude distribution of the k earthquakes (k-foreshocks) that precede one target event (the mainshock); we then analyze the statistical properties of the clusters identified in this rescaled space. In essence, the main goal of this study is to verify whether different classes of mainshock magnitudes are characterized by distinctive k-foreshock distributions. The final step is to show how the findings of this work may (or may not) improve the skill of existing earthquake forecasting models.

  10. Material identification based on electrostatic sensing technology

    NASA Astrophysics Data System (ADS)

    Liu, Kai; Chen, Xi; Li, Jingnan

    2018-04-01

    When a robot travels on the surface of different media, the uncertainty of the medium will seriously affect the autonomous action of the robot. In this paper, the distribution characteristics of multiple electrostatic charges on the surface of materials are detected, so as to improve the accuracy of the existing electrostatic signal material identification methods, which is of great significance in helping the robot optimize its control algorithm. Building on the electrostatic signal material identification method proposed by predecessors, a multi-channel detection circuit is used to obtain the electrostatic charge distribution at different positions of the material surface, weights are introduced into the eigenvalue matrix, and the weight distribution is optimized by an evolutionary algorithm, so that the eigenvalue matrix more accurately reflects the surface charge distribution characteristics of the material. The matrix is used as the input of the k-Nearest Neighbor (kNN) classification algorithm to classify the dielectric materials. The experimental results show that the proposed method can significantly improve the recognition rate of the existing electrostatic signal material recognition methods.
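
    The combination of per-feature weights with a kNN classifier can be sketched as follows. The weighted Euclidean distance, the example weights, and the material labels are assumptions chosen for illustration; the abstract does not specify how the weights enter the distance, and the evolutionary optimization of the weights is not shown.

```python
import math
from collections import Counter

def weighted_knn(train, labels, query, weights, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    def distance(a, b):
        # Assumed form: each feature's squared difference is scaled by its weight.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    # Ties in distance are broken by sample index to keep the result deterministic.
    order = sorted(range(len(train)), key=lambda i: (distance(train[i], query), i))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-channel feature vectors for two hypothetical materials.
train = [(0, 0), (0, 10), (1, 0), (1, 10)]
labels = ["wood", "glass", "wood", "glass"]
predicted = weighted_knn(train, labels, query=(0.9, 6), weights=(1.0, 1.0), k=3)
```

    With an odd `k` and two classes the vote cannot tie; for more classes a tie-breaking rule would have to be chosen explicitly.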

  11. Unsupervised Indoor Localization Based on Smartphone Sensors, iBeacon and Wi-Fi.

    PubMed

    Chen, Jing; Zhang, Yi; Xue, Wei

    2018-04-28

    In this paper, we propose UILoc, an unsupervised indoor localization scheme that uses a combination of smartphone sensors, iBeacons and Wi-Fi fingerprints for reliable and accurate indoor localization with zero labor cost. Firstly, compared with the fingerprint-based method, the UILoc system can build a fingerprint database automatically without any site survey and the database will be applied in the fingerprint localization algorithm. Secondly, since the initial position is vital to the system, UILoc will provide the basic location estimation through the pedestrian dead reckoning (PDR) method. To provide accurate initial localization, this paper proposes an initial localization module, a weighted fusion algorithm combined with a k-nearest neighbors (KNN) algorithm and a least squares algorithm. In UILoc, we have also designed a reliable model to reduce the landmark correction error. Experimental results show that the UILoc can provide accurate positioning, the average localization error is about 1.1 m in the steady state, and the maximum error is 2.77 m.

  12. Classification of acoustic emission signals using wavelets and Random Forests : Application to localized corrosion

    NASA Astrophysics Data System (ADS)

    Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.

    2016-03-01

    This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.

  13. Performance Analysis of Combined Methods of Genetic Algorithm and K-Means Clustering in Determining the Value of Centroid

    NASA Astrophysics Data System (ADS)

    Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna

    2017-12-01

    The determination of centroids in the K-Means algorithm directly affects the quality of the clustering results. Determining centroids by using random numbers has many weaknesses. The GenClust algorithm, which combines Genetic Algorithms and K-Means, uses a genetic algorithm to determine the centroid of each cluster. In GenClust, 50% of the chromosomes are obtained through deterministic calculations and 50% are obtained from the generation of random numbers. This study modifies the GenClust algorithm so that 100% of the chromosomes are obtained through deterministic calculations. The study yields performance comparisons, expressed in Mean Square Error, of centroid determination in K-Means using the GenClust method, the modified GenClust method, and classic K-Means.
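
    To make the comparison criterion concrete, the sketch below computes a Mean Square Error for a given set of centroids. It assumes the common definition (mean squared Euclidean distance from each point to its nearest centroid); the abstract does not state the exact formula used in the study.

```python
def clustering_mse(points, centroids):
    """Mean squared distance from each point to its nearest centroid (assumed definition)."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest centroid.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total / len(points)

points = [(0, 0), (2, 0), (10, 0), (12, 0)]
good = clustering_mse(points, [(1, 0), (11, 0)])   # centroids at the cluster means
poor = clustering_mse(points, [(0, 0), (12, 0)])   # centroids at the extremes
```

    A lower value indicates centroids that sit closer to the data they represent, which is why the choice of initial centroids shows up directly in this measure.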

  14. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  15. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  16. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review the classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
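
    Treating k-NN as a nonparametric regression on a 0/1 outcome gives a simple probability estimate: the fraction of the k nearest neighbors with outcome 1. The sketch below illustrates that idea only; the function name and the toy data are hypothetical, and it is not the example code from the companion article.

```python
import math

def knn_probability(train_x, train_y, query, k):
    """Estimate P(y = 1 | x = query) as the mean 0/1 outcome of the k nearest neighbors."""
    # Ties in distance are broken by sample index to keep the estimate deterministic.
    order = sorted(range(len(train_x)), key=lambda i: (math.dist(train_x[i], query), i))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [0, 1, 1, 0, 1]
p = knn_probability(train_x, train_y, query=(1.4,), k=3)
```

    Averaging the outcomes instead of taking a majority vote is what turns the classifier into a probability estimator; the choice of `k` controls the trade-off between bias and variance.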

  17. Rb-NMR study of the quasi-one-dimensional competing spin-chain compound Rb2Cu2Mo3O12

    NASA Astrophysics Data System (ADS)

    Matsui, Kazuki; Yagi, Ayato; Hoshino, Yukihiro; Atarashi, Sochiro; Hase, Masashi; Sasaki, Takahiko; Goto, Takayuki

    2017-12-01

    A Rb-NMR study has been performed on the quasi-one-dimensional competing spin chain Rb2Cu2Mo3O12 with ferromagnetic and antiferromagnetic exchange interactions on nearest-neighboring and next-nearest-neighboring spins, respectively. The system changes from a gapped ground state at zero field to a gapless state at HC ≃ 2 T, where the existence of magnetic order below 1 K was demonstrated by a broadening of the NMR spectrum, associated with a critical divergence of 1/T1. In the higher-temperature region, T1⁻¹ showed a power-law-type temperature dependence, from which the field dependence of the Luttinger parameter K was obtained and compared with theoretical calculations based on the spin nematic Tomonaga-Luttinger liquid (TLL) state.

  18. Norrie disease and MAO genes: nearest neighbors.

    PubMed

    Chen, Z Y; Denney, R M; Breakefield, X O

    1995-01-01

    The Norrie disease and MAO genes are tandemly arranged in the p11.4-p11.3 region of the human X chromosome in the order tel-MAOA-MAOB-NDP-cent. This relationship is conserved in the mouse in the order tel-MAOB-MAOA-NDP-cent. The MAO genes appear to have arisen by tandem duplication of an ancestral MAO gene, but their positional relationship to NDP appears to be random. Distinctive X-linked syndromes have been described for mutations in the MAOA and NDP genes, and in addition, individuals have been identified with contiguous gene syndromes due to chromosomal deletions which encompass two or three of these genes. Loss of function of the NDP gene causes a syndrome of congenital blindness and progressive hearing loss, sometimes accompanied by signs of CNS dysfunction, including variable mental retardation and psychiatric symptoms. Other mutations in the NDP gene have been found to underlie another X-linked eye disease, exudative vitreo-retinopathy. An MAOA deficiency state has been described in one family to date, with features of altered amine and amine metabolite levels, low normal intelligence, apparent difficulty in impulse control and cardiovascular difficulty in affected males. A contiguous gene syndrome in which all three genes are lacking, as well as other as yet unidentified flanking genes, results in severe mental retardation, small stature, seizures and congenital blindness, as well as altered amine and amine metabolites. Issues that remain to be resolved are the function of the NDP gene product, the frequency and phenotype of the MAOA deficiency state, and the possible occurrence and phenotype of an MAOB deficiency state.

  19. Quantum Correlation in the XY Spin Model with Anisotropic Three-Site Interaction

    NASA Astrophysics Data System (ADS)

    Wang, Yao; Chai, Bing-Bing; Guo, Jin-Liang

    2018-05-01

    We investigate pairwise entanglement and quantum discord (QD) in the XY spin model with anisotropic three-site interaction at zero and finite temperatures. For both the nearest-neighbor spins and the next nearest-neighbor spins, special attention is paid to the dependence of entanglement and QD on the anisotropic parameter δ induced by the next nearest-neighbor spins. We show that the behavior of QD differs in many ways from entanglement under the influences of the anisotropic three-site interaction at finite temperatures. More important, comparing the effects of δ on the entanglement and QD, we find the anisotropic three-site interaction plays an important role in the quantum correlations at zero and finite temperatures. It is found that δ can strengthen the quantum correlation for both the nearest-neighbor spins and the next nearest-neighbor spins, especially for the nearest-neighbor spins at low temperature.

  20. Performing a scatterv operation on a hierarchical tree network optimized for collective operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D

    Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.
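
    The receive, maintain, determine, and send steps described above can be simulated in a few lines. This is an illustrative sketch only, not the patented implementation: the tree is a hypothetical dictionary mapping each node to its children, a chunk is a dictionary mapping destination node to data, and recursion stands in for network sends.

```python
def subtree(tree, node):
    """Return the set containing `node` and every node below it."""
    nodes = {node}
    for child in tree.get(node, []):
        nodes |= subtree(tree, child)
    return nodes

def scatterv(tree, node, chunk, received):
    """Deliver `chunk` (destination -> data) from `node` down the tree."""
    # Maintain: keep the portion addressed to this node, if any.
    if node in chunk:
        received[node] = chunk[node]
    for child in tree.get(node, []):
        # Determine: which portions are for this child or for nodes below it?
        below = subtree(tree, child)
        part = {dest: data for dest, data in chunk.items() if dest in below}
        # Send: forward only if there is something for that branch.
        if part:
            scatterv(tree, child, part, received)
    return received

tree = {0: [1, 2], 1: [3, 4], 2: [5]}                 # hypothetical 6-node tree
chunk = {0: [0], 1: [1, 1], 3: [3], 5: [5, 5, 5]}     # variable-length portions
delivered = scatterv(tree, 0, chunk, {})
```

    Node 2 has no portion of its own but still forwards data for node 5, and nothing is sent toward node 4, which mirrors the conditional sending step in the abstract.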

  1. Second harmonic generation microscopy analysis of extracellular matrix changes in human idiopathic pulmonary fibrosis

    PubMed Central

    Tilbury, Karissa; Hocker, James; Wen, Bruce L.; Sandbo, Nathan; Singh, Vikas; Campagnola, Paul J.

    2014-01-01

    Patients with idiopathic pulmonary fibrosis (IPF) have poor long-term survival as there are limited diagnostic/prognostic tools or successful therapies. Remodeling of the extracellular matrix (ECM) has been implicated in IPF progression; however, the structural consequences on the collagen architecture have not received considerable attention. Here, we demonstrate that second harmonic generation (SHG) and multiphoton fluorescence microscopy can quantitatively differentiate normal and IPF human tissues. For SHG analysis, we developed a classifier based on wavelet transforms, principal component analysis, and a K-nearest-neighbor algorithm to classify the specific alterations of the collagen structure observed in IPF tissues. The resulting ROC curves obtained by varying the numbers of principal components and nearest neighbors yielded accuracies of >95%. In contrast, simpler metrics based on SHG intensity and collagen coverage in the image provided little or no discrimination. We also characterized the change in the elastin/collagen balance by simultaneously measuring the elastin autofluorescence and SHG intensities and found that the IPF tissues were less elastic relative to collagen. This is consistent with known mechanical consequences of the disease. Understanding ECM remodeling in IPF via nonlinear optical microscopy may enhance our ability to differentiate patients with rapid and slow progression and, thus, provide better prognostic information. PMID:25134793

  2. k-Nearest neighbour local linear prediction of scalp EEG activity during intermittent photic stimulation.

    PubMed

    Erla, Silvia; Faes, Luca; Tranquillini, Enzo; Orrico, Daniele; Nollo, Giandomenico

    2011-05-01

    The characterization of the EEG response to photic stimulation (PS) is an important issue with significant clinical relevance. This study aims to quantify and map the complexity of the EEG during PS, where complexity is measured as the degree of unpredictability resulting from local linear prediction. EEG activity was recorded with eyes closed (EC) and eyes open (EO) during resting and PS at 5, 10, and 15 Hz in a group of 30 healthy subjects and in a case-report of a patient suffering from cerebral ischemia. The mean squared prediction error (MSPE) resulting from k-nearest neighbour local linear prediction was calculated in each condition as an index of EEG unpredictability. The linear or nonlinear nature of the system underlying EEG activity was evaluated quantifying MSPE as a function of the neighbourhood size during local linear prediction, and by surrogate data analysis as well. Unpredictability maps were obtained for each subject interpolating MSPE values over a schematic head representation. Results on healthy subjects evidenced: (i) the prevalence of linear mechanisms in the generation of EEG dynamics, (ii) the lower predictability of EO EEG, (iii) the desynchronization of oscillatory mechanisms during PS leading to increased EEG complexity, (iv) the entrainment of alpha rhythm during EC obtained by 10 Hz PS, and (v) differences of EEG predictability among different scalp regions. The ischemic patient showed different MSPE values in healthy and damaged regions. The EEG predictability decreased moving from the early acute stage to a stage of partial recovery. These results suggest that nonlinear prediction can be a useful tool to characterize EEG dynamics during PS protocols, and may consequently constitute a complement of quantitative EEG analysis in clinical applications. Copyright © 2010 IPEM. Published by Elsevier Ltd. All rights reserved.
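
    The MSPE of k-nearest neighbour local linear prediction can be sketched for a scalar time series. The simplifications here are assumptions, not the study's settings: the embedding dimension is 1 (each sample predicts its successor), the pattern being predicted is left out of its own neighbourhood, and the error is not normalized.

```python
def local_linear_predict(xs, ys, query, k):
    """Fit y = a + b*x by least squares on the k patterns nearest to `query`, then predict."""
    order = sorted(range(len(xs)), key=lambda i: (abs(xs[i] - query), i))[:k]
    nx = [xs[i] for i in order]
    ny = [ys[i] for i in order]
    mx, my = sum(nx) / k, sum(ny) / k
    sxx = sum((x - mx) ** 2 for x in nx)
    if sxx == 0:
        return my  # all neighbours identical: fall back to their mean successor
    slope = sum((x - mx) * (y - my) for x, y in zip(nx, ny)) / sxx
    return my + slope * (query - mx)

def mspe(series, k):
    """Mean squared one-step prediction error with neighbourhood size k."""
    n = len(series) - 1  # number of (sample, successor) patterns
    errors = []
    for t in range(n):
        xs = [series[i] for i in range(n) if i != t]      # leave the target pattern out
        ys = [series[i + 1] for i in range(n) if i != t]
        pred = local_linear_predict(xs, ys, series[t], k)
        errors.append((pred - series[t + 1]) ** 2)
    return sum(errors) / len(errors)

# A purely linear toy system, s[t+1] = 0.5*s[t] + 1, should be almost perfectly predictable.
series = [0.0]
for _ in range(7):
    series.append(0.5 * series[-1] + 1.0)
error = mspe(series, k=3)
```

    Evaluating `mspe` over a range of `k` corresponds to the abstract's use of MSPE as a function of neighbourhood size: for a linear system the error does not improve when the neighbourhood is made smaller.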

  3. Analytical approach for collective diffusion: One-dimensional lattice with the nearest neighbor and the next nearest neighbor lateral interactions

    NASA Astrophysics Data System (ADS)

    Tarasenko, Alexander

    2018-01-01

    Diffusion of particles adsorbed on a homogeneous one-dimensional lattice is investigated using a theoretical approach and MC simulations. The analytical dependencies calculated in the framework of the approach are tested using the numerical data. The perfect coincidence of the data obtained by these different methods demonstrates the correctness of the approach based on the theory of the non-equilibrium statistical operator.

  4. Maximal Neighbor Similarity Reveals Real Communities in Networks

    PubMed Central

    Žalik, Krista Rizman

    2015-01-01

    An important problem in the analysis of network data is the detection of groups of densely interconnected nodes, also called modules or communities. Community structure reveals functions and organizations of networks. Currently used algorithms for community detection in large-scale real-world networks are computationally expensive, require a priori information such as the number or sizes of communities, or are not able to give the same resulting partition in multiple runs. In this paper we investigate a simple and fast algorithm that uses the network structure alone and requires neither optimization of a pre-defined objective function nor information about the number of communities. We propose a bottom-up community detection algorithm in which, starting from communities consisting of adjacent pairs of nodes and their maximal similar neighbors, we find real communities. We show that the overall advantage of the proposed algorithm compared to the other community detection algorithms is its simple nature, low computational cost and very high accuracy in detecting communities of different sizes, also in networks with blurred modularity structure consisting of poorly separated communities. All communities identified by the proposed method for the Facebook network and the E. coli transcriptional regulatory network have strong structural and functional coherence. PMID:26680448

  5. δ-Generalized Labeled Multi-Bernoulli Filter Using Amplitude Information of Neighboring Cells

    PubMed Central

    Liu, Chao; Lei, Peng; Qi, Yaolong

    2018-01-01

    The amplitude information (AI) of echoed signals plays an important role in radar target detection and tracking. A lot of research shows that the introduction of AI enables the tracking algorithm to distinguish targets from clutter better and then improves the performance of data association. The current AI-aided tracking algorithms only consider the signal amplitude in the range-azimuth cell where measurement exists. However, since radar echoes always contain backscattered signals from multiple cells, the useful information of neighboring cells would be lost if directly applying those existing methods. In order to solve this issue, a new δ-generalized labeled multi-Bernoulli (δ-GLMB) filter is proposed. It exploits the AI of radar echoes from neighboring cells to construct a united amplitude likelihood ratio, and then plugs it into the update process and the measurement-track assignment cost matrix of the δ-GLMB filter. Simulation results show that the proposed approach has better performance in target’s state and number estimation than that of the δ-GLMB only using single-cell AI in low signal-to-clutter-ratio (SCR) environment. PMID:29642595

  6. A diabetic retinopathy detection method using an improved pillar K-means algorithm.

    PubMed

    Gogula, Susmitha Valli; Divakar, Ch; Satyanarayana, Ch; Rao, Allam Appa

    2014-01-01

    The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy, which is the major cause of vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. The segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after optimization by the Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. The improved pillar algorithm can optimize K-means clustering for image segmentation in terms of precision and computation time. The proposed approach is evaluated by comparison with K-means and Fuzzy C-means on a medical image. Using this method, identification of dark spots in the retina becomes easier, and the proposed algorithm is applied to diabetic retinal images of all stages to identify hard and soft exudates, whereas the existing pillar K-means is more appropriate for brain MRI images. The proposed system helps doctors identify the problem at an early stage and suggest a better drug for preventing further retinal damage.

  7. Analysis of the seismicity preceding large earthquakes

    NASA Astrophysics Data System (ADS)

    Stallone, Angela; Marzocchi, Warner

    2017-04-01

    The most common earthquake forecasting models assume that the magnitude of the next earthquake is independent from the past. This feature is probably one of the most severe limitations of the capability to forecast large earthquakes. In this work, we investigate this aspect empirically, exploring whether variations in seismicity in the space-time-magnitude domain encode some information on the size of future earthquakes. For this purpose, and to verify the stability of the findings, we consider seismic catalogs covering quite different space-time-magnitude windows, such as the Alto Tiberina Near Fault Observatory (TABOO) catalogue and the California and Japanese seismic catalogs. Our method is inspired by the statistical methodology proposed by Baiesi & Paczuski (2004) and elaborated by Zaliapin et al. (2008) to distinguish between triggered and background earthquakes, based on a pairwise nearest-neighbor metric defined by properly rescaled temporal and spatial distances. We generalize the method to a metric based on the k-nearest-neighbors that allows us to consider the overall space-time-magnitude distribution of k-earthquakes, which are the strongly correlated ancestors of a target event. Finally, we analyze the statistical properties of the clusters composed of the target event and its k-nearest-neighbors. In essence, the main goal of this study is to verify whether different classes of target event magnitudes are characterized by distinctive "k-foreshocks" distributions. The final step is to show how the findings of this work may (or may not) improve the skill of existing earthquake forecasting models.

  8. The influence of further-neighbor spin-spin interaction on a ground state of 2D coupled spin-electron model in a magnetic field

    NASA Astrophysics Data System (ADS)

    Čenčariková, Hana; Strečka, Jozef; Gendiar, Andrej; Tomašovičová, Natália

    2018-05-01

    An exhaustive ground-state analysis of an extended two-dimensional (2D) correlated spin-electron model consisting of the Ising spins localized on nodal lattice sites and mobile electrons delocalized over pairs of decorating sites is performed within the framework of rigorous analytical calculations. The investigated model, defined on an arbitrary 2D doubly decorated lattice, takes into account the kinetic energy of mobile electrons, the nearest-neighbor Ising coupling between the localized spins and mobile electrons, the further-neighbor Ising coupling between the localized spins and the Zeeman energy. The ground-state phase diagrams are examined for a wide range of model parameters for both ferromagnetic and antiferromagnetic interaction between the nodal Ising spins and a non-zero value of the external magnetic field. It is found that non-zero values of the further-neighbor interaction lead to the formation of new quantum states as a consequence of competition between all considered interaction terms. Moreover, the new quantum states are accompanied by different magnetic features and thus, several kinds of field-driven phase transitions are observed.

  9. High-temperature dynamic behavior in bulk liquid water: A molecular dynamics simulation study using the OPC and TIP4P-Ew potentials

    NASA Astrophysics Data System (ADS)

    Gabrieli, Andrea; Sant, Marco; Izadi, Saeed; Shabane, Parviz Seifpanahi; Onufriev, Alexey V.; Suffritti, Giuseppe B.

    2018-02-01

    Classical molecular dynamics simulations were performed to study the high-temperature (above 300 K) dynamic behavior of bulk water, specifically the behavior of the diffusion coefficient, hydrogen bond, and nearest-neighbor lifetimes. Two water potentials were compared: the recently proposed "globally optimal" point charge (OPC) model and the well-known TIP4P-Ew model. By considering the Arrhenius plots of the computed inverse diffusion coefficient and rotational relaxation constants, a crossover from Vogel-Fulcher-Tammann behavior to a linear trend with increasing temperature was detected at T* ≈ 309 and T* ≈ 285 K for the OPC and TIP4P-Ew models, respectively. Experimentally, the crossover point was previously observed at T* ≈ 315 ± 5 K. We also verified that for the coefficient of thermal expansion αP(T, P), the isobaric αP(T) curves cross at about the same T* as in the experiment. The lifetimes of water hydrogen bonds and of the nearest neighbors were evaluated and were found to cross near T*, where the lifetimes are about 1 ps. For T < T*, hydrogen bonds persist longer than nearest neighbors, suggesting that the hydrogen bonding network dominates the water structure at T < T*, whereas for T > T*, water behaves more like a simple liquid. The fact that T* falls within the biologically relevant temperature range is a strong motivation for further analysis of the phenomenon and its possible consequences for biomolecular systems.

  10. A spin transfer torque magnetoresistance random access memory-based high-density and ultralow-power associative memory for fully data-adaptive nearest neighbor search with current-mode similarity evaluation and time-domain minimum searching

    NASA Astrophysics Data System (ADS)

    Ma, Yitao; Miura, Sadahiko; Honjo, Hiroaki; Ikeda, Shoji; Hanyu, Takahiro; Ohno, Hideo; Endoh, Tetsuo

    2017-04-01

    A high-density nonvolatile associative memory (NV-AM) based on spin transfer torque magnetoresistive random access memory (STT-MRAM), which achieves highly concurrent and ultralow-power nearest neighbor search with full adaptivity of the template data format, has been proposed and fabricated using the 90 nm CMOS/70 nm perpendicular-magnetic-tunnel-junction hybrid process. A truly compact current-mode circuitry is developed to realize flexibly controllable and highly parallel similarity evaluation, which makes the NV-AM adaptable to any dimensionality and component-bit of template data. A compact dual-stage time-domain minimum searching circuit is also developed, which can freely extend the system for more template data by connecting multiple NV-AM cores without additional circuits for integrated processing. Both the embedded STT-MRAM module and the computing circuit modules in this NV-AM chip are synchronously power-gated to completely eliminate standby power and maximally reduce operation power by only activating the currently accessed circuit blocks. The operations of a prototype chip at 40 MHz are demonstrated by measurement. The average operation power is only 130 µW, and the circuit density is less than 11 µm²/bit. Compared with the latest conventional works in both volatile and nonvolatile approaches, more than 31.3% circuit area reductions and 99.2% power improvements are achieved, respectively. Further power performance analyses are discussed, which verify the special superiority of the proposed NV-AM in low-power and large-memory-based VLSIs.

  11. Incommensurate phase of a triangular frustrated Heisenberg model studied via Schwinger-boson mean-field theory

    NASA Astrophysics Data System (ADS)

    Li, Peng; Su, Haibin; Dong, Hui-Ning; Shen, Shun-Qing

    2009-08-01

    We study a triangular frustrated antiferromagnetic Heisenberg model with nearest-neighbor interactions J1 and third-nearest-neighbor interactions J3 by means of Schwinger-boson mean-field theory. By setting an antiferromagnetic J3 and varying J1 from positive to negative values, we disclose the low-temperature features of its interesting incommensurate phase. The gapless dispersion of quasiparticles leads to the intrinsic T² law of specific heat. The magnetic susceptibility is linear in temperature. The local magnetization is significantly reduced by quantum fluctuations. We address possible relevance of these results to the low-temperature properties of NiGa2S4. From a careful analysis of the incommensurate spin wavevector, the interaction parameters are estimated as J1 ≈ -3.8755 K and J3 ≈ 14.0628 K, in order to account for the experimental data.

  12. Unconventional quantum antiferromagnetism with a fourfold symmetry breaking in a spin-1/2 Ising-Heisenberg pentagonal chain

    NASA Astrophysics Data System (ADS)

    Karľová, Katarína; Strečka, Jozef; Lyra, Marcelo L.

    2018-03-01

    The spin-1/2 Ising-Heisenberg pentagonal chain is investigated with use of the star-triangle transformation, which establishes a rigorous mapping equivalence with the effective spin-1/2 Ising zigzag ladder. The investigated model has a rich ground-state phase diagram including two spectacular quantum antiferromagnetic ground states with a fourfold broken symmetry. It is demonstrated that these long-period quantum ground states arise due to a competition between the effective next-nearest-neighbor and nearest-neighbor interactions of the corresponding spin-1/2 Ising zigzag ladder. The concurrence is used to quantify the bipartite entanglement between the nearest-neighbor Heisenberg spin pairs, which are quantum-mechanically entangled in two quantum ground states with or without spontaneously broken symmetry. The pair correlation functions between the nearest-neighbor Heisenberg spins as well as the next-nearest-neighbor and nearest-neighbor Ising spins were investigated with the aim to bring insight into how a relevant short-range order manifests itself at low enough temperatures. It is shown that the specific heat displays temperature dependencies with either one or two separate round maxima.

  13. Neighboring and Urbanism: Commonality versus Friendship.

    ERIC Educational Resources Information Center

    Silverman, Carol J.

    1986-01-01

    Examines a dimension of neighboring that need not assume friendship as the role model. When the model assumes only a sense of connectedness as defining neighboring, then the residential correlation, shown in many studies between urbanism and neighboring, disappears. Theories of neighboring, study variables, methods, and analysis are discussed.…

  14. Stratified estimation of forest area using satellite imagery, inventory data, and the k-nearest neighbors technique

    Treesearch

    Ronald E. McRoberts; Mark D. Nelson; Daniel G. Wendt

    2002-01-01

    For two large study areas in Minnesota, USA, stratified estimation using classified Landsat Thematic Mapper satellite imagery as the basis for stratification was used to estimate forest area. Measurements of forest inventory plots obtained for a 12-month period in 1998 and 1999 were used as the source of data for within-stratum estimates. These measurements further...

  15. A new hybrid method based on fuzzy-artificial immune system and k-nn algorithm for breast cancer diagnosis.

    PubMed

    Sahan, Seral; Polat, Kemal; Kodaz, Halife; Güneş, Salih

    2007-03-01

    The use of machine learning tools in medical diagnosis is increasing gradually. This is mainly because the effectiveness of classification and recognition systems has improved a great deal, helping medical experts diagnose diseases. One such disease is breast cancer, which is a very common type of cancer among women. As the incidence of this disease has increased significantly in recent years, machine learning applications to this problem have attracted great attention alongside medical consideration. This study aims at diagnosing breast cancer with a new hybrid machine learning method. By hybridizing a fuzzy-artificial immune system with the k-nearest neighbour algorithm, a method was obtained to solve this diagnosis problem by classifying the Wisconsin Breast Cancer Dataset (WBCD). This data set is very commonly used in the literature on classification systems for breast cancer diagnosis, and it was used in this study to compare the classification performance of our proposed method with that of other studies. We obtained a classification accuracy of 99.14%, which is the highest one reached so far. The classification accuracy was obtained via 10-fold cross validation. This result is for WBCD, but it indicates that this method can be used confidently for other breast cancer diagnosis problems, too.

  16. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    NASA Astrophysics Data System (ADS)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it by sex. Colour measurements were performed on gonads extracted from female and male Mackerel (fresh and thawed) to obtain differences between sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.

  17. An adaptive bias - hybrid MD/kMC algorithm for protein folding and aggregation.

    PubMed

    Peter, Emanuel K; Shea, Joan-Emma

    2017-07-05

    In this paper, we present a novel hybrid Molecular Dynamics/kinetic Monte Carlo (MD/kMC) algorithm and apply it to protein folding and aggregation in explicit solvent. The new algorithm uses a dynamical definition of biases throughout the MD component of the simulation, normalized in relation to the unbiased forces. The algorithm guarantees sampling of the underlying ensemble in dependency of one average linear coupling factor 〈α〉 τ . We test the validity of the kinetics in simulations of dialanine and compare dihedral transition kinetics with long-time MD-simulations. We find that for low 〈α〉 τ values, kinetics are in good quantitative agreement. In folding simulations of TrpCage and TrpZip4 in explicit solvent, we also find good quantitative agreement with experimental results and prior MD/kMC simulations. Finally, we apply our algorithm to study growth of the Alzheimer Amyloid Aβ 16-22 fibril by monomer addition. We observe two possible binding modes, one at the extremity of the fibril (elongation) and one on the surface of the fibril (lateral growth), on timescales ranging from ns to 8 μs.

  18. Iris Recognition Using Feature Extraction of Box Counting Fractal Dimension

    NASA Astrophysics Data System (ADS)

    Khotimah, C.; Juniati, D.

    2018-01-01

    Biometrics is a science that is now growing rapidly. Iris recognition is a biometric modality which captures a photo of the eye pattern. The markings of the iris are so distinctive that they have been proposed as a means of identification, instead of fingerprints. Iris recognition was chosen for identification in this research because the iris differs for each individual and is protected by the cornea, so that it has a fixed shape. This iris recognition consists of three steps: pre-processing of data, feature extraction, and feature matching. In pre-processing, the Hough transformation is used to locate the iris area and Daugman’s rubber sheet model is used to normalize the iris data set into rectangular blocks. To find the characteristics of the iris, the box counting method was used to obtain the fractal dimension value of the iris. Tests were carried out using the k-fold cross method with k = 5. Each test used 10 different values of K for K-Nearest Neighbor (KNN). The best iris recognition accuracy, 92.63%, was obtained for K = 3 with the K-Nearest Neighbor (KNN) method.
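
    The box counting step named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the box sizes, and the toy pixel sets are assumptions made for illustration. The idea is to count the boxes of side s that contain at least one foreground pixel and take the least-squares slope of log N(s) against log(1/s) as the fractal dimension estimate.

```python
import math

def box_count(pixels, size):
    """Number of size-by-size boxes that contain at least one foreground pixel."""
    return len({(row // size, col // size) for row, col in pixels})

def box_counting_dimension(pixels, sizes):
    """Least-squares slope of log N(s) versus log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical toy inputs: a filled 16x16 square and a single row of 16 pixels.
filled_square = {(r, c) for r in range(16) for c in range(16)}
straight_line = {(0, c) for c in range(16)}
sizes = [1, 2, 4, 8]

dim_square = box_counting_dimension(filled_square, sizes)
dim_line = box_counting_dimension(straight_line, sizes)
```

    On these toy sets the estimate is 2 for the filled square and 1 for the line; a textured pattern would fall in between, and that single value could serve as a feature for a KNN classifier.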

  19. Nearest Neighbor Classification of Stationary Time Series: An Application to Anesthesia Level Classification by EEG Analysis.

    DTIC Science & Technology

    1980-12-05

    classification procedures that are common in speech processing. The anesthesia level classification by EEG time series population screening problem example is in...formance. The use of the KL number type metric in NN rule classification, in a delete-one subj ect ’s EE-at-a-time KL-NN and KL- kNN classification of the...17 individual labeled EEG sample population using KL-NN and KL- kNN rules. The results obtained are shown in Table 1. The entries in the table indicate

  20. An optimized video system for augmented reality in endodontics: a feasibility study.

    PubMed

    Bruellmann, D D; Tjaden, H; Schwanecke, U; Barth, P

    2013-03-01

    We propose an augmented reality system for the reliable detection of root canals in video sequences based on a k-nearest neighbor color classification and introduce a simple geometric criterion for teeth. The new software was implemented using C++, Qt, and the image processing library OpenCV. Teeth are detected in video images to restrict the segmentation of the root canal orifices by using a k-nearest neighbor algorithm. The locations of the root canal orifices were determined using Euclidean distance-based image segmentation. A set of 126 human teeth with known and verified locations of the root canal orifices was used for evaluation. The software detects root canal orifices for automatic classification of the teeth in video images and stores the location and size of the found structures. Overall, 287 of 305 root canals were correctly detected. The overall sensitivity was about 94%. Classification accuracy for molars ranged from 65.0 to 81.2% and from 85.7 to 96.7% for premolars. The realized software shows that observations made in anatomical studies can be exploited to automate real-time detection of root canal orifices and tooth classification with a software system. Automatic storage of location, size, and orientation of the found structures with this software can be used for future anatomical studies. Thus, statistical tables with canal locations will be derived, which can improve anatomical knowledge of the teeth to alleviate root canal detection in the future. For this purpose the software is freely available at: http://www.dental-imaging.zahnmedizin.uni-mainz.de/.
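
    A k-nearest neighbor color classification of the kind mentioned above can be sketched in a few lines. The abstract states that the actual software is written in C++ with Qt and OpenCV; the Python below is only an illustrative stand-in, and the RGB values, the labels "tooth" and "canal", and the function name are hypothetical.

```python
import math
from collections import Counter

def knn_classify(sample, training, k=3):
    """Majority vote among the k training colors closest to `sample` (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled RGB samples; real training colors would come from annotated frames.
training = [
    ((250, 250, 240), "tooth"),
    ((240, 235, 220), "tooth"),
    ((230, 225, 210), "tooth"),
    ((60, 20, 20), "canal"),
    ((80, 30, 25), "canal"),
    ((50, 15, 10), "canal"),
]

dark_pixel = knn_classify((70, 25, 20), training, k=3)
bright_pixel = knn_classify((245, 240, 230), training, k=3)
```

    Sorting the whole training set is the simplest possible neighbor search; a real-time system would more likely rely on a precomputed lookup or a spatial index.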

  1. Structure of Ordinary Ice Ih. Part 1: Ideal Structure of Ice

    DTIC Science & Technology

    1993-10-01

    T., H. Onuki and R. Onaka (1977) Electronic structures of water and ice. Journal of the Physics Society of Japan, 42: 152-158. Shimaoka, K. (1960...nearest neighbors .................................................................................................................. 5 6. H-bond...8 12. Positions of oxygen atoms in the ice Ih crystal

  2. Finding reproducible cluster partitions for the k-means algorithm

    PubMed Central

    2013-01-01

    K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset. PMID:23369085

  3. Finding reproducible cluster partitions for the k-means algorithm.

    PubMed

    Lisboa, Paulo J G; Etchells, Terence A; Jarman, Ian H; Chambers, Simon J

    2013-01-01

    K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset.
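
    The reproducibility question raised in this abstract can be explored with a small experiment: run k-means from several random initialisations, record the sum-of-squares (SSQ) of each run, and compare the resulting partitions. The sketch below is a plain Lloyd's-algorithm implementation written for illustration; it is not the authors' stability index, and the function names and toy data are assumptions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed, max_iter=100):
    """Lloyd's algorithm from a seeded random start; returns (SSQ, partition)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    ssq = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    # Store the partition as a set of index sets so that label order does not matter.
    partition = frozenset(
        frozenset(i for i, lab in enumerate(labels) if lab == j) for j in range(k)
    )
    return ssq, partition

# Hypothetical toy data: two well-separated groups on a line.
toy = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (102.0,)]
runs = [kmeans(toy, k=2, seed=seed) for seed in range(10)]
distinct_partitions = {partition for _, partition in runs}
best_ssq = min(ssq for ssq, _ in runs)
```

    On data this cleanly separated every start reaches the same partition. The abstract's point is that for larger k, runs with similar SSQ can populate `distinct_partitions` with markedly different groupings, which is the situation a stability measure is meant to detect.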

  4. Evaluation of the Jonker-Volgenant-Castanon (JVC) assignment algorithm for track association

    NASA Astrophysics Data System (ADS)

    Malkoff, Donald B.

    1997-07-01

    The Jonker-Volgenant-Castanon (JVC) assignment algorithm was used by Lockheed Martin Advanced Technology Laboratories (ATL) for track association in the Rotorcraft Pilot's Associate (RPA) program. RPA is Army Aviation's largest science and technology program, involving an integrated hardware/software system approach for a next generation helicopter containing advanced sensor equipment and applying artificial intelligence `associate' technologies. ATL is responsible for the multisensor, multitarget, onboard/offboard track fusion. McDonnell Douglas Helicopter Systems is the prime contractor and Lockheed Martin Federal Systems is responsible for developing much of the cognitive decision aiding and controls-and-displays subsystems. RPA is scheduled for flight testing beginning in 1997. RPA is unique in requiring real-time tracking and fusion for large numbers of highly-maneuverable ground (and air) targets in a target-dense environment. It uses diverse sensors and is concerned with a large area of interest. Target class and identification data are tightly integrated with spatial and kinematic data throughout the processing. Because of platform constraints, processing hardware for track fusion was quite limited. No previous experience using JVC in this type of environment had been reported. ATL performed extensive testing of the JVC, concentrating on error rates and run-times under a variety of conditions. These included wide ranging numbers and types of targets, sensor uncertainties, target attributes, differing degrees of target maneuverability, and diverse combinations of sensors. Testing utilized Monte Carlo approaches, as well as many kinds of challenging scenarios. Comparisons were made with a nearest-neighbor algorithm and a new, proprietary algorithm (the `Competition' algorithm). The JVC proved to be an excellent choice for the RPA environment, providing a good balance between speed of operation and accuracy of results.

  5. Automatic tissue characterization from ultrasound imagery

    NASA Astrophysics Data System (ADS)

    Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.

    1993-08-01

    In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.

  6. Machine learning methods in chemoinformatics

    PubMed Central

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160

  7. Next neighbors effect along the Ca-Sr-Ba-åkermanite join: Long-range vs. short-range structural features

    NASA Astrophysics Data System (ADS)

    Dondi, Michele; Ardit, Matteo; Cruciani, Giuseppe

    2013-06-01

    An original approach has been developed herein to explore the correlations between short- and long-range structural properties of solid solutions. X-ray diffraction (XRD) and electronic absorption spectroscopy (EAS) data were combined on a (Ca,Sr,Ba)2(Mg0.7Co0.3)Si2O7 join to determine average and local distances, respectively. Instead of varying the EAS-active ion concentration along the join, as has commonly been performed in previous studies, the constant replacement of Mg2+ by a minimal fraction of a similar size cation (Co2+) has been used to assess the effects of varying second-nearest neighbor cations (Ca, Sr, Ba) on the local distances of the first shell. A comparison between doped and un-doped series has shown that, although the overall symmetry of the Co-centered T1-site was retained, greater relaxation occurs at the CoO4 tetrahedra which become increasingly large and more distorted than the MgO4 tetrahedra. This is indicated by an increase in both the quadratic elongation (λT1) and the bond angle variance (σ2T1) distortion indices, as the whole structure expands due to an increase in size in the second-nearest neighbors. This behavior highlights the effect of the different electronic configurations of Co2+ (3d7) and Mg2+ (2p6) in spite of their very similar ionic size. Furthermore, although the overall symmetry of the Co-centered T1-site is retained, relatively limited (<10 deg) angular variations in O-Co2+-O occur along the solid solution series and large changes are found in molar absorption coefficients showing that EAS Co2+-bands are highly sensitive to change in the local structure.

  8. Construction of phase diagrams for nanoscaled Ising thin films on the honeycomb lattice using cellular automata simulation approach

    NASA Astrophysics Data System (ADS)

    Ghaemi, Mehrdad; Javadi, Nabi

    2017-11-01

    The phase diagrams of the three-layer Ising model on the honeycomb lattice with a diluted surface have been constructed using probabilistic cellular automata based on the Glauber algorithm. The effects of the exchange interactions on the phase diagrams have been investigated. A general mathematical expression for the critical temperature is obtained in terms of the relative coupling r = J1/J and Δs = (Js/J) - 1, where J and Js represent the nearest neighbor coupling within inner- and surface-layers, respectively, and each magnetic site in the surface-layer is coupled with the nearest neighbor site in the inner-layer via the exchange coupling J1. In the case of antiferromagnetic coupling between surface-layer and inner-layer, the system reveals many interesting phenomena, such as the possible existence of a compensation line before the critical temperature.

  9. Identifying influential neighbors in animal flocking

    PubMed Central

    Jiang, Li; Giuggioli, Luca; Escobedo, Ramón; Sire, Clément; Han, Zhangang

    2017-01-01

    Schools of fish and flocks of birds can move together in synchrony and decide on new directions of movement in a seamless way. This is possible because group members constantly share directional information with their neighbors. Although detecting the directionality of other group members is known to be important to maintain cohesion, it is not clear how many neighbors each individual can simultaneously track and pay attention to, and what the spatial distribution of these influential neighbors is. Here, we address these questions on shoals of Hemigrammus rhodostomus, a species of fish exhibiting strong schooling behavior. We adopt a data-driven analysis technique based on the study of short-term directional correlations to identify which neighbors have the strongest influence over the participation of an individual in a collective U-turn event. We find that fish mainly react to one or two neighbors at a time. Moreover, we find no correlation between the distance rank of a neighbor and its likelihood to be influential. We interpret our results in terms of fish allocating sequential and selective attention to their neighbors. PMID:29161269

  10. Using Machine Learning and Natural Language Processing Algorithms to Automate the Evaluation of Clinical Decision Support in Electronic Medical Record Systems.

    PubMed

    Szlosek, Donald A; Ferrett, Jonathan

    2016-01-01

    As the number of clinical decision support systems (CDSSs) incorporated into electronic medical records (EMRs) increases, so does the need to evaluate their effectiveness. The use of medical record review and similar manual methods for evaluating decision rules is laborious and inefficient. The authors use machine learning and Natural Language Processing (NLP) algorithms to accurately evaluate a clinical decision support rule through an EMR system, and they compare it against manual evaluation. Modeled after the EMR system EPIC at Maine Medical Center, we developed a dummy data set containing physician notes in free text for 3,621 artificial patient records undergoing a head computed tomography (CT) scan for mild traumatic brain injury after the incorporation of an electronic best practice approach. We validated the accuracy of the Best Practice Advisories (BPA) using three machine learning algorithms (C-Support Vector Classification (SVC), Decision Tree Classifier (DecisionTreeClassifier), and k-nearest neighbors classifier (KNeighborsClassifier)) by comparing their accuracy for adjudicating the occurrence of a mild traumatic brain injury against manual review. We then used the best of the three algorithms to evaluate the effectiveness of the BPA, and we compared the algorithm's evaluation of the BPA to that of manual review. The electronic best practice approach was found to have a sensitivity of 98.8 percent (96.83-100.0), specificity of 10.3 percent, PPV = 7.3 percent, and NPV = 99.2 percent when reviewed manually by abstractors. Though all the machine learning algorithms were observed to have a high level of prediction, the SVC displayed the highest with a sensitivity 93.33 percent (92.49-98.84), specificity of 97.62 percent (96.53-98.38), PPV = 50.00, NPV = 99.83. The SVC algorithm was observed to have a sensitivity of 97.9 percent (94.7-99.86), specificity 10.30 percent, PPV 7.25 percent, and NPV 99.2 percent for evaluating the best practice approach, after

  11. Online Feature Transformation Learning for Cross-Domain Object Category Recognition.

    PubMed

    Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold

    2017-06-09

    In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale applications. The classifier is trained with the k nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examined the effect of setting different parameter values in the proposed algorithms and evaluated the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition applications.

  12. Modifications to Axially Symmetric Simulations Using New DSMC (2007) Algorithms

    NASA Technical Reports Server (NTRS)

    Liechty, Derek S.

    2008-01-01

    Several modifications aimed at improving physical accuracy are proposed for solving axially symmetric problems building on the DSMC (2007) algorithms introduced by Bird. Originally developed to solve nonequilibrium, rarefied flows, the DSMC method is now regularly used to solve complex problems over a wide range of Knudsen numbers. These new algorithms include features such as nearest neighbor collisions excluding the previous collision partners, separate collision and sampling cells, automatically adaptive variable time steps, a modified no-time counter procedure for collisions, and discontinuous and event-driven physical processes. Axially symmetric solutions require radial weighting for the simulated molecules since the molecules near the axis represent fewer real molecules than those farther away from the axis due to the difference in volume of the cells. In the present methodology, these radial weighting factors are continuous, linear functions that vary with the radial position of each simulated molecule. It is shown that how one defines the number of tentative collisions greatly influences the mean collision time near the axis. The method by which the grid is treated for axially symmetric problems also plays an important role near the axis, especially for scalar pressure. A new method to treat how the molecules are traced through the grid is proposed to alleviate the decrease in scalar pressure at the axis near the surface. Also, a modification to the duplication buffer is proposed to vary the duplicated molecular velocities while retaining the molecular kinetic energy and axially symmetric nature of the problem.

  13. Algorithms for optimizing cross-overs in DNA shuffling.

    PubMed

    He, Lu; Friedman, Alan M; Bailey-Kellogg, Chris

    2012-03-21

    DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library. This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally-expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing "runs" of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%). Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable

  14. Inspection of wear particles in oils by using a fuzzy classifier

    NASA Astrophysics Data System (ADS)

    Hamalainen, Jari J.; Enwald, Petri

    1994-11-01

    The reliability of stand-alone machines and larger production units can be improved by automated condition monitoring. Analysis of wear particles in lubricating or hydraulic oils helps diagnose the wear states of machine parts. This paper presents a computer vision system for automated classification of wear particles. Digitized images from experiments with a bearing test bench, a hydraulic system with an industrial company, and oil samples from different industrial sources were used for algorithm development and testing. The wear particles were divided into four classes indicating different wear mechanisms: cutting wear, fatigue wear, adhesive wear, and abrasive wear. The results showed that the fuzzy K-nearest neighbor classifier used gave the same distribution of wear particles as the classification by a human expert.
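
    The abstract does not say which fuzzy K-nearest neighbor variant was used. As a rough illustration only, the sketch below follows a common formulation in which each of the k nearest neighbors votes for its class with an inverse-distance weight, and the votes are normalized into class memberships. The two-dimensional features, the sample values, and the function name `fuzzy_knn` are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def fuzzy_knn(train, query, k=3, m=2.0):
    """Return a dict of class memberships for `query`.

    train: list of (feature_vector, label) pairs with crisp labels.
    m: fuzzifier (> 1); m = 2 gives inverse squared-distance weights.
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]

    weights = defaultdict(float)
    for d, label in neighbors:
        # The small epsilon avoids division by zero on an exact match.
        weights[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-12)

    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical shape features for two of the four wear classes.
train = [
    ((0.9, 0.2), "cutting"),
    ((0.8, 0.3), "cutting"),
    ((0.2, 0.9), "fatigue"),
    ((0.3, 0.8), "fatigue"),
]
memberships = fuzzy_knn(train, (0.7, 0.3), k=3)
predicted = max(memberships, key=memberships.get)
```

    Unlike a plain majority vote, the memberships give a graded answer, which is convenient when a particle lies between two wear classes.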

  15. [The impact of subsidized healthcare insurance on access to cervical cytology in Medellin, Colombia].

    PubMed

    Atehortúa, Sara C; Palacio-Mejía, Lina S

    2014-01-01

    This study assessed the impact of subsidized healthcare insurance on access to cervical cytology in Medellin, Colombia. Propensity score matching (PSM) was used with figures from the 2008 Life Quality Survey in Colombia to obtain a control group comparable to the treatment group. Impact size was calculated using stratification estimates, the k-nearest-neighbor algorithm, and kernel density. Access to cytology for 19- to 49-year-old women with subsidized healthcare insurance was 2.2% to 2.9% lower than for women without any healthcare insurance. Estimates were not statistically significant for women over 50 years old. The greater access to cytology among women lacking healthcare insurance could be explained by charities or social programs aiding the uninsured population.
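
    To make the matching step concrete, here is a minimal sketch of k-nearest-neighbor matching on a propensity score. It assumes the scores have already been estimated (for example by a logistic regression, which is not shown), matches with replacement, and uses made-up toy numbers rather than the survey data. The function name `att_knn_matching` is hypothetical.

```python
def att_knn_matching(treated, controls, k=1):
    """Average treatment effect on the treated via k-NN matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Each treated unit is compared with the mean outcome of the k controls
    whose scores are closest to its own (matching with replacement).
    """
    effects = []
    for score_t, outcome_t in treated:
        nearest = sorted(controls, key=lambda c: abs(c[0] - score_t))[:k]
        counterfactual = sum(outcome for _, outcome in nearest) / len(nearest)
        effects.append(outcome_t - counterfactual)
    return sum(effects) / len(effects)

# Toy data: (propensity score, 1 if the woman had a cytology, else 0).
treated = [(0.30, 1), (0.52, 0), (0.80, 0)]
controls = [(0.28, 1), (0.50, 1), (0.60, 0), (0.85, 0)]
att = att_knn_matching(treated, controls, k=1)
```

    A negative `att` means that, in the toy data, insured units had lower access than their matched uninsured counterparts. A real analysis would also need standard errors and a check of covariate balance.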

  16. Nearest private query based on quantum oblivious key distribution

    NASA Astrophysics Data System (ADS)

    Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan

    2017-12-01

    Nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without revealing their respective private information. In this paper, we first present a quantum protocol for nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared with related classical protocols, our protocol offers higher security and better feasibility, so it has better prospects for application.

  17. Fast internal marker tracking algorithm for onboard MV and kV imaging systems

    PubMed Central

    Mao, W.; Wiersma, R. D.; Xing, L.

    2008-01-01

    Intrafraction organ motion can limit the advantage of highly conformal dose techniques such as intensity modulated radiation therapy (IMRT) due to target position uncertainty. To ensure high accuracy in beam targeting, real-time knowledge of the target location is highly desired throughout the beam delivery process. This knowledge can be gained through imaging of internally implanted radio-opaque markers with fluoroscopic or electronic portal imaging devices (EPID). In the case of MV based images, marker detection can be problematic due to the significantly lower contrast between different materials in comparison to their kV-based counterparts. This work presents a fully automated algorithm capable of detecting implanted metallic markers in both kV and MV images with high consistency. Using prior CT information, the algorithm predefines the volumetric search space without manual region-of-interest (ROI) selection by the user. Depending on the template selected, both spherical and cylindrical markers can be detected. Multiple markers can be simultaneously tracked without indexing confusion. Phantom studies show detection success rates of 100% for both kV and MV image data. In addition, application of the algorithm to real patient image data results in successful detection of all implanted markers for MV images. Near real-time operational speeds of ∼10 frames/sec for the detection of five markers in a 1024×768 image are accomplished using an ordinary PC workstation. PMID: 18561670

  18. Reducing process delays for real-time earthquake parameter estimation - An application of KD tree to large databases for Earthquake Early Warning

    NASA Astrophysics Data System (ADS)

    Yin, Lucy; Andrews, Jennifer; Heaton, Thomas

    2018-05-01

    Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of the KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compared the results with the observed seismic parameters. We concluded that a large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than of source parameters, such as hypocenter distance. Organizing the database with a KD tree reduced the average search time by 85% relative to the exhaustive method, making the method feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
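
    The idea of replacing an exhaustive scan with a KD tree can be sketched in a few lines. The code below is a generic textbook KD tree, not the Gutenberg Algorithm or the authors' implementation. The three-dimensional random "feature vectors" stand in for the filter-bank features, and the names `build_kdtree` and `nearest` are assumptions made for this example.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively build a KD tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # split at the median
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )

def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the
    # best match so far; otherwise the whole subtree can be skipped.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

# Synthetic database of 1000 three-dimensional feature vectors.
random.seed(0)
database = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
tree = build_kdtree(database)
query = (0.5, 0.5, 0.5)
dist, match = nearest(tree, query)
```

    The tree is built once offline. Each query then visits only the branches that could still contain a closer point, which is where the time saving over an exhaustive scan comes from.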

  19. An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms.

    PubMed

    Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L

    2013-12-01

    The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
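
    The evaluation scheme described here, a KNN classifier scored by the average AUC over 10 folds, can be sketched as follows. This is an illustration on synthetic two-dimensional data, not the FOT measurements. It omits the genetic-algorithm search over features and parameters, and the function names are assumptions for this example.

```python
import math
import random

def knn_score(train, x, k):
    """Fraction of the k nearest training samples that are positive."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return sum(label for _, label in nearest) / k

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts the share of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(data, k=5, folds=10):
    """Average AUC over `folds` folds; data is a list of (features, label)."""
    aucs = []
    for f in range(folds):
        test = data[f::folds]
        train = [s for i, s in enumerate(data) if i % folds != f]
        scores = [knn_score(train, x, k) for x, _ in test]
        aucs.append(auc(scores, [y for _, y in test]))
    return sum(aucs) / folds

# Synthetic data: 30 negatives followed by 30 positives, so that every fold
# (indices f, f + 10, f + 20, ...) holds three samples of each class.
rng = random.Random(0)
negatives = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(30)]
positives = [((rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)), 1) for _ in range(30)]
data = negatives + positives
mean_auc = cross_validated_auc(data, k=5, folds=10)
```

    With so few samples per fold, each fold's AUC is noisy, which is one reason to report the average over all folds rather than a single split.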

  20. Social aggregation in pea aphids: experiment and random walk modeling.

    PubMed

    Nilsen, Christa; Paige, John; Warner, Olivia; Mayhew, Benjamin; Sutley, Ryan; Lam, Matthew; Bernoff, Andrew J; Topaz, Chad M

    2013-01-01

    From bird flocks to fish schools and ungulate herds to insect swarms, social biological aggregations are found across the natural world. An ongoing challenge in the mathematical modeling of aggregations is to strengthen the connection between models and biological data by quantifying the rules that individuals follow. We model aggregation of the pea aphid, Acyrthosiphon pisum. Specifically, we conduct experiments to track the motion of aphids walking in a featureless circular arena in order to deduce individual-level rules. We observe that each aphid transitions stochastically between a moving and a stationary state. Moving aphids follow a correlated random walk. The probabilities of motion state transitions, as well as the random walk parameters, depend strongly on distance to an aphid's nearest neighbor. For large nearest neighbor distances, when an aphid is essentially isolated, its motion is ballistic with aphids moving faster, turning less, and being less likely to stop. In contrast, for short nearest neighbor distances, aphids move more slowly, turn more, and are more likely to become stationary; this behavior constitutes an aggregation mechanism. From the experimental data, we estimate the state transition probabilities and correlated random walk parameters as a function of nearest neighbor distance. With the individual-level model established, we assess whether it reproduces the macroscopic patterns of movement at the group level. To do so, we consider three distributions, namely distance to nearest neighbor, angle to nearest neighbor, and percentage of population moving at any given time. For each of these three distributions, we compare our experimental data to the output of numerical simulations of our nearest neighbor model, and of a control model in which aphids do not interact socially. Our stochastic, social nearest neighbor model reproduces salient features of the experimental data that are not captured by the control.
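
    The structure of such a model, a two-state walker whose switching probabilities and walk parameters depend on the distance to its nearest neighbor, can be sketched as below. All numerical values (interaction range, probabilities, speeds, turning noise) are invented for illustration. The paper estimates these quantities from data as functions of distance, whereas this sketch uses a crude near/far threshold and a simple wall rule.

```python
import math
import random

def nearest_neighbor_distance(i, positions):
    """Distance from walker i to its closest other walker."""
    xi, yi = positions[i]
    return min(
        math.hypot(xi - x, yi - y)
        for j, (x, y) in enumerate(positions)
        if j != i
    )

def step(positions, headings, moving, rng, radius=1.0):
    """Advance all walkers by one time step and return the new positions."""
    new_positions = []
    for i, (x, y) in enumerate(positions):
        # Hypothetical rule: neighbors within 0.1 make a walker slow down,
        # turn more, and stop more readily.
        crowded = nearest_neighbor_distance(i, positions) < 0.1
        p_stop, p_start = (0.30, 0.05) if crowded else (0.05, 0.30)
        speed, turn_sd = (0.01, 0.8) if crowded else (0.03, 0.2)

        # Stochastic switch between the moving and stationary states.
        if moving[i]:
            moving[i] = rng.random() >= p_stop
        else:
            moving[i] = rng.random() < p_start

        if moving[i]:
            # Correlated random walk: new heading = old heading + noise.
            headings[i] += rng.gauss(0.0, turn_sd)
            nx = x + speed * math.cos(headings[i])
            ny = y + speed * math.sin(headings[i])
            if math.hypot(nx, ny) > radius:
                # Crude wall rule: stay put and reverse direction.
                headings[i] += math.pi
                nx, ny = x, y
            x, y = nx, ny
        new_positions.append((x, y))
    return new_positions

rng = random.Random(1)
n = 20
positions = []
for _ in range(n):
    # Uniform sampling inside the unit-radius circular arena.
    r, theta = math.sqrt(rng.random()), rng.uniform(0, 2 * math.pi)
    positions.append((r * math.cos(theta), r * math.sin(theta)))
headings = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
moving = [True] * n

for _ in range(200):
    positions = step(positions, headings, moving, rng)

fraction_moving = sum(moving) / n
```

    Quantities such as `fraction_moving` and the distribution of nearest-neighbor distances are the kind of group-level statistics that can be compared between simulation and experiment.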