Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Wang, Xueyi
2012-02-08
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
K-Nearest Neighbor Algorithm Optimization in Text Categorization
NASA Astrophysics Data System (ADS)
Chen, Shufeng
2018-01-01
K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings such as large amount of sample computation and strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on CURE algorithm is proposed. On the basis of this, presenting a quick algorithm QKNN (Quick k-nearest neighbor) to find the nearest k neighbor samples, which greatly reduces the similarity calculation. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples to improve the performance of the algorithm.
Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.
Mei, Gang; Xu, Nengxiong; Xu, Liangliang
2016-01-01
This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The presented algorithm is an improvement of our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest neighbors (kNN) search. In AIDW, it needs to find several nearest neighboring data points for each interpolated point to adaptively determine the power parameter; and then the desired prediction value of the interpolated point is obtained by weighted interpolating using the power parameter. In this work, we develop a fast kNN search approach based on the space-partitioning data structure, even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of the stages of kNN search and weighted interpolating. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
Three Dimensional Object Recognition Using a Complex Autoregressive Model
1993-12-01
3.4.2 Template Matching Algorithm ...................... 3-16 3.4.3 K-Nearest-Neighbor ( KNN ) Techniques ................. 3-25 3.4.4 Hidden Markov Model...Neighbor ( KNN ) Test Results ...................... 4-13 4.2.1 Single-Look 1-NN Testing .......................... 4-14 4.2.2 Multiple-Look 1-NN Testing...4-15 4.2.3 Discussion of KNN Test Results ...................... 4-15 4.3 Hidden Markov Model (HMM) Test Results
Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis
NASA Astrophysics Data System (ADS)
Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan
2017-10-01
This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as feature selections for selecting and choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper will also discuss the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using the parametric statistical significance tests. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that the ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain quality, optimal feature subset that can represent the actual data in customer review data.
Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance
NASA Astrophysics Data System (ADS)
Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi
2017-11-01
K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we presented a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing Hamming distance between testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog for classical KNN algorithm by setting a distance threshold value t to select k - n e a r e s t neighbors. As a result, QKNN achieves O( n 3) performance which is only relevant to the dimension of feature vectors and high classification accuracy, outperforms Llyod's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Vidyarthi, Ankit; Mittal, Namita
2016-12-01
In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Portable Language-Independent Adaptive Translation from OCR. Phase 1
2009-04-01
including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm.
Tae Joon Jun; Hyun Ji Park; Hyuk Yoo; Young-Hak Kim; Daeyoung Kim
2016-08-01
In this paper, we propose an GPU based Cloud system for high-performance arrhythmia detection. Pan-Tompkins algorithm is used for QRS detection and we optimized beat classification algorithm with K-Nearest Neighbor (K-NN). To support high performance beat classification on the system, we parallelized beat classification algorithm with CUDA to execute the algorithm on virtualized GPU devices on the Cloud system. MIT-BIH Arrhythmia database is used for validation of the algorithm. The system achieved about 93.5% of detection rate which is comparable to previous researches while our algorithm shows 2.5 times faster execution time compared to CPU only detection algorithm.
Electromagnetic Induction Spectroscopy for the Detection of Subsurface Targets
2012-12-01
curves of the proposed method and that of Fails et al.. For the kNN ROC curve, k = 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81...et al. [6] and Ramachandran et al. [7] both demonstrated success in detecting mines using the k-nearest-neighbor ( kNN ) algorithm based on the EMI...error is also included in the feature vector. The kNN labels an unknown target based on the closest targets in a training set. Collins et al. [2] and
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Jian; Hamidouche, Khaled; Zheng, Jie
2015-08-05
Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm through taking advantage of scalable programming models. To improve the performance of k-NN on large-scale environment with InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemicmore » evaluation and analysis on typical workloads. The hybrid designs leverage the one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for small workload with balanced communication and computation. Experiments of running with varied number of cores show that our design can maintain good scalability.« less
Applying an efficient K-nearest neighbor search to forest attribute imputation
Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek
2006-01-01
This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby, decreasing the time needed to discover the NN subset. Results of five trials show gains...
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Majidpour, Mostafa; Qiu, Charlie; Chu, Peter
Three algorithms for the forecasting of energy consumption at individual EV charging outlets have been applied to real world data from the UCLA campus. Out of these three algorithms, namely k-Nearest Neighbor (kNN), ARIMA, and Pattern Sequence Forecasting (PSF), kNN with k=1, was the best and PSF was the worst performing algorithm with respect to the SMAPE measure. The advantage of PSF is its increased robustness to noise by substituting the real valued time series with an integer valued one, and the advantage of NN is having the least SMAPE for our data. We propose a Modified PSF algorithm (MPSF)more » which is a combination of PSF and NN; it could be interpreted as NN on integer valued data or as PSF with considering only the most recent neighbor to produce the output. Some other shortcomings of PSF are also addressed in the MPSF. Results show that MPSF has improved the forecast performance.« less
Attention Recognition in EEG-Based Affective Learning Research Using CFS+KNN Algorithm.
Hu, Bin; Li, Xiaowei; Sun, Shuting; Ratcliffe, Martyn
2018-01-01
The research detailed in this paper focuses on the processing of Electroencephalography (EEG) data to identify attention during the learning process. The identification of affect using our procedures is integrated into a simulated distance learning system that provides feedback to the user with respect to attention and concentration. The authors propose a classification procedure that combines correlation-based feature selection (CFS) and a k-nearest-neighbor (KNN) data mining algorithm. To evaluate the CFS+KNN algorithm, it was test against CFS+C4.5 algorithm and other classification algorithms. The classification performance was measured 10 times with different 3-fold cross validation data. The data was derived from 10 subjects while they were attempting to learn material in a simulated distance learning environment. A self-assessment model of self-report was used with a single valence to evaluate attention on 3 levels (high, neutral, low). It was found that CFS+KNN had a much better performance, giving the highest correct classification rate (CCR) of % for the valence dimension divided into three classes.
An automated algorithm for determining photometric redshifts of quasars
NASA Astrophysics Data System (ADS)
Wang, Dan; Zhang, Yanxia; Zhao, Yongheng
2010-07-01
We employ k-nearest neighbor algorithm (KNN) for photometric redshift measurement of quasars with the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance learning algorithm where the result of new instance query is predicted based on the closest training samples. The regressor do not use any model to fit and only based on memory. Given a query quasar, we find the known quasars or (training points) closest to the query point, whose redshift value is simply assigned to be the average of the values of its k nearest neighbors. Three kinds of different colors (PSF, Model or Fiber) and spectral redshifts are used as input parameters, separatively. The combination of the three kinds of colors is also taken as input. The experimental results indicate that the best input pattern is PSF + Model + Fiber colors in all experiments. With this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts are obtained within ▵z < 0.1, 0.2 and 0.3, respectively. If only using one kind of colors as input, the model colors achieve the best performance. However, when using two kinds of colors, the best result is achieved by PSF + Fiber colors. In addition, nearest neighbor method (k = 1) shows its superiority compared to KNN (k ≠ 1) for the given sample.
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
out of charity, but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project...identification, and machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ...Support Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F
NASA Astrophysics Data System (ADS)
Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2016-03-01
The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
Fast Query-Optimized Kernel-Machine Classification
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; DeCoste, Dennis
2004-01-01
A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered statistics about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, there are derived upper and lower thresholds for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1. For more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal- component dimensions and neighbor orderings were computed in that subspace. This approach ensured that the overhead of the nearest-neighbor computations was insignificant, relative to that of the exact KM computation.
Ronald E. McRoberts; Grant M. Domke; Qi Chen; Erik Næsset; Terje Gobakken
2016-01-01
The relatively small sampling intensities used by national forest inventories are often insufficient to produce the desired precision for estimates of population parameters unless the estimation process is augmented with auxiliary information, usually in the form of remotely sensed data. The k-Nearest Neighbors (k-NN) technique is a non-parametric,multivariate approach...
Generative Models for Similarity-based Classification
2007-01-01
NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed
Nation-Building Modeling and Resource Allocation Via Dynamic Programming
2014-09-01
Figure 2. RAND Study Models[59:98,115] (WMA) and used both the k-Nearest Neighbor ( KNN ) and Nearest Centroid (NC) algorithms to classify future features...The study found that KNN performed bet- ter than NC with 85% or greater accuracy in all test cases. The methodology was adopted for use under the...analysis feature of the model. 3.7.1 The No Surge Alternative. On the 10th of January 2007, President George W. Bush delivered a speech to the American
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.
1991-12-01
user using the ’: knn ’ option in the do-scenario command line). An instance of the K-Nearest Neighbor object is first created and initialized before...Navigation Computer HF High Frequency ILS Instrument Landing System KNN K - Nearest Neighbor LRU Line Replaceable Unit MC Mission Computer MTCA...approaches have been investigated here, K-nearest Neighbors ( KNN ) and neural networks (NN). Both approaches require that previously classified examples of
Minimum Expected Risk Estimation for Near-neighbor Classification
2006-04-01
We consider the problems of class probability estimation and classification when using near-neighbor classifiers, such as k-nearest neighbors ( kNN ...estimate for weighted kNN classifiers with different prior information, for a broad class of risk functions. Theory and simulations show how significant...the difference is compared to the standard maximum likelihood weighted kNN estimates. Comparisons are made with uniform weights, symmetric weights
Lip reading using neural networks
NASA Astrophysics Data System (ADS)
Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant
2011-10-01
Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person to written text. It has several applications like teaching deaf and dumb to speak and communicate effectively with the other people, its crime fighting potential and invariance to acoustic environment. We convert the video of the subject speaking vowels into images and then images are further selected manually for processing. However, several factors like fast speech, bad pronunciation, and poor illumination, movement of face, moustaches and beards make lip reading difficult. Contour tracking methods and Template matching are used for the extraction of lips from the face. K Nearest Neighbor algorithm is then used to classify the 'speaking' images and the 'silent' images. The sequence of images is then transformed into segments of utterances. Feature vector is calculated on each frame for all the segments and is stored in the database with properly labeled class. Character recognition is performed using modified KNN algorithm which assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms
Multidimensional k-nearest neighbor model based on EEMD for financial time series forecasting
NASA Astrophysics Data System (ADS)
Zhang, Ningning; Lin, Aijing; Shang, Pengjian
2017-07-01
In this paper, we propose a new two-stage methodology that combines the ensemble empirical mode decomposition (EEMD) with multidimensional k-nearest neighbor model (MKNN) in order to forecast the closing price and high price of the stocks simultaneously. The modified algorithm of k-nearest neighbors (KNN) has an increasingly wide application in the prediction of all fields. Empirical mode decomposition (EMD) decomposes a nonlinear and non-stationary signal into a series of intrinsic mode functions (IMFs), however, it cannot reveal characteristic information of the signal with much accuracy as a result of mode mixing. So ensemble empirical mode decomposition (EEMD), an improved method of EMD, is presented to resolve the weaknesses of EMD by adding white noise to the original data. With EEMD, the components with true physical meaning can be extracted from the time series. Utilizing the advantage of EEMD and MKNN, the new proposed ensemble empirical mode decomposition combined with multidimensional k-nearest neighbor model (EEMD-MKNN) has high predictive precision for short-term forecasting. Moreover, we extend this methodology to the case of two-dimensions to forecast the closing price and high price of the four stocks (NAS, S&P500, DJI and STI stock indices) at the same time. The results indicate that the proposed EEMD-MKNN model has a higher forecast precision than EMD-KNN, KNN method and ARIMA.
2012-03-01
WMA) and used both the k-Nearest Neighbor ( KNN ) and Nearest Centroid 27 (a) Coalition and Regional (b) Indigenous Figure 3. RAND Study Models[32:98,115...NC) algorithms to classify future features. The study found that KNN performed better than NC with 85% or greater accuracy in all test cases. The...the model. 4.2.1 No Surge. On the 10th of January 2007, President George W. Bush delivered a speech to the American Public outlining a new strategy in
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang Yanxia; Ma He; Peng Nanbo
We apply one of the lazy learning methods, the k-nearest neighbor (kNN) algorithm, to estimate the photometric redshifts of quasars based on various data sets from the Sloan Digital Sky Survey (SDSS), the UKIRT Infrared Deep Sky Survey (UKIDSS), and the Wide-field Infrared Survey Explorer (WISE; the SDSS sample, the SDSS-UKIDSS sample, the SDSS-WISE sample, and the SDSS-UKIDSS-WISE sample). The influence of the k value and different input patterns on the performance of kNN is discussed. kNN performs best when k is different with a special input pattern for a special data set. The best result belongs to the SDSS-UKIDSS-WISEmore » sample. The experimental results generally show that the more information from more bands, the better performance of photometric redshift estimation with kNN. The results also demonstrate that kNN using multiband data can effectively solve the catastrophic failure of photometric redshift estimation, which is met by many machine learning methods. Compared with the performance of various other methods of estimating the photometric redshifts of quasars, kNN based on KD-Tree shows superiority, exhibiting the best accuracy.« less
Privacy Preserving Nearest Neighbor Search
NASA Astrophysics Data System (ADS)
Shaneck, Mark; Kim, Yongdae; Kumar, Vipin
Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation
Using machine learning algorithms to guide rehabilitation planning for home care clients.
Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul
2007-12-20
Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms - Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) - to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN. > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms achieved superior predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide development of improved clinical protocols.
Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang
2017-01-01
An estimate on the reliability of prediction in the applications of electronic nose is essential, which has not been paid enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. Nonconformity measure based on k-nearest neighbors (KNN) is implemented separately as underlying algorithm of conformal prediction. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, which is better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by a user. The potential of this framework for detecting borderline examples and outliers in the application of E-nose is also investigated. The result shows that conformal prediction is a promising framework for the application of electronic nose to make predictions with reliability and validity. PMID:28805721
Na, X D; Zang, S Y; Wu, C S; Li, W L
2015-11-01
Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.
NASA Astrophysics Data System (ADS)
Huang, Jian; Liu, Gui-xiong
2016-09-01
The identification of targets varies in different surge tests. A multi-color space threshold segmentation and self-learning k-nearest neighbor algorithm ( k-NN) for equipment under test status identification was proposed after using feature matching to identify equipment status had to train new patterns every time before testing. First, color space (L*a*b*, hue saturation lightness (HSL), hue saturation value (HSV)) to segment was selected according to the high luminance points ratio and white luminance points ratio of the image. Second, the unknown class sample S r was classified by the k-NN algorithm with training set T z according to the feature vector, which was formed from number of pixels, eccentricity ratio, compactness ratio, and Euler's numbers. Last, while the classification confidence coefficient equaled k, made S r as one sample of pre-training set T z '. The training set T z increased to T z+1 by T z ' if T z ' was saturated. In nine series of illuminant, indicator light, screen, and disturbances samples (a total of 21600 frames), the algorithm had a 98.65%identification accuracy, also selected five groups of samples to enlarge the training set from T 0 to T 5 by itself.
NASA Astrophysics Data System (ADS)
Juniati, D.; Khotimah, C.; Wardani, D. E. K.; Budayasa, K.
2018-01-01
The heart abnormalities can be detected from heart sound. A heart sound can be heard directly with a stethoscope or indirectly by a phonocardiograph, a machine of the heart sound recording. This paper presents the implementation of fractal dimension theory to make a classification of phonocardiograms into a normal heart sound, a murmur, or an extrasystole. The main algorithm used to calculate the fractal dimension was Higuchi’s Algorithm. There were two steps to make a classification of phonocardiograms, feature extraction, and classification. For feature extraction, we used Discrete Wavelet Transform to decompose the signal of heart sound into several sub-bands depending on the selected level. After the decomposition process, the signal was processed using Fast Fourier Transform (FFT) to determine the spectral frequency. The fractal dimension of the FFT output was calculated using Higuchi Algorithm. The classification of fractal dimension of all phonocardiograms was done with KNN and Fuzzy c-mean clustering methods. Based on the research results, the best accuracy obtained was 86.17%, the feature extraction by DWT decomposition level 3 with the value of kmax 50, using 5-fold cross validation and the number of neighbors was 5 at K-NN algorithm. Meanwhile, for fuzzy c-mean clustering, the accuracy was 78.56%.
An Improvement To The k-Nearest Neighbor Classifier For ECG Database
NASA Astrophysics Data System (ADS)
Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul
2018-03-01
The k nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed among them. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN is regarding the weighting issues in assigning the class label before classification. Thus, to solve these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of samples distribition. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distributions of training samples and its distance to the query point. Consequently, the fuzzy membership function is employed to assign the query point to the class label which is frequently represented by the nearest centroid neighbor Experimental studies from electrocardiogram (ECG) signal is applied in this study. The classification performances are evaluated in two experimental steps i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of kNN, kNCN, FkNN and MFkCNN classifier is conducted to evaluate the performances of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds the kNN, kNCN and FkNN with the best classification rates of 96.5%.
Applied algorithm in the liner inspection of solid rocket motors
NASA Astrophysics Data System (ADS)
Hoffmann, Luiz Felipe Simões; Bizarria, Francisco Carlos Parquet; Bizarria, José Walter Parquet
2018-03-01
In rocket motors, the bonding between the solid propellant and thermal insulation is accomplished by a thin adhesive layer, known as liner. The liner application method involves a complex sequence of tasks, which includes in its final stage, the surface integrity inspection. Nowadays in Brazil, an expert carries out a thorough visual inspection to detect defects on the liner surface that may compromise the propellant interface bonding. Therefore, this paper proposes an algorithm that uses the photometric stereo technique and the K-nearest neighbor (KNN) classifier to assist the expert in the surface inspection. Photometric stereo allows the surface information recovery of the test images, while the KNN method enables image pixels classification into two classes: non-defect and defect. Tests performed on a computer vision based prototype validate the algorithm. The positive results suggest that the algorithm is feasible and when implemented in a real scenario, will be able to help the expert in detecting defective areas on the liner surface.
Weighted Parzen Windows for Pattern Classification
1994-05-01
Nearest-Neighbor Rule The k-Nearest-Neighbor ( kNN ) technique is nonparametric, assuming nothing about the distribution of the data. Stated succinctly...probabilities P(wj I x) from samples." Raudys and Jain [20:255] advance this interpretation by pointing out that the kNN technique can be viewed as the...34Parzen window classifier with a hyper- rectangular window function." As with the Parzen-window technique, the kNN classifier is more accurate as the
Optimal Detection Range of RFID Tag for RFID-based Positioning System Using the k-NN Algorithm.
Han, Soohee; Kim, Junghwan; Park, Choung-Hwan; Yoon, Hee-Cheon; Heo, Joon
2009-01-01
Positioning technology to track a moving object is an important and essential component of ubiquitous computing environments and applications. An RFID-based positioning system using the k-nearest neighbor (k-NN) algorithm can determine the position of a moving reader from observed reference data. In this study, the optimal detection range of an RFID-based positioning system was determined on the principle that tag spacing can be derived from the detection range. It was assumed that reference tags without signal strength information are regularly distributed in 1-, 2- and 3-dimensional spaces. The optimal detection range was determined, through analytical and numerical approaches, to be 125% of the tag-spacing distance in 1-dimensional space. Through numerical approaches, the range was 134% in 2-dimensional space, 143% in 3-dimensional space.
Heterogeneous Multi-Metric Learning for Multi-Sensor Fusion
2011-07-01
distance”. One of the most widely used methods is the k-nearest neighbor ( KNN ) method [4], which labels an input data sample to be the class with majority...despite of its simplicity, it can be an effective candidate and can be easily extended to handle multiple sensors. Distance based method such as KNN relies...Neighbor (LMNN) method [21] which will be briefly reviewed in the sequel. LMNN method tries to learn an optimal metric specifically for KNN classifier. The
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai
2016-12-01
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
An Efficient Statistical Computation Technique for Health Care Big Data using R
NASA Astrophysics Data System (ADS)
Sushma Rani, N.; Srinivasa Rao, P., Dr; Parimala, P.
2017-08-01
Due to the changes in living conditions and other factors many critical health related problems are arising. The diagnosis of the problem at earlier stages will increase the chances of survival and fast recovery. This reduces the time of recovery and the cost associated for the treatment. One such medical related issue is cancer and breast cancer has been identified as the second leading cause of cancer death. If detected in the early stage it can be cured. Once a patient is detected with breast cancer tumor, it should be classified whether it is cancerous or non-cancerous. So the paper uses k-nearest neighbors(KNN) algorithm which is one of the simplest machine learning algorithms and is an instance-based learning algorithm to classify the data. Day-to -day new records are added which leds to increase in the data to be classified and this tends to be big data problem. The algorithm is implemented in R whichis the most popular platform applied to machine learning algorithms for statistical computing. Experimentation is conducted by using various classification evaluation metric onvarious values of k. The results show that the KNN algorithm out performes better than existing models.
A comparison between skeleton and bounding box models for falling direction recognition
NASA Astrophysics Data System (ADS)
Narupiyakul, Lalita; Srisrisawang, Nitikorn
2017-12-01
Falling is an injury that can lead to a serious medical condition in every range of the age of people. However, in the case of elderly, the risk of serious injury is much higher. Due to the fact that one way of preventing serious injury is to treat the fallen person as soon as possible, several works attempted to implement different algorithms to recognize the fall. Our work compares the performance of two models based on features extraction: (i) Body joint data (Skeleton Data) which are the joint's positions in 3 axes and (ii) Bounding box (Box-size Data) covering all body joints. Machine learning algorithms that were chosen are Decision Tree (DT), Naïve Bayes (NB), K-nearest neighbors (KNN), Linear discriminant analysis (LDA), Voting Classification (VC), and Gradient boosting (GB). The results illustrate that the models trained with Skeleton data are performed far better than those trained with Box-size data (with an average accuracy of 94-81% and 80-75%, respectively). KNN shows the best performance in both Body joint model and Bounding box model. In conclusion, KNN with Body joint model performs the best among the others.
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei
2014-10-01
Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative for current high-resolution pixelated PET detectors if the issues of high performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate array (FPGA). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only at providing high performance, but also at being realistic for FPGA implementation. Benefitting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest neighbor searching for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The spatial resolutions of full-width-at-half-maximum (FWHM) of both methods averaged over the center axis of the detector were obtained as 1.87 ±0.17 mm and 1.92 ±0.09 mm, respectively. The test results show that the SOM-NN scheme has an equivalent positioning performance with the smoothed k-NN method, but the amount of computation is only about one-tenth of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on FPGA. It has the potential to realize real-time position estimation on an FPGA with a high-event processing throughput.
Jay M. Ver Hoef; Hailemariam Temesgen; Sergio Gómez
2013-01-01
Forest surveys provide critical information for many diverse interests. Data are often collected from samples, and from these samples, maps of resources and estimates of aerial totals or averages are required. In this paper, two approaches for mapping and estimating totals; the spatial linear model (SLM) and k-NN (k-Nearest Neighbor) are compared, theoretically,...
K-nearest neighbor imputation of forest inventory variables in New Hampshire
Andrew Lister; Michael Hoppus; Raymond L. Czaplewski
2005-01-01
The k-nearest neighbor (kNN) method was used to map stand volume for a mosaic of 4 Landsat scenes covering the state of New Hampshire. Data for gross cubic foot volume and trees per acre were summarized from USDA Forest Service Forest Inventory and Analysis (FIA) plots and used as training for kNN. Six bands of...
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing
2012-12-01
5 FLD Fisher’s Linear Discriminant . . . . . . . . . . . . . . . . . . . 6 kNN K-Nearest Neighbor...Neighbor ( kNN ), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in...Expansion and the Dis- crete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech , and Signal Processing (ICASSP00
Nearest neighbor-density-based clustering methods for large hyperspectral images
NASA Astrophysics Data System (ADS)
Cariou, Claude; Chehdi, Kacem
2017-10-01
We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and showed their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extent the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize of the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral imaging sensor.
Pan, Yongke; Niu, Wenjia
2017-01-01
Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Relative works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Different from these relative works, the regularized graph construction is researched here, which is important in the graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then the kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar
Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri; ...
2017-11-16
While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination ( R 2),more » and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distribute across the 800 km 2 Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.« less
An Examination of Diameter Density Prediction with k-NN and Airborne Lidar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strunk, Jacob L.; Gould, Peter J.; Packalen, Petteri
While lidar-based forest inventory methods have been widely demonstrated, performances of methods to predict tree diameters with airborne lidar (lidar) are not well understood. One cause for this is that the performance metrics typically used in studies for prediction of diameters can be difficult to interpret, and may not support comparative inferences between sampling designs and study areas. To help with this problem we propose two indices and use them to evaluate a variety of lidar and k nearest neighbor (k-NN) strategies for prediction of tree diameter distributions. The indices are based on the coefficient of determination ( R 2),more » and root mean square deviation (RMSD). Both of the indices are highly interpretable, and the RMSD-based index facilitates comparisons with alternative (non-lidar) inventory strategies, and with projects in other regions. K-NN diameter distribution prediction strategies were examined using auxiliary lidar for 190 training plots distribute across the 800 km 2 Savannah River Site in South Carolina, USA. In conclusion, we evaluate the performance of k-NN with respect to distance metrics, number of neighbors, predictor sets, and response sets. K-NN and lidar explained 80% of variability in diameters, and Mahalanobis distance with k = 3 neighbors performed best according to a number of criteria.« less
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
Nearest Neighbor Algorithms for Pattern Classification
NASA Technical Reports Server (NTRS)
Barrios, J. O.
1972-01-01
A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x sub i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required categorize The Bayes risk serves merely as a reference-the limit of excellence beyond which it is not possible to go. The NN rule is bounded below by the Bayes risk and above by twice the Bayes risk.
A Comparative Evaluation of Anomaly Detection Algorithms for Maritime Video Surveillance
2011-01-01
of k-means clustering and the k- NN Localized p-value Estimator ( KNN -LPE). K-means is a popular distance-based clustering algorithm while KNN -LPE...implemented the sparse cluster identification rule we described in Section 3.1. 2. k-NN Localized p-value Estimator ( KNN -LPE): We implemented this using...Average Density ( KNN -NAD): This was implemented as described in Section 3.4. Algorithm Parameter Settings The global and local density-based anomaly
NASA Astrophysics Data System (ADS)
Lazcano, R.; Madroñal, D.; Fabelo, H.; Ortega, S.; Salvador, R.; Callicó, G. M.; Juárez, E.; Sanz, C.
2017-10-01
Hyperspectral Imaging (HI) assembles high resolution spectral information from hundreds of narrow bands across the electromagnetic spectrum, thus generating 3D data cubes in which each pixel gathers the spectral information of the reflectance of every spatial pixel. As a result, each image is composed of large volumes of data, which turns its processing into a challenge, as performance requirements have been continuously tightened. For instance, new HI applications demand real-time responses. Hence, parallel processing becomes a necessity to achieve this requirement, so the intrinsic parallelism of the algorithms must be exploited. In this paper, a spatial-spectral classification approach has been implemented using a dataflow language known as RVCCAL. This language represents a system as a set of functional units, and its main advantage is that it simplifies the parallelization process by mapping the different blocks over different processing units. The spatial-spectral classification approach aims at refining the classification results previously obtained by using a K-Nearest Neighbors (KNN) filtering process, in which both the pixel spectral value and the spatial coordinates are considered. To do so, KNN needs two inputs: a one-band representation of the hyperspectral image and the classification results provided by a pixel-wise classifier. Thus, spatial-spectral classification algorithm is divided into three different stages: a Principal Component Analysis (PCA) algorithm for computing the one-band representation of the image, a Support Vector Machine (SVM) classifier, and the KNN-based filtering algorithm. The parallelization of these algorithms shows promising results in terms of computational time, as the mapping of them over different cores presents a speedup of 2.69x when using 3 cores. Consequently, experimental results demonstrate that real-time processing of hyperspectral images is achievable.
Geometric k-nearest neighbor estimation of entropy and mutual information
NASA Astrophysics Data System (ADS)
Lord, Warren M.; Sun, Jie; Bollt, Erik M.
2018-03-01
Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for a large sample size. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators that we call geometric knn estimators (g-knn), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical system attractors. A series of numerical examples in which the thickness of the underlying distribution and the sample sizes are varied suggest that local geometry is a source of problems for knn methods such as the Kraskov-Stögbauer-Grassberger estimator when local geometric effects cannot be removed by global preprocessing of the data. The g-knn method performs well despite the manipulation of the local geometry. In addition, the examples suggest that the g-knn estimators can be of particular relevance to applications in which the system is large, but the data size is limited.
The distance function effect on k-nearest neighbor classification for medical datasets.
Hu, Li-Yu; Huang, Min-Wei; Ke, Shih-Wen; Tsai, Chih-Fong
2016-01-01
K-nearest neighbor (k-NN) classification is conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Since the Euclidean distance function is the most widely used distance metric in k-NN, no study examines the classification performance of k-NN by different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect the k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data and four different distance functions including Euclidean, cosine, Chi square, and Minkowsky are used during k-NN classification individually. The experimental results show that using the Chi square distance function is the best choice for the three different types of datasets. However, using the cosine and Euclidean (and Minkowsky) distance function perform the worst over the mixed type of datasets. In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, K-NN based on the Chi square distance function performs the best.
The Use of Fuzzy Set Classification for Pattern Recognition of the Polygraph
1993-12-01
actual feature extraction was done, It was decided to use the K-nearest neighbor ( KNN ) the data was preprocessed. The electrocardiogram classifier in...showing heart pulse, and a low frequency not known beforehand, and the KNN classifier does not component showing blood volume. The derivative of...the characteristics of the conventional KNN these six derived signals were detrended and filtered, classification method is that it assigns each
NASA Astrophysics Data System (ADS)
He, Runnan; Wang, Kuanquan; Li, Qince; Yuan, Yongfeng; Zhao, Na; Liu, Yang; Zhang, Henggui
2017-12-01
Cardiovascular diseases are associated with high morbidity and mortality. However, it is still a challenge to diagnose them accurately and efficiently. Electrocardiogram (ECG), a bioelectrical signal of the heart, provides crucial information about the dynamical functions of the heart, playing an important role in cardiac diagnosis. As the QRS complex in ECG is associated with ventricular depolarization, therefore, accurate QRS detection is vital for interpreting ECG features. In this paper, we proposed a real-time, accurate, and effective algorithm for QRS detection. In the algorithm, a proposed preprocessor with a band-pass filter was first applied to remove baseline wander and power-line interference from the signal. After denoising, a method combining K-Nearest Neighbor (KNN) and Particle Swarm Optimization (PSO) was used for accurate QRS detection in ECGs with different morphologies. The proposed algorithm was tested and validated using 48 ECG records from MIT-BIH arrhythmia database (MITDB), achieved a high averaged detection accuracy, sensitivity and positive predictivity of 99.43, 99.69, and 99.72%, respectively, indicating a notable improvement to extant algorithms as reported in literatures.
Nearest neighbors by neighborhood counting.
Wang, Hui
2006-06-01
Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.
Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio
NASA Astrophysics Data System (ADS)
Nababan, A. A.; Sitompul, O. S.; Tulus
2018-04-01
K- Nearest Neighbor (KNN) is a good classifier, but from several studies, the result performance accuracy of KNN still lower than other methods. One of the causes of the low accuracy produced, because each attribute has the same effect on the classification process, while some less relevant characteristics lead to miss-classification of the class assignment for new data. In this research, we proposed Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio as a parameter to see the correlation between each attribute in the data and the Gain Ratio also will be used as the basis for weighting each attribute of the dataset. The accuracy of results is compared to the accuracy acquired from the original KNN method using 10-fold Cross-Validation with several datasets from the UCI Machine Learning repository and KEEL-Dataset Repository, such as abalone, glass identification, haberman, hayes-roth and water quality status. Based on the result of the test, the proposed method was able to increase the classification accuracy of KNN, where the highest difference of accuracy obtained hayes-roth dataset is worth 12.73%, and the lowest difference of accuracy obtained in the abalone dataset of 0.07%. The average result of the accuracy of all dataset increases the accuracy by 5.33%.
Emotional State Classification in Virtual Reality Using Wearable Electroencephalography
NASA Astrophysics Data System (ADS)
Suhaimi, N. S.; Teo, J.; Mountstephens, J.
2018-03-01
This paper presents the classification of emotions on EEG signals. One of the key issues in this research is the lack of mental classification using VR as the medium to stimulate emotion. The approach towards this research is by using K-nearest neighbor (KNN) and Support Vector Machine (SVM). Firstly, each of the participant will be required to wear the EEG headset and recording their brainwaves when they are immersed inside the VR. The data points are then marked if they showed any physical signs of emotion or by observing the brainwave pattern. Secondly, the data will then be tested and trained with KNN and SVM algorithms. The accuracy achieved from both methods were approximately 82% throughout the brainwave spectrum (α, β, γ, δ, θ). These methods showed promising results and will be further enhanced using other machine learning approaches in VR stimulus.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.
Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Webcam mouse using face and eye tracking in various illumination environments.
Lin, Yuan-Pin; Chao, Yi-Ping; Lin, Chung-Chih; Chen, Jyh-Horng
2005-01-01
Nowadays, due to enhancement of computer performance and popular usage of webcam devices, it has become possible to acquire users' gestures for the human-computer-interface with PC via webcam. However, the effects of illumination variation would dramatically decrease the stability and accuracy of skin-based face tracking system; especially for a notebook or portable platform. In this study we present an effective illumination recognition technique, combining K-Nearest Neighbor classifier and adaptive skin model, to realize the real-time tracking system. We have demonstrated that the accuracy of face detection based on the KNN classifier is higher than 92% in various illumination environments. In real-time implementation, the system successfully tracks user face and eyes features at 15 fps under standard notebook platforms. Although KNN classifier only initiates five environments at preliminary stage, the system permits users to define and add their favorite environments to KNN for computer access. Eventually, based on this efficient tracking algorithm, we have developed a "Webcam Mouse" system to control the PC cursor using face and eye tracking. Preliminary studies in "point and click" style PC web games also shows promising applications in consumer electronic markets in the future.
Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve
2012-01-01
Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine
NASA Astrophysics Data System (ADS)
Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry
2017-10-01
Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha-1 and for live biomass of about 2 t ha-1 over the study area.
Improving the accuracy of k-nearest neighbor using local mean based and distance weight
NASA Astrophysics Data System (ADS)
Syaliman, K. U.; Nababan, E. B.; Sitompul, O. S.
2018-03-01
In k-nearest neighbor (kNN), the determination of classes for new data is normally performed by a simple majority vote system, which may ignore the similarities among data, as well as allowing the occurrence of a double majority class that can lead to misclassification. In this research, we propose an approach to resolve the majority vote issues by calculating the distance weight using a combination of local mean based k-nearest neighbor (LMKNN) and distance weight k-nearest neighbor (DWKNN). The accuracy of results is compared to the accuracy acquired from the original k-NN method using several datasets from the UCI Machine Learning repository, Kaggle and Keel, such as ionosphare, iris, voice genre, lower back pain, and thyroid. In addition, the proposed method is also tested using real data from a public senior high school in city of Tualang, Indonesia. Results shows that the combination of LMKNN and DWKNN was able to increase the classification accuracy of kNN, whereby the average accuracy on test data is 2.45% with the highest increase in accuracy of 3.71% occurring on the lower back pain symptoms dataset. For the real data, the increase in accuracy is obtained as high as 5.16%.
An RFID Indoor Positioning Algorithm Based on Bayesian Probability and K-Nearest Neighbor.
Xu, He; Ding, Ye; Li, Peng; Wang, Ruchuan; Li, Yizhu
2017-08-05
The Global Positioning System (GPS) is widely used in outdoor environmental positioning. However, GPS cannot support indoor positioning because there is no signal for positioning in an indoor environment. Nowadays, there are many situations which require indoor positioning, such as searching for a book in a library, looking for luggage in an airport, emergence navigation for fire alarms, robot location, etc. Many technologies, such as ultrasonic, sensors, Bluetooth, WiFi, magnetic field, Radio Frequency Identification (RFID), etc., are used to perform indoor positioning. Compared with other technologies, RFID used in indoor positioning is more cost and energy efficient. The Traditional RFID indoor positioning algorithm LANDMARC utilizes a Received Signal Strength (RSS) indicator to track objects. However, the RSS value is easily affected by environmental noise and other interference. In this paper, our purpose is to reduce the location fluctuation and error caused by multipath and environmental interference in LANDMARC. We propose a novel indoor positioning algorithm based on Bayesian probability and K -Nearest Neighbor (BKNN). The experimental results show that the Gaussian filter can filter some abnormal RSS values. The proposed BKNN algorithm has the smallest location error compared with the Gaussian-based algorithm, LANDMARC and an improved KNN algorithm. The average error in location estimation is about 15 cm using our method.
NASA Astrophysics Data System (ADS)
Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong
2016-03-01
Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifier [1], has also a distinguished role in the classification for CADx. Therefore, in this paper, based on the property that the LI will achieve a very high accuracy, we design an algorithm to integrate the LI into RF for improved or higher value of AUC (area under the curve of receiver operating characteristics -- ROC). Experiments were performed by the use of a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers of RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.
Distributed Computation of the knn Graph for Large High-Dimensional Point Sets
Plaku, Erion; Kavraki, Lydia E.
2009-01-01
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318
Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas
2014-07-01
Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A novel feature ranking algorithm for biometric recognition with PPG signals.
Reşit Kavsaoğlu, A; Polat, Kemal; Recep Bozkurt, M
2014-06-01
This study is intended for describing the application of the Photoplethysmography (PPG) signal and the time domain features acquired from its first and second derivatives for biometric identification. For this purpose, a sum of 40 features has been extracted and a feature-ranking algorithm is proposed. This proposed algorithm calculates the contribution of each feature to biometric recognition and collocates the features, the contribution of which is from great to small. While identifying the contribution of the features, the Euclidean distance and absolute distance formulas are used. The efficiency of the proposed algorithms is demonstrated by the results of the k-NN (k-nearest neighbor) classifier applications of the features. During application, each 15-period-PPG signal belonging to two different durations from each of the thirty healthy subjects were used with a PPG data acquisition card. The first PPG signals recorded from the subjects were evaluated as the 1st configuration; the PPG signals recorded later at a different time as the 2nd configuration and the combination of both were evaluated as the 3rd configuration. When the results were evaluated for the k-NN classifier model created along with the proposed algorithm, an identification of 90.44% for the 1st configuration, 94.44% for the 2nd configuration, and 87.22% for the 3rd configuration has successfully been attained. The obtained results showed that both the proposed algorithm and the biometric identification model based on this developed PPG signal are very promising for contactless recognizing the people with the proposed method. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Novel Artificial Intelligence System for Endotracheal Intubation.
Carlson, Jestin N; Das, Samarjit; De la Torre, Fernando; Frisch, Adam; Guyette, Francis X; Hodgins, Jessica K; Yealy, Donald M
2016-01-01
Adequate visualization of the glottic opening is a key factor to successful endotracheal intubation (ETI); however, few objective tools exist to help guide providers' ETI attempts toward the glottic opening in real-time. Machine learning/artificial intelligence has helped to automate the detection of other visual structures but its utility with ETI is unknown. We sought to test the accuracy of various computer algorithms in identifying the glottic opening, creating a tool that could aid successful intubation. We collected a convenience sample of providers who each performed ETI 10 times on a mannequin using a video laryngoscope (C-MAC, Karl Storz Corp, Tuttlingen, Germany). We recorded each attempt and reviewed one-second time intervals for the presence or absence of the glottic opening. Four different machine learning/artificial intelligence algorithms analyzed each attempt and time point: k-nearest neighbor (KNN), support vector machine (SVM), decision trees, and neural networks (NN). We used half of the videos to train the algorithms and the second half to test the accuracy, sensitivity, and specificity of each algorithm. We enrolled seven providers, three Emergency Medicine attendings, and four paramedic students. From the 70 total recorded laryngoscopic video attempts, we created 2,465 time intervals. The algorithms had the following sensitivity and specificity for detecting the glottic opening: KNN (70%, 90%), SVM (70%, 90%), decision trees (68%, 80%), and NN (72%, 78%). Initial efforts at computer algorithms using artificial intelligence are able to identify the glottic opening with over 80% accuracy. With further refinements, video laryngoscopy has the potential to provide real-time, direction feedback to the provider to help guide successful ETI.
Rezaee, Kh.; Azizi, E.; Haddadnia, J.
2016-01-01
Background Epilepsy is a severe disorder of the central nervous system that predisposes the person to recurrent seizures. Fifty million people worldwide suffer from epilepsy; after Alzheimer’s and stroke, it is the third widespread nervous disorder. Objective In this paper, an algorithm to detect the onset of epileptic seizures based on the analysis of brain electrical signals (EEG) has been proposed. 844 hours of EEG were recorded form 23 pediatric patients consecutively with 163 occurrences of seizures. Signals had been collected from Children’s Hospital Boston with a sampling frequency of 256 Hz through 18 channels in order to assess epilepsy surgery. By selecting effective features from seizure and non-seizure signals of each individual and putting them into two categories, the proposed algorithm detects the onset of seizures quickly and with high sensitivity. Method In this algorithm, L-sec epochs of signals are displayed in form of a third-order tensor in spatial, spectral and temporal spaces by applying wavelet transform. Then, after applying general tensor discriminant analysis (GTDA) on tensors and calculating mapping matrix, feature vectors are extracted. GTDA increases the sensitivity of the algorithm by storing data without deleting them. Finally, K-Nearest neighbors (KNN) is used to classify the selected features. Results The results of simulating algorithm on algorithm standard dataset shows that the algorithm is capable of detecting 98 percent of seizures with an average delay of 4.7 seconds and the average error rate detection of three errors in 24 hours. Conclusion Today, the lack of an automated system to detect or predict the seizure onset is strongly felt. PMID:27672628
yaImpute: An R package for kNN imputation
Nicholas L. Crookston; Andrew O. Finley
2008-01-01
This article introduces yaImpute, an R package for nearest neighbor search and imputation. Although nearest neighbor imputation is used in a host of disciplines, the methods implemented in the yaImpute package are tailored to imputation-based forest attribute estimation and mapping. The impetus to writing the yaImpute is a growing interest in nearest neighbor...
Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L
2013-12-01
The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.
2010-03-01
Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To solve the issue of inconsistency and user-dependency in manual lesion measurement of MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized algorithm of imaging processing that is used in CAD development, MS CAD integration and evaluation in clinical workflow is technically challenging due to the requirement of high computation rates and memory bandwidth in the recursive nature of the algorithm. In this paper, we present the development and evaluation of using a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to rapidly integrate in an electronic patient record or any disease-centric health care system.
NASA Astrophysics Data System (ADS)
Nishizuka, N.; Sugiura, K.; Kubo, Y.; Den, M.; Watari, S.; Ishii, M.
2017-02-01
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010-2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite. We detected active regions (ARs) from the full-disk magnetogram, from which ˜60 features were extracted with their time differentials, including magnetic neutral lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishizuka, N.; Kubo, Y.; Den, M.
We developed a flare prediction model using machine learning, which is optimized to predict the maximum class of flares occurring in the following 24 hr. Machine learning is used to devise algorithms that can learn from and make decisions on a huge amount of data. We used solar observation data during the period 2010–2015, such as vector magnetograms, ultraviolet (UV) emission, and soft X-ray emission taken by the Solar Dynamics Observatory and the Geostationary Operational Environmental Satellite . We detected active regions (ARs) from the full-disk magnetogram, from which ∼60 features were extracted with their time differentials, including magnetic neutralmore » lines, the current helicity, the UV brightening, and the flare history. After standardizing the feature database, we fully shuffled and randomly separated it into two for training and testing. To investigate which algorithm is best for flare prediction, we compared three machine-learning algorithms: the support vector machine, k-nearest neighbors (k-NN), and extremely randomized trees. The prediction score, the true skill statistic, was higher than 0.9 with a fully shuffled data set, which is higher than that for human forecasts. It was found that k-NN has the highest performance among the three algorithms. The ranking of the feature importance showed that previous flare activity is most effective, followed by the length of magnetic neutral lines, the unsigned magnetic flux, the area of UV brightening, and the time differentials of features over 24 hr, all of which are strongly correlated with the flux emergence dynamics in an AR.« less
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
1999-05-17
Experimental Results In this section, we compare kNN -mut which uses the weight vector obtained using mutual information as the fi- nal weight vector and...WAKNN against kNN , C4.5 [Qui93], RIPPER [Coh95], PEBLS [CS93], Rainbow [McC96], VSM [Low95] on several synthetic and real data sets. VSM is another k...obtained without this option. 3 C4.5 RIPPER PEBLS Rainbow kNN WAKNN Syn-1 100.0 100.0 100.0 100.0 77.3 100.0 Syn-2 67.5 69.5 62.0 50.0 66.0 68.8 Syn
Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers
NASA Astrophysics Data System (ADS)
Sanal Kumar, K. P.; Bhavani, R., Dr.
2017-08-01
Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.
Frog sound identification using extended k-nearest neighbor classifier
NASA Astrophysics Data System (ADS)
Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati
2017-09-01
Frog sound identification based on the vocalization becomes important for biological research and environmental monitoring. As a result, different types of feature extractions and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents a frog sound identification with Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aims of improving the classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbors. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifier, k -Nearest Neighbor (KNN), Fuzzy k -Nearest Neighbor (FKNN) k - General Nearest Neighbor (KGNN)and Mutual k -Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysia forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR) while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results have shown that the EKNCN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, GKNN and MKNN for all cases.
2007-02-28
Shah, D. Waagen, H. Schmitt, S. Bellofiore, A. Spanias, and D. Cochran, 32nd International Conference on Acoustics, Speech , and Signal Processing...Information Exploitation Office kNN k-Nearest Neighbor LEAN Laplacian Eigenmap Adaptive Neighbor LIP Linear Integer Programming ISP
Physical Human Activity Recognition Using Wearable Sensors.
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-12-11
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.
Physical Human Activity Recognition Using Wearable Sensors
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-01-01
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450
Reija Haapanen; Kimmo Lehtinen; Jukka Miettinen; Marvin E. Bauer; Alan R. Ek
2002-01-01
The k-nearest neighbor (k-NN) method has been undergoing development and testing for applications with USDA Forest Service Forest Inventory and Analysis (FIA) data in Minnesota since 1997. Research began using the 1987-1990 FIA inventory of the state, the then standard 10-point cluster plots, and Landsat TM imagery. In the past year, research has moved to examine...
Performance-driven Multimodality Sensor Fusion
2012-01-23
in IEEE Intl Conf. on Acoust., Speech , Signal Processing, (Dallas), Mar. 2010. [10] K. Sricharan, R. Raich, and A. Hero III, “Boundary compensated knn ...nearest neighbor ( kNN ) plug-in estima- tors, we have developed a generally applicable theory that gives analytical closed-form expressions for asymptotic...Co-PI’s Raich and Hero and was published in the IEEE Proc. of 2011 Intl Conf. on Acoustics, Speech , and Signal Processing. 2.4 Dimension estimation in
Rapid lard identification with portable electronic nose
NASA Astrophysics Data System (ADS)
Latief, Marsad; Khorsidtalab, Aida; Saputra, Irwan; Akmeliawati, Rini; Nurashikin, Anis; Jaswir, Irwandi; Witjaksono, Gunawan
2017-11-01
Human sensory systems are limited in many different regards, yet they are great sources of inspiration for development of technologies that help humans to overcome their restraints. This paper signifies the capability of our developed electronic nose in rapid lard identification. The developed device, known as E-Nose, mimics human’s olfactory system’s technique to identify a particular substance. Lard is a common pig derivative which is often used as a food additive, emulsion or shortening. It’s also commonly used as an adulterant or as an alternative for cooking oils, margarine and butter. This substance is prohibited to be consumed by Muslims and Orthodox Jews for religious reasons. A portable reliable device with an ability to identify lard rapidly can be convenient to users concerned about lard adulteration. The prototype was examined using K-Nearest Neighbors algorithm (KNN), Support Vector Machine (SVM), Bagged Trees and Simple Tree, and can identify lard with the highest accuracy of 95.6% among three types of fat (lard, chicken and beef) in liquid form over a certain range of temperature using KNN.
Unsupervised Indoor Localization Based on Smartphone Sensors, iBeacon and Wi-Fi.
Chen, Jing; Zhang, Yi; Xue, Wei
2018-04-28
In this paper, we propose UILoc, an unsupervised indoor localization scheme that uses a combination of smartphone sensors, iBeacons and Wi-Fi fingerprints for reliable and accurate indoor localization with zero labor cost. Firstly, compared with the fingerprint-based method, the UILoc system can build a fingerprint database automatically without any site survey and the database will be applied in the fingerprint localization algorithm. Secondly, since the initial position is vital to the system, UILoc will provide the basic location estimation through the pedestrian dead reckoning (PDR) method. To provide accurate initial localization, this paper proposes an initial localization module, a weighted fusion algorithm combined with a k-nearest neighbors (KNN) algorithm and a least squares algorithm. In UILoc, we have also designed a reliable model to reduce the landmark correction error. Experimental results show that the UILoc can provide accurate positioning, the average localization error is about 1.1 m in the steady state, and the maximum error is 2.77 m.
Multi-site Stochastic Simulation of Daily Streamflow with Markov Chain and KNN Algorithm
NASA Astrophysics Data System (ADS)
Mathai, J.; Mujumdar, P.
2017-12-01
A key focus of this study is to develop a method which is physically consistent with the hydrologic processes that can capture short-term characteristics of daily hydrograph as well as the correlation of streamflow in temporal and spatial domains. In complex water resource systems, flow fluctuations at small time intervals require that discretisation be done at small time scales such as daily scales. Also, simultaneous generation of synthetic flows at different sites in the same basin are required. We propose a method to equip water managers with a streamflow generator within a stochastic streamflow simulation framework. The motivation for the proposed method is to generate sequences that extend beyond the variability represented in the historical record of streamflow time series. The method has two steps: In step 1, daily flow is generated independently at each station by a two-state Markov chain, with rising limb increments randomly sampled from a Gamma distribution and the falling limb modelled as exponential recession and in step 2, the streamflow generated in step 1 is input to a nonparametric K-nearest neighbor (KNN) time series bootstrap resampler. The KNN model, being data driven, does not require assumptions on the dependence structure of the time series. A major limitation of KNN based streamflow generators is that they do not produce new values, but merely reshuffle the historical data to generate realistic streamflow sequences. However, daily flow generated using the Markov chain approach is capable of generating a rich variety of streamflow sequences. Furthermore, the rising and falling limbs of daily hydrograph represent different physical processes, and hence they need to be modelled individually. Thus, our method combines the strengths of the two approaches. We show the utility of the method and improvement over the traditional KNN by simulating daily streamflow sequences at 7 locations in the Godavari River basin in India.
Short-term Power Load Forecasting Based on Balanced KNN
NASA Astrophysics Data System (ADS)
Lv, Xianlong; Cheng, Xingong; YanShuang; Tang, Yan-mei
2018-03-01
To improve the accuracy of load forecasting, a short-term load forecasting model based on balanced KNN algorithm is proposed; According to the load characteristics, the historical data of massive power load are divided into scenes by the K-means algorithm; In view of unbalanced load scenes, the balanced KNN algorithm is proposed to classify the scene accurately; The local weighted linear regression algorithm is used to fitting and predict the load; Adopting the Apache Hadoop programming framework of cloud computing, the proposed algorithm model is parallelized and improved to enhance its ability of dealing with massive and high-dimension data. The analysis of the household electricity consumption data for a residential district is done by 23-nodes cloud computing cluster, and experimental results show that the load forecasting accuracy and execution time by the proposed model are the better than those of traditional forecasting algorithm.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
Detecting the Difficulty Level of Foreign Language Texts
2010-02-01
continuous tenses), as well as part- of- speech labels for words. The authors used a k-Nearest Neighbor ( kNN ) classifier (Cover and Hart, 1967; Mitchell, 1997...anticipate, and influence these situations and to operate in them is found in foreign language speech and text. For this reason, military linguists are...the language model system, LGR is the prediction of one of the grammar-based classifiers, and CkNN is a confidence value of the kNN prediction for the
Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha
2007-01-01
The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...
NASA Astrophysics Data System (ADS)
Wigdahl, J.; Agurto, C.; Murray, V.; Barriga, S.; Soliz, P.
2013-03-01
Diabetic retinopathy (DR) affects more than 4.4 million Americans age 40 and over. Automatic screening for DR has shown to be an efficient and cost-effective way to lower the burden on the healthcare system, by triaging diabetic patients and ensuring timely care for those presenting with DR. Several supervised algorithms have been developed to detect pathologies related to DR, but little work has been done in determining the size of the training set that optimizes an algorithm's performance. In this paper we analyze the effect of the training sample size on the performance of a top-down DR screening algorithm for different types of statistical classifiers. Results are based on partial least squares (PLS), support vector machines (SVM), k-nearest neighbor (kNN), and Naïve Bayes classifiers. Our dataset consisted of digital retinal images collected from a total of 745 cases (595 controls, 150 with DR). We varied the number of normal controls in the training set, while keeping the number of DR samples constant, and repeated the procedure 10 times using randomized training sets to avoid bias. Results show increasing performance in terms of area under the ROC curve (AUC) when the number of DR subjects in the training set increased, with similar trends for each of the classifiers. Of these, PLS and k-NN had the highest average AUC. Lower standard deviation and a flattening of the AUC curve gives evidence that there is a limit to the learning ability of the classifiers and an optimal number of cases to train on.
NASA Astrophysics Data System (ADS)
Chen, Dan; Guo, Lin-yuan; Wang, Chen-hao; Ke, Xi-zheng
2017-07-01
Equalization can compensate channel distortion caused by channel multipath effects, and effectively improve convergent of modulation constellation diagram in optical wireless system. In this paper, the subspace blind equalization algorithm is used to preprocess M-ary phase shift keying (MPSK) subcarrier modulation signal in receiver. Mountain clustering is adopted to get the clustering centers of MPSK modulation constellation diagram, and the modulation order is automatically identified through the k-nearest neighbor (KNN) classifier. The experiment has been done under four different weather conditions. Experimental results show that the convergent of constellation diagram is improved effectively after using the subspace blind equalization algorithm, which means that the accuracy of modulation recognition is increased. The correct recognition rate of 16PSK can be up to 85% in any kind of weather condition which is mentioned in paper. Meanwhile, the correct recognition rate is the highest in cloudy and the lowest in heavy rain condition.
Categorizing document by fuzzy C-Means and K-nearest neighbors approach
NASA Astrophysics Data System (ADS)
Priandini, Novita; Zaman, Badrus; Purwanti, Endah
2017-08-01
Increasing of technology had made categorizing documents become important. It caused by increasing of number of documents itself. Managing some documents by categorizing is one of Information Retrieval application, because it involve text mining on its process. Whereas, categorization technique could be done both Fuzzy C-Means (FCM) and K-Nearest Neighbors (KNN) method. This experiment would consolidate both methods. The aim of the experiment is increasing performance of document categorize. First, FCM is in order to clustering training documents. Second, KNN is in order to categorize testing document until the output of categorization is shown. Result of the experiment is 14 testing documents retrieve relevantly to its category. Meanwhile 6 of 20 testing documents retrieve irrelevant to its category. Result of system evaluation shows that both precision and recall are 0,7.
A Fast Robot Identification and Mapping Algorithm Based on Kinect Sensor.
Zhang, Liang; Shen, Peiyi; Zhu, Guangming; Wei, Wei; Song, Houbing
2015-08-14
Internet of Things (IoT) is driving innovation in an ever-growing set of application domains such as intelligent processing for autonomous robots. For an autonomous robot, one grand challenge is how to sense its surrounding environment effectively. The Simultaneous Localization and Mapping with RGB-D Kinect camera sensor on robot, called RGB-D SLAM, has been developed for this purpose but some technical challenges must be addressed. Firstly, the efficiency of the algorithm cannot satisfy real-time requirements; secondly, the accuracy of the algorithm is unacceptable. In order to address these challenges, this paper proposes a set of novel improvement methods as follows. Firstly, the ORiented Brief (ORB) method is used in feature detection and descriptor extraction. Secondly, a bidirectional Fast Library for Approximate Nearest Neighbors (FLANN) k-Nearest Neighbor (KNN) algorithm is applied to feature match. Then, the improved RANdom SAmple Consensus (RANSAC) estimation method is adopted in the motion transformation. In the meantime, high precision General Iterative Closest Points (GICP) is utilized to register a point cloud in the motion transformation optimization. To improve the accuracy of SLAM, the reduced dynamic covariance scaling (DCS) algorithm is formulated as a global optimization problem under the G2O framework. The effectiveness of the improved algorithm has been verified by testing on standard data and comparing with the ground truth obtained on Freiburg University's datasets. The Dr Robot X80 equipped with a Kinect camera is also applied in a building corridor to verify the correctness of the improved RGB-D SLAM algorithm. With the above experiments, it can be seen that the proposed algorithm achieves higher processing speed and better accuracy.
Probabilistic brain tissue segmentation in neonatal magnetic resonance imaging.
Anbeek, Petronella; Vincken, Koen L; Groenendaal, Floris; Koeman, Annemieke; van Osch, Matthias J P; van der Grond, Jeroen
2008-02-01
A fully automated method has been developed for segmentation of four different structures in the neonatal brain: white matter (WM), central gray matter (CEGM), cortical gray matter (COGM), and cerebrospinal fluid (CSF). The segmentation algorithm is based on information from T2-weighted (T2-w) and inversion recovery (IR) scans. The method uses a K nearest neighbor (KNN) classification technique with features derived from spatial information and voxel intensities. Probabilistic segmentations of each tissue type were generated. By applying thresholds on these probability maps, binary segmentations were obtained. These final segmentations were evaluated by comparison with a gold standard. The sensitivity, specificity, and Dice similarity index (SI) were calculated for quantitative validation of the results. High sensitivity and specificity with respect to the gold standard were reached: sensitivity >0.82 and specificity >0.9 for all tissue types. Tissue volumes were calculated from the binary and probabilistic segmentations. The probabilistic segmentation volumes of all tissue types accurately estimated the gold standard volumes. The KNN approach offers valuable ways for neonatal brain segmentation. The probabilistic outcomes provide a useful tool for accurate volume measurements. The described method is based on routine diagnostic magnetic resonance imaging (MRI) and is suitable for large population studies.
Automated Identification of Abnormal Adult EEGs
López, S.; Suarez, G.; Jungreis, D.; Obeid, I.; Picone, J.
2016-01-01
The interpretation of electroencephalograms (EEGs) is a process that is still dependent on the subjective analysis of the examiners. Though interrater agreement on critical events such as seizures is high, it is much lower on subtler events (e.g., when there are benign variants). The process used by an expert to interpret an EEG is quite subjective and hard to replicate by machine. The performance of machine learning technology is far from human performance. We have been developing an interpretation system, AutoEEG, with a goal of exceeding human performance on this task. In this work, we are focusing on one of the early decisions made in this process – whether an EEG is normal or abnormal. We explore two baseline classification algorithms: k-Nearest Neighbor (kNN) and Random Forest Ensemble Learning (RF). A subset of the TUH EEG Corpus was used to evaluate performance. Principal Components Analysis (PCA) was used to reduce the dimensionality of the data. kNN achieved a 41.8% detection error rate while RF achieved an error rate of 31.7%. These error rates are significantly lower than those obtained by random guessing based on priors (49.5%). The majority of the errors were related to misclassification of normal EEGs. PMID:27195311
1980-12-05
classification procedures that are common in speech processing. The anesthesia level classification by EEG time series population screening problem example is in...formance. The use of the KL number type metric in NN rule classification, in a delete-one subj ect ’s EE-at-a-time KL-NN and KL- kNN classification of the...17 individual labeled EEG sample population using KL-NN and KL- kNN rules. The results obtained are shown in Table 1. The entries in the table indicate
Hatamikia, Sepideh; Maghooli, Keivan; Nasrabadi, Ali Motie
2014-01-01
Electroencephalogram (EEG) is one of the useful biological signals to distinguish different brain diseases and mental states. In recent years, detecting different emotional states from biological signals has been merged more attention by researchers and several feature extraction methods and classifiers are suggested to recognize emotions from EEG signals. In this research, we introduce an emotion recognition system using autoregressive (AR) model, sequential forward feature selection (SFS) and K-nearest neighbor (KNN) classifier using EEG signals during emotional audio-visual inductions. The main purpose of this paper is to investigate the performance of AR features in the classification of emotional states. To achieve this goal, a distinguished AR method (Burg's method) based on Levinson-Durbin's recursive algorithm is used and AR coefficients are extracted as feature vectors. In the next step, two different feature selection methods based on SFS algorithm and Davies–Bouldin index are used in order to decrease the complexity of computing and redundancy of features; then, three different classifiers include KNN, quadratic discriminant analysis and linear discriminant analysis are used to discriminate two and three different classes of valence and arousal levels. The proposed method is evaluated with EEG signals of available database for emotion analysis using physiological signals, which are recorded from 32 participants during 40 1 min audio visual inductions. According to the results, AR features are efficient to recognize emotional states from EEG signals, and KNN performs better than two other classifiers in discriminating of both two and three valence/arousal classes. The results also show that SFS method improves accuracies by almost 10-15% as compared to Davies–Bouldin based feature selection. The best accuracies are %72.33 and %74.20 for two classes of valence and arousal and %61.10 and %65.16 for three classes, respectively. PMID:25298928
Majid, Abdul; Ali, Safdar; Iqbal, Mubashar; Kausar, Nabeela
2014-03-01
This study proposes a novel prediction approach for human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: the preprocessor and the predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is employed to increase the samples of the minority class, thereby balancing the dataset. In the predictor stage, machine-learning approaches of K-nearest neighbor (KNN) and support vector machines (SVM) are used to develop hybrid MTD-SVM and MTD-KNN prediction models. MTD-SVM model has provided the best values of accuracy, G-mean and Matthew's correlation coefficient of 96.71%, 96.70% and 71.98% for cancer/non-cancer dataset, breast/non-breast cancer dataset and colon/non-colon cancer dataset, respectively. We found that hybrid MTD-SVM is the best with respect to prediction performance and computational cost. MTD-KNN model has achieved moderately better prediction as compared to hybrid MTD-NB (Naïve Bayes) but at the expense of higher computing cost. MTD-KNN model is faster than MTD-RF (random forest) but its prediction is not better than MTD-RF. To the best of our knowledge, the reported results are the best results, so far, for these datasets. The proposed scheme indicates that the developed models can be used as a tool for the prediction of cancer. This scheme may be useful for study of any sequential information such as protein sequence or any nucleic acid sequence. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu Xiaoying; Ho, Shirley; Trac, Hy
We investigate machine learning (ML) techniques for predicting the number of galaxies (N{sub gal}) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N{sub gal}. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: supportmore » vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N{sub gal} by training our algorithms on the following six halo properties: number of particles, M{sub 200}, {sigma}{sub v}, v{sub max}, half-mass radius, and spin. For Millennium, our predicted N{sub gal} values have a mean-squared error (MSE) of {approx}0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to {approx}5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N{sub gal}. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M{sub star}, low M{sub star}). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.« less
On Algorithms for Generating Computationally Simple Piecewise Linear Classifiers
1989-05-01
suffers. - Waveform classification, e.g. speech recognition, seismic analysis (i.e. discrimination between earthquakes and nuclear explosions), target...assuming Gaussian distributions (B-G) d) Bayes classifier with probability densities estimated with the k-N-N method (B- kNN ) e) The -arest neighbour...range of classifiers are chosen including a fast, easy computable and often used classifier (B-G), reliable and complex classifiers (B- kNN and NNR
NASA Astrophysics Data System (ADS)
Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen
2011-12-01
Co-occurrence matrix has been applied successfully for echographic images characterization because it contains information about spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of an US image of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extract with co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (or fatty) liver. These two sets of echographic images of the liver build a database that includes only histological confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects help to compute four textural indices and as well as control dataset. We chose to study these diseases because the steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and they can be used to characterize irregularity of tissues. The goal is to extract the information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool to classify features textures by means of grouping in a training set using healthy liver, on the one hand, and in a holdout set using the features textures of steatosis liver, on the other hand. The results could be used to quantify the texture information and will allow a clear detection between health and steatosis liver.
Floor Identification with Commercial Smartphones in Wifi-Based Indoor Localization System
NASA Astrophysics Data System (ADS)
Ai, H. J.; Liu, M. Y.; Shi, Y. M.; Zhao, J. Q.
2016-06-01
In this paper, we utilize novel sensors built-in commercial smart devices to propose a schema which can identify floors with high accuracy and efficiency. This schema can be divided into two modules: floor identifying and floor change detection. Floor identifying module starts at initial phase of positioning, and responsible for determining which floor the positioning start. We have estimated two methods to identify initial floor based on K-Nearest Neighbors (KNN) and BP Neural Network, respectively. In order to improve performance of KNN algorithm, we proposed a novel method based on weighting signal strength, which can identify floors robust and quickly. Floor change detection module turns on after entering into continues positioning procedure. In this module, sensors (such as accelerometer and barometer) of smart devices are used to determine whether the user is going up and down stairs or taking an elevator. This method has fused different kinds of sensor data and can adapt various motion pattern of users. We conduct our experiment with mobile client on Android Phone (Nexus 5) at a four-floors building with an open area between the second and third floor. The results demonstrate that our scheme can achieve an accuracy of 99% to identify floor and 97% to detecting floor changes as a whole.
NASA Astrophysics Data System (ADS)
Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.
2016-03-01
This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.
Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar
2017-12-01
Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.
Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms
Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan
2017-01-01
Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909
Algorithms for Autonomous Plume Detection on Outer Planet Satellites
NASA Astrophysics Data System (ADS)
Lin, Y.; Bunte, M. K.; Saripalli, S.; Greeley, R.
2011-12-01
We investigate techniques for automated detection of geophysical events (i.e., volcanic plumes) from spacecraft images. The algorithms presented here have not been previously applied to detection of transient events on outer planet satellites. We apply Scale Invariant Feature Transform (SIFT) to raw images of Io and Enceladus from the Voyager, Galileo, Cassini, and New Horizons missions. SIFT produces distinct interest points in every image; feature descriptors are reasonably invariant to changes in illumination, image noise, rotation, scaling, and small changes in viewpoint. We classified these descriptors as plumes using the k-nearest neighbor (KNN) algorithm. In KNN, an object is classified by its similarity to examples in a training set of images based on user defined thresholds. Using the complete database of Io images and a selection of Enceladus images where 1-3 plumes were manually detected in each image, we successfully detected 74% of plumes in Galileo and New Horizons images, 95% in Voyager images, and 93% in Cassini images. Preliminary tests yielded some false positive detections; further iterations will improve performance. In images where detections fail, plumes are less than 9 pixels in size or are lost in image glare. We compared the appearance of plumes and illuminated mountain slopes to determine the potential for feature classification. We successfully differentiated features. An advantage over other methods is the ability to detect plumes in non-limb views where they appear in the shadowed part of the surface; improvements will enable detection against the illuminated background surface where gradient changes would otherwise preclude detection. This detection method has potential applications to future outer planet missions for sustained plume monitoring campaigns and onboard automated prioritization of all spacecraft data. The complementary nature of this method is such that it could be used in conjunction with edge detection algorithms to increase effectiveness. We have demonstrated an ability to detect transient events above the planetary limb and on the surface and to distinguish feature classes in spacecraft images.
Mei, Wenjuan; Zeng, Xianping; Yang, Chenglin; Zhou, Xiuyun
2017-01-01
The insulated gate bipolar transistor (IGBT) is a kind of excellent performance switching device used widely in power electronic systems. How to estimate the remaining useful life (RUL) of an IGBT to ensure the safety and reliability of the power electronics system is currently a challenging issue in the field of IGBT reliability. The aim of this paper is to develop a prognostic technique for estimating IGBTs’ RUL. There is a need for an efficient prognostic algorithm that is able to support in-situ decision-making. In this paper, a novel prediction model with a complete structure based on optimally pruned extreme learning machine (OPELM) and Volterra series is proposed to track the IGBT’s degradation trace and estimate its RUL; we refer to this model as Volterra k-nearest neighbor OPELM prediction (VKOPP) model. This model uses the minimum entropy rate method and Volterra series to reconstruct phase space for IGBTs’ ageing samples, and a new weight update algorithm, which can effectively reduce the influence of the outliers and noises, is utilized to establish the VKOPP network; then a combination of the k-nearest neighbor method (KNN) and least squares estimation (LSE) method is used to calculate the output weights of OPELM and predict the RUL of the IGBT. The prognostic results show that the proposed approach can predict the RUL of IGBT modules with small error and achieve higher prediction precision and lower time cost than some classic prediction approaches. PMID:29099811
van Diest, Mike; Stegenga, Jan; Wörtche, Heinrich J.; Roerdink, Jos B. T. M; Verkerke, Gijsbertus J.; Lamoth, Claudine J. C.
2015-01-01
Background Exergames are becoming an increasingly popular tool for training balance ability, thereby preventing falls in older adults. Automatic, real time, assessment of the user’s balance control offers opportunities in terms of providing targeted feedback and dynamically adjusting the gameplay to the individual user, yet algorithms for quantification of balance control remain to be developed. The aim of the present study was to identify movement patterns, and variability therein, of young and older adults playing a custom-made weight-shifting (ice-skating) exergame. Methods Twenty older adults and twenty young adults played a weight-shifting exergame under five conditions of varying complexity, while multi-segmental whole-body movement data were captured using Kinect. Movement coordination patterns expressed during gameplay were identified using Self Organizing Maps (SOM), an artificial neural network, and variability in these patterns was quantified by computing Total Trajectory Variability (TTvar). Additionally a k Nearest Neighbor (kNN) classifier was trained to discriminate between young and older adults based on the SOM features. Results Results showed that TTvar was significantly higher in older adults than in young adults, when playing the exergame under complex task conditions. The kNN classifier showed a classification accuracy of 65.8%. Conclusions Older adults display more variable sway behavior than young adults, when playing the exergame under complex task conditions. The SOM features characterizing movement patterns expressed during exergaming allow for discriminating between young and older adults with limited accuracy. Our findings contribute to the development of algorithms for quantification of balance ability during home-based exergaming for balance training. PMID:26230655
van Diest, Mike; Stegenga, Jan; Wörtche, Heinrich J; Roerdink, Jos B T M; Verkerke, Gijsbertus J; Lamoth, Claudine J C
2015-01-01
Exergames are becoming an increasingly popular tool for training balance ability, thereby preventing falls in older adults. Automatic, real time, assessment of the user's balance control offers opportunities in terms of providing targeted feedback and dynamically adjusting the gameplay to the individual user, yet algorithms for quantification of balance control remain to be developed. The aim of the present study was to identify movement patterns, and variability therein, of young and older adults playing a custom-made weight-shifting (ice-skating) exergame. Twenty older adults and twenty young adults played a weight-shifting exergame under five conditions of varying complexity, while multi-segmental whole-body movement data were captured using Kinect. Movement coordination patterns expressed during gameplay were identified using Self Organizing Maps (SOM), an artificial neural network, and variability in these patterns was quantified by computing Total Trajectory Variability (TTvar). Additionally a k Nearest Neighbor (kNN) classifier was trained to discriminate between young and older adults based on the SOM features. Results showed that TTvar was significantly higher in older adults than in young adults, when playing the exergame under complex task conditions. The kNN classifier showed a classification accuracy of 65.8%. Older adults display more variable sway behavior than young adults, when playing the exergame under complex task conditions. The SOM features characterizing movement patterns expressed during exergaming allow for discriminating between young and older adults with limited accuracy. Our findings contribute to the development of algorithms for quantification of balance ability during home-based exergaming for balance training.
NASA Astrophysics Data System (ADS)
Fernandez Rojas, Raul; Huang, Xu; Ou, Keng-Liang
2017-10-01
Pain diagnosis for nonverbal patients represents a challenge in clinical settings. Neuroimaging methods, such as functional magnetic resonance imaging and functional near-infrared spectroscopy (fNIRS), have shown promising results to assess neuronal function in response to nociception and pain. Recent studies suggest that neuroimaging in conjunction with machine learning models can be used to predict different cognitive tasks. The aim of this study is to expand previous studies by exploring the classification of fNIRS signals (oxyhaemoglobin) according to temperature level (cold and hot) and corresponding pain intensity (low and high) using machine learning models. Toward this aim, we used the quantitative sensory testing to determine pain threshold and pain tolerance to cold and heat in 18 healthy subjects (three females), mean age±standard deviation (31.9±5.5). The classification model is based on the bag-of-words approach, a histogram representation used in document classification based on the frequencies of extracted words and adapted for time series; two learning algorithms were used separately, K-nearest neighbor (K-NN) and support vector machines (SVM). A comparison between two sets of fNIRS channels was also made in the classification task, all 24 channels and 8 channels from the somatosensory region defined as our region of interest (RoI). The results showed that K-NN obtained slightly better results (92.08%) than SVM (91.25%) using the 24 channels; however, the performance slightly dropped using only channels from the RoI with K-NN (91.53%) and SVM (90.83%). These results indicate potential applications of fNIRS in the development of a physiologically based diagnosis of human pain that would benefit vulnerable patients who cannot self-report pain.
Detecting paroxysmal coughing from pertussis cases using voice recognition technology.
Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H; Polgreen, Philip M
2013-01-01
Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps, and a characteristic whooping sound. However, many clinicians have never seen a case, and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. We collected a series of recordings representing pertussis, croup and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCC), a sampling rate of 16 KHz, a frame Duration of 25 msec, and a frame rate of 10 msec. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and made into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200 tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Our results suggest that we can build a robust classifier to assist clinicians and the public to help identify pertussis cases in children presenting with typical symptoms.
A floor-map-aided WiFi/pseudo-odometry integration algorithm for an indoor positioning system.
Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin
2015-03-24
This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The "go and back" phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The "cross-wall" problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning.
Optimization of internet content filtering-Combined with KNN and OCAT algorithms
NASA Astrophysics Data System (ADS)
Guo, Tianze; Wu, Lingjing; Liu, Jiaming
2018-04-01
The face of the status quo that rampant illegal content in the Internet, the result of traditional way to filter information, keyword recognition and manual screening, is getting worse. Based on this, this paper uses OCAT algorithm nested by KNN classification algorithm to construct a corpus training library that can dynamically learn and update, which can be improved on the filter corpus for constantly updated illegal content of the network, including text and pictures, and thus can better filter and investigate illegal content and its source. After that, the research direction will focus on the simplified updating of recognition and comparison algorithms and the optimization of the corpus learning ability in order to improve the efficiency of filtering, save time and resources.
Thanh Noi, Phan; Kappas, Martin
2017-01-01
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909
Thanh Noi, Phan; Kappas, Martin
2017-12-22
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Potential of near-infrared spectroscopy for quality evaluation of cattle leather.
Braz, Carlos Eduardo M; Jacinto, Manuel Antonio C; Pereira-Filho, Edenir R; Souza, Gilberto B; Nogueira, Ana Rita A
2018-05-09
Models using near-infrared spectroscopy (NIRS) were constructed based on physical-mechanical tests to determine the quality of cattle leather. The following official parameters were used, considering the industry requirements: tensile strength (TS), percentage elongation (%E), tear strength (TT), and double hole tear strength (DHS). Classification models were constructed with the use of k-nearest neighbor (kNN), soft independent modeling of class analogy (SIMCA), and partial least squares-discriminant analysis (PLS-DA). The evaluated figures of merit, accuracy, sensitivity, and specificity presented results between 85% and 93%, and the false alarm rates from 9% to 14%. The model with lowest validation percentage (92%) was kNN, and the highest was PLS-DA (100%). For TS, lower values were obtained, from 52% for kNN and 74% for SIMCA. The other parameters %E, TT, and DHS presented hit rates between 87 and 100%. The abilities of the models were similar, showing they can be used to predict the quality of cattle leather. Copyright © 2018 Elsevier B.V. All rights reserved.
Hussain, Lal
2018-06-01
Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.
NASA Astrophysics Data System (ADS)
Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi
2014-03-01
A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment of schizophrenia. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to access the diagnostic value of functional brain networks, discovered from resting state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using resting-state functional language network. Furthermore, here the classification of schizophrenia was regarded as a sample selection problem where a sparse subset of samples was chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinic diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm incorporating resting-state functional language network achieved 83.6% leaveone- out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with KNearest- Neighbor (KNN), Support Vector Machine (SVM) and l1-norm, our method yielded better classification performance. Moreover, our results suggested that a dysfunction of resting-state functional language network plays an important role in the clinic diagnosis of schizophrenia.
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number classes and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class conditional probability density functions (likelihoods) and prior probabilities. A Probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes optimal boundaries, and a -nearest neighbor (KNN) classifier, is used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
Adaptive local linear regression with application to printer color management.
Gupta, Maya R; Garcia, Eric K; Chin, Erika
2008-06-01
Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples, neighbors. Usually, the number of neighbors used in estimation is fixed to be a global "optimal" value, chosen by cross validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without need for cross validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in error metrics are shown, indicating that enclosing neighborhoods may be a promising adaptive neighborhood definition for other local learning tasks as well, depending on the density of training samples.
NASA Astrophysics Data System (ADS)
Jiang, Li; Shi, Tielin; Xuan, Jianping
2012-05-01
Generally, the vibration signals of fault bearings are non-stationary and highly nonlinear under complicated operating conditions. Thus, it's a big challenge to extract optimal features for improving classification and simultaneously decreasing feature dimension. Kernel Marginal Fisher analysis (KMFA) is a novel supervised manifold learning algorithm for feature extraction and dimensionality reduction. In order to avoid the small sample size problem in KMFA, we propose regularized KMFA (RKMFA). A simple and efficient intelligent fault diagnosis method based on RKMFA is put forward and applied to fault recognition of rolling bearings. So as to directly excavate nonlinear features from the original high-dimensional vibration signals, RKMFA constructs two graphs describing the intra-class compactness and the inter-class separability, by combining traditional manifold learning algorithm with fisher criteria. Therefore, the optimal low-dimensional features are obtained for better classification and finally fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories of bearings. The experimental results demonstrate that the proposed approach improves the fault classification performance and outperforms the other conventional approaches.
Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes
NASA Astrophysics Data System (ADS)
Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie
2017-03-01
Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to the histologic subtypes over the past decade. In parallel, recent development of quantitative image biomarkers has recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies the adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADCs and 110 SqCCs patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers including Support Vector Machines with Radial Basis Function kernel (RBF-SVM), Random forest (RF), K-nearest neighbor (KNN), and RUSBoost algorithms. To improve the classifiers' performance, optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward eliminating algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both complete feature set (AUC=0.89) and optimal feature subset (AUC=0.91).
Research on cardiovascular disease prediction based on distance metric learning
NASA Astrophysics Data System (ADS)
Ni, Zhuang; Liu, Kui; Kang, Guixia
2018-04-01
Distance metric learning algorithm has been widely applied to medical diagnosis and exhibited its strengths in classification problems. The k-nearest neighbour (KNN) is an efficient method which treats each feature equally. The large margin nearest neighbour classification (LMNN) improves the accuracy of KNN by learning a global distance metric, which did not consider the locality of data distributions. In this paper, we propose a new distance metric algorithm adopting cosine metric and LMNN named COS-SUBLMNN which takes more care about local feature of data to overcome the shortage of LMNN and improve the classification accuracy. The proposed methodology is verified on CVDs patient vector derived from real-world medical data. The Experimental results show that our method provides higher accuracy than KNN and LMNN did, which demonstrates the effectiveness of the Risk predictive model of CVDs based on COS-SUBLMNN.
Vrooman, Henri A; Cocosco, Chris A; van der Lijn, Fedde; Stokking, Rik; Ikram, M Arfan; Vernooij, Meike W; Breteler, Monique M B; Niessen, Wiro J
2007-08-01
Conventional k-Nearest-Neighbor (kNN) classification, which has been successfully applied to classify brain tissue in MR data, requires training on manually labeled subjects. This manual labeling is a laborious and time-consuming procedure. In this work, a new fully automated brain tissue classification procedure is presented, in which kNN training is automated. This is achieved by non-rigidly registering the MR data with a tissue probability atlas to automatically select training samples, followed by a post-processing step to keep the most reliable samples. The accuracy of the new method was compared to rigid registration-based training and to conventional kNN-based segmentation using training on manually labeled subjects for segmenting gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) in 12 data sets. Furthermore, for all classification methods, the performance was assessed when varying the free parameters. Finally, the robustness of the fully automated procedure was evaluated on 59 subjects. The automated training method using non-rigid registration with a tissue probability atlas was significantly more accurate than rigid registration. For both automated training using non-rigid registration and for the manually trained kNN classifier, the difference with the manual labeling by observers was not significantly larger than inter-observer variability for all tissue types. From the robustness study, it was clear that, given an appropriate brain atlas and optimal parameters, our new fully automated, non-rigid registration-based method gives accurate and robust segmentation results. A similarity index was used for comparison with manually trained kNN. The similarity indices were 0.93, 0.92 and 0.92, for CSF, GM and WM, respectively. It can be concluded that our fully automated method using non-rigid registration may replace manual segmentation, and thus that automated brain tissue segmentation without laborious manual training is feasible.
NASA Astrophysics Data System (ADS)
Li, Cuiling; Jiang, Kai; Zhao, Xueguan; Fan, Pengfei; Wang, Xiu; Liu, Chuan
2017-10-01
Impurity of melon seeds variety will cause reductions of melon production and economic benefits of farmers, this research aimed to adopt spectral technology combined with chemometrics methods to identify melon seeds variety. Melon seeds whose varieties were "Yi Te Bai", "Yi Te Jin", "Jing Mi NO.7", "Jing Mi NO.11" and " Yi Li Sha Bai "were used as research samples. A simple spectral system was developed to collect reflective spectral data of melon seeds, including a light source unit, a spectral data acquisition unit and a data processing unit, the detection wavelength range of this system was 200-1100nm with spectral resolution of 0.14 7.7nm. The original reflective spectral data was pre-treated with de-trend (DT), multiple scattering correction (MSC), first derivative (FD), normalization (NOR) and Savitzky-Golay (SG) convolution smoothing methods. Principal Component Analysis (PCA) method was adopted to reduce the dimensions of reflective spectral data and extract principal components. K-nearest neighbour (KNN) and Fisher discriminant analysis (FDA) methods were used to develop discriminant models of melon seeds variety based on PCA. Spectral data pretreatments improved the discriminant effects of KNN and FDA, FDA generated better discriminant results than KNN, both KNN and FDA methods produced discriminant accuracies reaching to 90.0% for validation set. Research results showed that using spectral technology in combination with KNN and FDA modelling methods to identify melon seeds variety was feasible.
An ensemble of dissimilarity based classifiers for Mackerel gender determination
NASA Astrophysics Data System (ADS)
Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.
2014-03-01
Mackerel is an infravalored fish captured by European fishing vessels. A manner to add value to this specie can be achieved by trying to classify it attending to its sex. Colour measurements were performed on Mackerel females and males (fresh and defrozen) extracted gonads to obtain differences between sexes. Several linear and non linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can been applied to this problem. However, theyare usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kind of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
Ding, Huijun; He, Qing; Zhou, Yongjin; Dan, Guo; Cui, Song
2017-01-01
Motion-intent-based finger gesture recognition systems are crucial for many applications such as prosthesis control, sign language recognition, wearable rehabilitation system, and human–computer interaction. In this article, a motion-intent-based finger gesture recognition system is designed to correctly identify the tapping of every finger for the first time. Two auto-event annotation algorithms are firstly applied and evaluated for detecting the finger tapping frame. Based on the truncated signals, the Wavelet packet transform (WPT) coefficients are calculated and compressed as the features, followed by a feature selection method that is able to improve the performance by optimizing the feature set. Finally, three popular classifiers including naive Bayes (NBC), K-nearest neighbor (KNN), and support vector machine (SVM) are applied and evaluated. The recognition accuracy can be achieved up to 94%. The design and the architecture of the system are presented with full system characterization results. PMID:29167655
Multisite rainfall downscaling and disaggregation in a tropical urban area
NASA Astrophysics Data System (ADS)
Lu, Y.; Qin, X. S.
2014-02-01
A systematic downscaling-disaggregation study was conducted over Singapore Island, with an aim to generate high spatial and temporal resolution rainfall data under future climate-change conditions. The study consisted of two major components. The first part was to perform an inter-comparison of various alternatives of downscaling and disaggregation methods based on observed data. This included (i) single-site generalized linear model (GLM) plus K-nearest neighbor (KNN) (S-G-K) vs. multisite GLM (M-G) for spatial downscaling, (ii) HYETOS vs. KNN for single-site disaggregation, and (iii) KNN vs. MuDRain (Multivariate Rainfall Disaggregation tool) for multisite disaggregation. The results revealed that, for multisite downscaling, M-G performs better than S-G-K in covering the observed data with a lower RMSE value; for single-site disaggregation, KNN could better keep the basic statistics (i.e. standard deviation, lag-1 autocorrelation and probability of wet hour) than HYETOS; for multisite disaggregation, MuDRain outperformed KNN in fitting interstation correlations. In the second part of the study, an integrated downscaling-disaggregation framework based on M-G, KNN, and MuDRain was used to generate hourly rainfall at multiple sites. The results indicated that the downscaled and disaggregated rainfall data based on multiple ensembles from HadCM3 for the period from 1980 to 2010 could well cover the observed mean rainfall amount and extreme data, and also reasonably keep the spatial correlations both at daily and hourly timescales. The framework was also used to project future rainfall conditions under HadCM3 SRES A2 and B2 scenarios. It was indicated that the annual rainfall amount could reduce up to 5% at the end of this century, but the rainfall of wet season and extreme hourly rainfall could notably increase.
A Floor-Map-Aided WiFi/Pseudo-Odometry Integration Algorithm for an Indoor Positioning System
Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin
2015-01-01
This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The “go and back” phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The “cross-wall” problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning. PMID:25811224
Creating Profiles from User Network Behavior
2013-09-01
We varied the m-estimate in Naïve Bayes, m for pruning in Learning Tree, and how many k nearest neighbors to select from in KNN, before settling on the...N. Taft, “The cubicle vs. the coffee shop: behavioral modes in enterprise end-users,” in Proc. of the 9th Int. Conf. on Passive and Active Network
Handwritten digits recognition based on immune network
NASA Astrophysics Data System (ADS)
Li, Yangyang; Wu, Yunhui; Jiao, Lc; Wu, Jianshe
2011-11-01
With the development of society, handwritten digits recognition technique has been widely applied to production and daily life. It is a very difficult task to solve these problems in the field of pattern recognition. In this paper, a new method is presented for handwritten digit recognition. The digit samples firstly are processed and features extraction. Based on these features, a novel immune network classification algorithm is designed and implemented to the handwritten digits recognition. The proposed algorithm is developed by Jerne's immune network model for feature selection and KNN method for classification. Its characteristic is the novel network with parallel commutating and learning. The performance of the proposed method is experimented to the handwritten number datasets MNIST and compared with some other recognition algorithms-KNN, ANN and SVM algorithm. The result shows that the novel classification algorithm based on immune network gives promising performance and stable behavior for handwritten digits recognition.
Mapping ionospheric observations using combined techniques for Europe region
NASA Astrophysics Data System (ADS)
Tomasik, Lukasz; Gulyaeva, Tamara; Stanislawska, Iwona; Swiatek, Anna; Pozoga, Mariusz; Dziak-Jankowska, Beata
An k nearest neighbours algorithm (KNN) was used for filling the gaps of the missing F2-layer critical frequency is proposed and applied. This method uses TEC data calculated from EGNOS Vertical Delay Estimate (VDE ≈0.78 TECU) and several GNSS stations and its spatial correlation whit data from selected ionosondes. For mapping purposes two-dimensional similarity function in KNN method was proposed.
Ali, Safdar; Majid, Abdul; Khan, Asifullah
2014-04-01
Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
Detecting Paroxysmal Coughing from Pertussis Cases Using Voice Recognition Technology
Parker, Danny; Picone, Joseph; Harati, Amir; Lu, Shuang; Jenkyns, Marion H.; Polgreen, Philip M.
2013-01-01
Background Pertussis is highly contagious; thus, prompt identification of cases is essential to control outbreaks. Clinicians experienced with the disease can easily identify classic cases, where patients have bursts of rapid coughing followed by gasps, and a characteristic whooping sound. However, many clinicians have never seen a case, and thus may miss initial cases during an outbreak. The purpose of this project was to use voice-recognition software to distinguish pertussis coughs from croup and other coughs. Methods We collected a series of recordings representing pertussis, croup and miscellaneous coughing by children. We manually categorized coughs as either pertussis or non-pertussis, and extracted features for each category. We used Mel-frequency cepstral coefficients (MFCC), a sampling rate of 16 KHz, a frame Duration of 25 msec, and a frame rate of 10 msec. The coughs were filtered. Each cough was divided into 3 sections of proportion 3-4-3. The average of the 13 MFCCs for each section was computed and made into a 39-element feature vector used for the classification. We used the following machine learning algorithms: Neural Networks, K-Nearest Neighbor (KNN), and a 200 tree Random Forest (RF). Data were reserved for cross-validation of the KNN and RF. The Neural Network was trained 100 times, and the averaged results are presented. Results After categorization, we had 16 examples of non-pertussis coughs and 31 examples of pertussis coughs. Over 90% of all pertussis coughs were properly classified as pertussis. The error rates were: Type I errors of 7%, 12%, and 25% and Type II errors of 8%, 0%, and 0%, using the Neural Network, Random Forest, and KNN, respectively. Conclusion Our results suggest that we can build a robust classifier to assist clinicians and the public to help identify pertussis cases in children presenting with typical symptoms. PMID:24391730
An Approach to Biometric Verification Based on Human Body Communication in Wearable Devices
Li, Jingzhen; Liu, Yuhang; Nie, Zedong; Qin, Wenjian; Pang, Zengyao; Wang, Lei
2017-01-01
In this paper, an approach to biometric verification based on human body communication (HBC) is presented for wearable devices. For this purpose, the transmission gain S21 of volunteer’s forearm is measured by vector network analyzer (VNA). Specifically, in order to determine the chosen frequency for biometric verification, 1800 groups of data are acquired from 10 volunteers in the frequency range 0.3 MHz to 1500 MHz, and each group includes 1601 sample data. In addition, to achieve the rapid verification, 30 groups of data for each volunteer are acquired at the chosen frequency, and each group contains only 21 sample data. Furthermore, a threshold-adaptive template matching (TATM) algorithm based on weighted Euclidean distance is proposed for rapid verification in this work. The results indicate that the chosen frequency for biometric verification is from 650 MHz to 750 MHz. The false acceptance rate (FAR) and false rejection rate (FRR) based on TATM are approximately 5.79% and 6.74%, respectively. In contrast, the FAR and FRR were 4.17% and 37.5%, 3.37% and 33.33%, and 3.80% and 34.17% using K-nearest neighbor (KNN) classification, support vector machines (SVM), and naive Bayesian method (NBM) classification, respectively. In addition, the running time of TATM is 0.019 s, whereas the running times of KNN, SVM and NBM are 0.310 s, 0.0385 s, and 0.168 s, respectively. Therefore, TATM is suggested to be appropriate for rapid verification use in wearable devices. PMID:28075375
An Approach to Biometric Verification Based on Human Body Communication in Wearable Devices.
Li, Jingzhen; Liu, Yuhang; Nie, Zedong; Qin, Wenjian; Pang, Zengyao; Wang, Lei
2017-01-10
In this paper, an approach to biometric verification based on human body communication (HBC) is presented for wearable devices. For this purpose, the transmission gain S21 of volunteer's forearm is measured by vector network analyzer (VNA). Specifically, in order to determine the chosen frequency for biometric verification, 1800 groups of data are acquired from 10 volunteers in the frequency range 0.3 MHz to 1500 MHz, and each group includes 1601 sample data. In addition, to achieve the rapid verification, 30 groups of data for each volunteer are acquired at the chosen frequency, and each group contains only 21 sample data. Furthermore, a threshold-adaptive template matching (TATM) algorithm based on weighted Euclidean distance is proposed for rapid verification in this work. The results indicate that the chosen frequency for biometric verification is from 650 MHz to 750 MHz. The false acceptance rate (FAR) and false rejection rate (FRR) based on TATM are approximately 5.79% and 6.74%, respectively. In contrast, the FAR and FRR were 4.17% and 37.5%, 3.37% and 33.33%, and 3.80% and 34.17% using K-nearest neighbor (KNN) classification, support vector machines (SVM), and naive Bayesian method (NBM) classification, respectively. In addition, the running time of TATM is 0.019 s, whereas the running times of KNN, SVM and NBM are 0.310 s, 0.0385 s, and 0.168 s, respectively. Therefore, TATM is suggested to be appropriate for rapid verification use in wearable devices.
ERIC Educational Resources Information Center
Pankau, Brian L.
2009-01-01
This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings and trained and tested against textual corpora of 2300 Gang-Of-Four (GOF) design pattern documents. Analysis of the experiment's…
Iris Recognition Using Feature Extraction of Box Counting Fractal Dimension
NASA Astrophysics Data System (ADS)
Khotimah, C.; Juniati, D.
2018-01-01
Biometrics is a science that is now growing rapidly. Iris recognition is a biometric modality which captures a photo of the eye pattern. The markings of the iris are distinctive that it has been proposed to use as a means of identification, instead of fingerprints. Iris recognition was chosen for identification in this research because every human has a special feature that each individual is different and the iris is protected by the cornea so that it will have a fixed shape. This iris recognition consists of three step: pre-processing of data, feature extraction, and feature matching. Hough transformation is used in the process of pre-processing to locate the iris area and Daugman’s rubber sheet model to normalize the iris data set into rectangular blocks. To find the characteristics of the iris, it was used box counting method to get the fractal dimension value of the iris. Tests carried out by used k-fold cross method with k = 5. In each test used 10 different grade K of K-Nearest Neighbor (KNN). The result of iris recognition was obtained with the best accuracy was 92,63 % for K = 3 value on K-Nearest Neighbor (KNN) method.
Fusing Bluetooth Beacon Data with Wi-Fi Radiomaps for Improved Indoor Localization
Kanaris, Loizos; Kokkinis, Akis; Liotta, Antonio; Stavrou, Stavros
2017-01-01
Indoor user localization and tracking are instrumental to a broad range of services and applications in the Internet of Things (IoT) and particularly in Body Sensor Networks (BSN) and Ambient Assisted Living (AAL) scenarios. Due to the widespread availability of IEEE 802.11, many localization platforms have been proposed, based on the Wi-Fi Received Signal Strength (RSS) indicator, using algorithms such as K-Nearest Neighbour (KNN), Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE). In this paper, we introduce a hybrid method that combines the simplicity (and low cost) of Bluetooth Low Energy (BLE) and the popular 802.11 infrastructure, to improve the accuracy of indoor localization platforms. Building on KNN, we propose a new positioning algorithm (dubbed i-KNN) which is able to filter the initial fingerprint dataset (i.e., the radiomap), after considering the proximity of RSS fingerprints with respect to the BLE devices. In this way, i-KNN provides an optimised small subset of possible user locations, based on which it finally estimates the user position. The proposed methodology achieves fast positioning estimation due to the utilization of a fragment of the initial fingerprint dataset, while at the same time improves positioning accuracy by minimizing any calculation errors. PMID:28394268
Fusing Bluetooth Beacon Data with Wi-Fi Radiomaps for Improved Indoor Localization.
Kanaris, Loizos; Kokkinis, Akis; Liotta, Antonio; Stavrou, Stavros
2017-04-10
Indoor user localization and tracking are instrumental to a broad range of services and applications in the Internet of Things (IoT) and particularly in Body Sensor Networks (BSN) and Ambient Assisted Living (AAL) scenarios. Due to the widespread availability of IEEE 802.11, many localization platforms have been proposed, based on the Wi-Fi Received Signal Strength (RSS) indicator, using algorithms such as K -Nearest Neighbour (KNN), Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE). In this paper, we introduce a hybrid method that combines the simplicity (and low cost) of Bluetooth Low Energy (BLE) and the popular 802.11 infrastructure, to improve the accuracy of indoor localization platforms. Building on KNN, we propose a new positioning algorithm (dubbed i-KNN) which is able to filter the initial fingerprint dataset (i.e., the radiomap), after considering the proximity of RSS fingerprints with respect to the BLE devices. In this way, i-KNN provides an optimised small subset of possible user locations, based on which it finally estimates the user position. The proposed methodology achieves fast positioning estimation due to the utilization of a fragment of the initial fingerprint dataset, while at the same time improves positioning accuracy by minimizing any calculation errors.
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose
Rahman, Mohammad Mizanur; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-01-01
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms. PMID:28895910
Rajagopal, Rekha; Ranganathan, Vidhyapriya
2018-06-05
Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.
NASA Astrophysics Data System (ADS)
Wood, W. T.; Runyan, T. E.; Palmsten, M.; Dale, J.; Crawford, C.
2016-12-01
Natural Gas (primarily methane) and gas hydrate accumulations require certain bio-geochemical, as well as physical conditions, some of which are poorly sampled and/or poorly understood. We exploit recent advances in the prediction of seafloor porosity and heat flux via machine learning techniques (e.g. Random forests and Bayesian networks) to predict the occurrence of gas and subsequently gas hydrate in marine sediments. The prediction (actually guided interpolation) of key parameters we use in this study is a K-nearest neighbor technique. KNN requires only minimal pre-processing of the data and predictors, and requires minimal run-time input so the results are almost entirely data-driven. Specifically we use new estimates of sedimentation rate and sediment type, along with recently derived compaction modeling to estimate profiles of porosity and age. We combined the compaction with seafloor heat flux to estimate temperature with depth and geologic age, which, with estimates of organic carbon, and models of methanogenesis yield limits on the production of methane. Results include geospatial predictions of gas (and gas hydrate) accumulations, with quantitative estimates of uncertainty. The Generic Earth Modeling System (GEMS) we have developed to derive the machine learning estimates is modular and easily updated with new algorithms or data.
Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar
2016-04-01
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generates an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer are imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
Source Camera Identification and Blind Tamper Detections for Images
2007-04-24
measures and image quality measures in camera identification problem was studied using conjunction with a KNN classifier to identify the feature sets...shots varying from nature scenes .-.. motorala to close-ups of people. We experimented with the KNN *~. * ny classifier (K=5) as well SVM algorithm of...on Acoustic, Speech and Signal Processing (ICASSP), France, May 2006, vol. 5, pp. 401-404. [9] H. Farid and S. Lyu, "Higher-order wavelet statistics
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071
LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.
Zhang, Tao; Chen, Wanzhong
2017-08-01
Achieving the goal of detecting seizure activity automatically using electroencephalogram (EEG) signals is of great importance and significance for the treatment of epileptic seizures. To realize this aim, a newly-developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the presented study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). Primarily, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach are equal to or higher than 98.10% in all the five cases, and this indicates the effectiveness of the proposed approach for automated seizure detection.
Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.
2017-01-01
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571
Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L
2017-02-14
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.
David M. Bell; Matthew J. Gregory; Janet L. Ohmann
2015-01-01
Imputation provides a useful method for mapping forest attributes across broad geographic areas based on field plot measurements and Landsat multi-spectral data, but the resulting map products may be of limited use without corresponding analyses of uncertainties in predictions. In the case of k-nearest neighbor (kNN) imputation with k = 1, such as the Gradient Nearest...
Suyundikov, Anvar; Stevens, John R.; Corcoran, Christopher; Herrick, Jennifer; Wolff, Roger K.; Slattery, Martha L.
2015-01-01
Missing data can arise in bioinformatics applications for a variety of reasons, and imputation methods are frequently applied to such data. We are motivated by a colorectal cancer study where miRNA expression was measured in paired tumor-normal samples of hundreds of patients, but data for many normal samples were missing due to lack of tissue availability. We compare the precision and power performance of several imputation methods, and draw attention to the statistical dependence induced by K-Nearest Neighbors (KNN) imputation. This imputation-induced dependence has not previously been addressed in the literature. We demonstrate how to account for this dependence, and show through simulation how the choice to ignore or account for this dependence affects both power and type I error rate control. PMID:25849489
A triboelectric motion sensor in wearable body sensor network for human activity recognition.
Hui Huang; Xian Li; Ye Sun
2016-08-01
The goal of this study is to design a novel triboelectric motion sensor in wearable body sensor network for human activity recognition. Physical activity recognition is widely used in well-being management, medical diagnosis and rehabilitation. Other than traditional accelerometers, we design a novel wearable sensor system based on triboelectrification. The triboelectric motion sensor can be easily attached to human body and collect motion signals caused by physical activities. The experiments are conducted to collect five common activity data: sitting and standing, walking, climbing upstairs, downstairs, and running. The k-Nearest Neighbor (kNN) clustering algorithm is adopted to recognize these activities and validate the feasibility of this new approach. The results show that our system can perform physical activity recognition with a successful rate over 80% for walking, sitting and standing. The triboelectric structure can also be used as an energy harvester for motion harvesting due to its high output voltage in random low-frequency motion.
Multi-texture local ternary pattern for face recognition
NASA Astrophysics Data System (ADS)
Essa, Almabrok; Asari, Vijayan
2017-05-01
In imagery and pattern analysis domain a variety of descriptors have been proposed and employed for different computer vision applications like face detection and recognition. Many of them are affected under different conditions during the image acquisition process such as variations in illumination and presence of noise, because they totally rely on the image intensity values to encode the image information. To overcome these problems, a novel technique named Multi-Texture Local Ternary Pattern (MTLTP) is proposed in this paper. MTLTP combines the edges and corners based on the local ternary pattern strategy to extract the local texture features of the input image. Then returns a spatial histogram feature vector which is the descriptor for each image that we use to recognize a human being. Experimental results using a k-nearest neighbors classifier (k-NN) on two publicly available datasets justify our algorithm for efficient face recognition in the presence of extreme variations of illumination/lighting environments and slight variation of pose conditions.
Georgouli, Konstantia; Martinez Del Rincon, Jesus; Koidis, Anastasios
2017-02-15
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques. Copyright © 2016 Elsevier Ltd. All rights reserved.
2013-01-01
Background Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process. Results For testing, we chose the short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with GeneMarkS, Metagene, Orphelia, and Heuristic Approachs methods, our model achieved the best prediction performance in identification of short prokaryotic genes. Even when we focused on the very short length group ([60–100 nt)), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than three of the other methods while Metagene fails to recognize genes in this length range. The experiments also proved that the IASPLS can improve the identification accuracy in comparison with other widely used classifiers, i.e. Logistic, Random Forest (RF) and K nearest neighbors (KNN). The accuracy in using IASPLS was improved 5.90% or more in comparison with the other methods. In addition to the improvements in accuracy, IASPLS required ten times less computer time than using KNN or RF. Conclusions It is conclusive that our method is preferable for application as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly-sequenced or under-studied species. Reviewers This article was reviewed by Alexey Kondrashov, Rajeev Azad (nominated by Dr J.Peter Gogarten) and Yuriy Fofanov (nominated by Dr Janet Siefert). PMID:24067167
Preserved Network Metrics across Translated Texts
NASA Astrophysics Data System (ADS)
Cabatbat, Josephine Jill T.; Monsanto, Jica P.; Tapang, Giovanni A.
2014-09-01
Co-occurrence language networks based on Bible translations and the Universal Declaration of Human Rights (UDHR) translations in different languages were constructed and compared with random text networks. Among the considered network metrics, the network size, N, the normalized betweenness centrality (BC), and the average k-nearest neighbors, knn, were found to be the most preserved across translations. Moreover, similar frequency distributions of co-occurring network motifs were observed for translated texts networks.
Model-based segmentation of abdominal aortic aneurysms in CTA images
NASA Astrophysics Data System (ADS)
de Bruijne, Marleen; van Ginneken, Bram; Niessen, Wiro J.; Loog, Marco; Viergever, Max A.
2003-05-01
Segmentation of thrombus in abdominal aortic aneurysms is complicated by regions of low boundary contrast and by the presence of many neighboring structures in close proximity to the aneurysm wall. We present an automated method that is similar to the well known Active Shape Models (ASM), combining a three-dimensional shape model with a one-dimensional boundary appearance model. Our contribution is twofold: we developed a non-parametric appearance modeling scheme that effectively deals with a highly varying background, and we propose a way of generalizing models of curvilinear structures from small training sets. In contrast with the conventional ASM approach, the new appearance model trains on both true and false examples of boundary profiles. The probability that a given image profile belongs to the boundary is obtained using k nearest neighbor (kNN) probability density estimation. The performance of this scheme is compared to that of original ASMs, which minimize the Mahalanobis distance to the average true profile in the training set. The generalizability of the shape model is improved by modeling the objects axis deformation independent of its cross-sectional deformation. A leave-one-out experiment was performed on 23 datasets. Segmentation using the kNN appearance model significantly outperformed the original ASM scheme; average volume errors were 5.9% and 46% respectively.
Murugappan, Murugappan; Murugappan, Subbulakshmi; Zheng, Bong Siao
2013-01-01
[Purpose] Intelligent emotion assessment systems have been highly successful in a variety of applications, such as e-learning, psychology, and psycho-physiology. This study aimed to assess five different human emotions (happiness, disgust, fear, sadness, and neutral) using heart rate variability (HRV) signals derived from an electrocardiogram (ECG). [Subjects] Twenty healthy university students (10 males and 10 females) with a mean age of 23 years participated in this experiment. [Methods] All five emotions were induced by audio-visual stimuli (video clips). ECG signals were acquired using 3 electrodes and were preprocessed using a Butterworth 3rd order filter to remove noise and baseline wander. The Pan-Tompkins algorithm was used to derive the HRV signals from ECG. Discrete wavelet transform (DWT) was used to extract statistical features from the HRV signals using four wavelet functions: Daubechies6 (db6), Daubechies7 (db7), Symmlet8 (sym8), and Coiflet5 (coif5). The k-nearest neighbor (KNN) and linear discriminant analysis (LDA) were used to map the statistical features into corresponding emotions. [Results] KNN provided the maximum average emotion classification rate compared to LDA for five emotions (sadness − 50.28%; happiness − 79.03%; fear − 77.78%; disgust − 88.69%; and neutral − 78.34%). [Conclusion] The results of this study indicate that HRV may be a reliable indicator of changes in the emotional state of subjects and provides an approach to the development of a real-time emotion assessment system with a higher reliability than other systems. PMID:24259846
Murugappan, Murugappan; Murugappan, Subbulakshmi; Zheng, Bong Siao
2013-07-01
[Purpose] Intelligent emotion assessment systems have been highly successful in a variety of applications, such as e-learning, psychology, and psycho-physiology. This study aimed to assess five different human emotions (happiness, disgust, fear, sadness, and neutral) using heart rate variability (HRV) signals derived from an electrocardiogram (ECG). [Subjects] Twenty healthy university students (10 males and 10 females) with a mean age of 23 years participated in this experiment. [Methods] All five emotions were induced by audio-visual stimuli (video clips). ECG signals were acquired using 3 electrodes and were preprocessed using a Butterworth 3rd order filter to remove noise and baseline wander. The Pan-Tompkins algorithm was used to derive the HRV signals from ECG. Discrete wavelet transform (DWT) was used to extract statistical features from the HRV signals using four wavelet functions: Daubechies6 (db6), Daubechies7 (db7), Symmlet8 (sym8), and Coiflet5 (coif5). The k-nearest neighbor (KNN) and linear discriminant analysis (LDA) were used to map the statistical features into corresponding emotions. [Results] KNN provided the maximum average emotion classification rate compared to LDA for five emotions (sadness - 50.28%; happiness - 79.03%; fear - 77.78%; disgust - 88.69%; and neutral - 78.34%). [Conclusion] The results of this study indicate that HRV may be a reliable indicator of changes in the emotional state of subjects and provides an approach to the development of a real-time emotion assessment system with a higher reliability than other systems.
Multi-feature classifiers for burst detection in single EEG channels from preterm infants
NASA Astrophysics Data System (ADS)
Navarro, X.; Porée, F.; Kuchenbuch, M.; Chavez, M.; Beuchée, Alain; Carrault, G.
2017-08-01
Objective. The study of electroencephalographic (EEG) bursts in preterm infants provides valuable information about maturation or prognostication after perinatal asphyxia. Over the last two decades, a number of works proposed algorithms to automatically detect EEG bursts in preterm infants, but they were designed for populations under 35 weeks of post menstrual age (PMA). However, as the brain activity evolves rapidly during postnatal life, these solutions might be under-performing with increasing PMA. In this work we focused on preterm infants reaching term ages (PMA ⩾36 weeks) using multi-feature classification on a single EEG channel. Approach. Five EEG burst detectors relying on different machine learning approaches were compared: logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (kNN), support vector machines (SVM) and thresholding (Th). Classifiers were trained by visually labeled EEG recordings from 14 very preterm infants (born after 28 weeks of gestation) with 36-41 weeks PMA. Main results. The most performing classifiers reached about 95% accuracy (kNN, SVM and LR) whereas Th obtained 84%. Compared to human-automatic agreements, LR provided the highest scores (Cohen’s kappa = 0.71) using only three EEG features. Applying this classifier in an unlabeled database of 21 infants ⩾36 weeks PMA, we found that long EEG bursts and short inter-burst periods are characteristic of infants with the highest PMA and weights. Significance. In view of these results, LR-based burst detection could be a suitable tool to study maturation in monitoring or portable devices using a single EEG channel.
Intelligent data analysis: the best approach for chronic heart failure (CHF) follow up management.
Mohammadzadeh, Niloofar; Safdari, Reza; Baraani, Alireza; Mohammadzadeh, Farshid
2014-08-01
Intelligent data analysis has ability to prepare and present complex relations between symptoms and diseases, medical and treatment consequences and definitely has significant role in improving follow-up management of chronic heart failure (CHF) patients, increasing speed and accuracy in diagnosis and treatments; reducing costs, designing and implementation of clinical guidelines. The aim of this article is to describe intelligent data analysis methods in order to improve patient monitoring in follow and treatment of chronic heart failure patients as the best approach for CHF follow up management. Minimum data set (MDS) requirements for monitoring and follow up of CHF patient designed in checklist with six main parts. All CHF patients that discharged in 2013 from Tehran heart center have been selected. The MDS for monitoring CHF patient status were collected during 5 months in three different times of follow up. Gathered data was imported in RAPIDMINER 5 software. Modeling was based on decision trees methods such as C4.5, CHAID, ID3 and k-Nearest Neighbors algorithm (K-NN) with k=1. Final analysis was based on voting method. Decision trees and K-NN evaluate according to Cross-Validation. Creating and using standard terminologies and databases consistent with these terminologies help to meet the challenges related to data collection from various places and data application in intelligent data analysis. It should be noted that intelligent analysis of health data and intelligent system can never replace cardiologists. It can only act as a helpful tool for the cardiologist's decisions making.
A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset.
Kamal, Sarwar; Ripon, Shamim Hasnat; Dey, Nilanjan; Ashour, Amira S; Santhi, V
2016-07-01
In the age of information superhighway, big data play a significant role in information processing, extractions, retrieving and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential. In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with reduced amount of elements or instances. These reduced amounts of data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of its accuracy. The obtained results depict that MapReduce based K-NN classifier provided accurate results for big data of DNA. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Thermography based diagnosis of ruptured anterior cruciate ligament (ACL) in canines
NASA Astrophysics Data System (ADS)
Lama, Norsang; Umbaugh, Scott E.; Mishra, Deependra; Dahal, Rohini; Marino, Dominic J.; Sackman, Joseph
2016-09-01
Anterior cruciate ligament (ACL) rupture in canines is a common orthopedic injury in veterinary medicine. Veterinarians use both imaging and non-imaging methods to diagnose the disease. Common imaging methods such as radiography, computed tomography (CT scan) and magnetic resonance imaging (MRI) have some disadvantages: expensive setup, high dose of radiation, and time-consuming. In this paper, we present an alternative diagnostic method based on feature extraction and pattern classification (FEPC) to diagnose abnormal patterns in ACL thermograms. The proposed method was experimented with a total of 30 thermograms for each camera view (anterior, lateral and posterior) including 14 disease and 16 non-disease cases provided from Long Island Veterinary Specialists. The normal and abnormal patterns in thermograms are analyzed in two steps: feature extraction and pattern classification. Texture features based on gray level co-occurrence matrices (GLCM), histogram features and spectral features are extracted from the color normalized thermograms and the computed feature vectors are applied to Nearest Neighbor (NN) classifier, K-Nearest Neighbor (KNN) classifier and Support Vector Machine (SVM) classifier with leave-one-out validation method. The algorithm gives the best classification success rate of 86.67% with a sensitivity of 85.71% and a specificity of 87.5% in ACL rupture detection using NN classifier for the lateral view and Norm-RGB-Lum color normalization method. Our results show that the proposed method has the potential to detect ACL rupture in canines.
NASA Astrophysics Data System (ADS)
Hsieh, Jui-Hua; Wang, Xiang S.; Teotico, Denise; Golbraikh, Alexander; Tropsha, Alexander
2008-09-01
The use of inaccurate scoring functions in docking algorithms may result in the selection of compounds with high predicted binding affinity that nevertheless are known experimentally not to bind to the target receptor. Such falsely predicted binders have been termed `binding decoys'. We posed a question as to whether true binders and decoys could be distinguished based only on their structural chemical descriptors using approaches commonly used in ligand based drug design. We have applied the k-Nearest Neighbor ( kNN) classification QSAR approach to a dataset of compounds characterized as binders or binding decoys of AmpC beta-lactamase. Models were subjected to rigorous internal and external validation as part of our standard workflow and a special QSAR modeling scheme was employed that took into account the imbalanced ratio of inhibitors to non-binders (1:4) in this dataset. 342 predictive models were obtained with correct classification rate (CCR) for both training and test sets as high as 0.90 or higher. The prediction accuracy was as high as 100% (CCR = 1.00) for the external validation set composed of 10 compounds (5 true binders and 5 decoys) selected randomly from the original dataset. For an additional external set of 50 known non-binders, we have achieved the CCR of 0.87 using very conservative model applicability domain threshold. The validated binary kNN QSAR models were further employed for mining the NCGC AmpC screening dataset (69653 compounds). The consensus prediction of 64 compounds identified as screening hits in the AmpC PubChem assay disagreed with their annotation in PubChem but was in agreement with the results of secondary assays. At the same time, 15 compounds were identified as potential binders contrary to their annotation in PubChem. Five of them were tested experimentally and showed inhibitory activities in millimolar range with the highest binding constant Ki of 135 μM. Our studies suggest that validated QSAR models could complement structure based docking and scoring approaches in identifying promising hits by virtual screening of molecular libraries.
NASA Astrophysics Data System (ADS)
Naghibi, Seyed Amir; Moradi Dashtpagerdi, Mostafa
2017-01-01
One important tool for water resources management in arid and semi-arid areas is groundwater potential mapping. In this study, four data-mining models including K-nearest neighbor (KNN), linear discriminant analysis (LDA), multivariate adaptive regression splines (MARS), and quadric discriminant analysis (QDA) were used for groundwater potential mapping to get better and more accurate groundwater potential maps (GPMs). For this purpose, 14 groundwater influence factors were considered, such as altitude, slope angle, slope aspect, plan curvature, profile curvature, slope length, topographic wetness index (TWI), stream power index, distance from rivers, river density, distance from faults, fault density, land use, and lithology. From 842 springs in the study area, in the Khalkhal region of Iran, 70 % (589 springs) were considered for training and 30 % (253 springs) were used as a validation dataset. Then, KNN, LDA, MARS, and QDA models were applied in the R statistical software and the results were mapped as GPMs. Finally, the receiver operating characteristics (ROC) curve was implemented to evaluate the performance of the models. According to the results, the area under the curve of ROCs were calculated as 81.4, 80.5, 79.6, and 79.2 % for MARS, QDA, KNN, and LDA, respectively. So, it can be concluded that the performances of KNN and LDA were acceptable and the performances of MARS and QDA were excellent. Also, the results depicted high contribution of altitude, TWI, slope angle, and fault density, while plan curvature and land use were seen to be the least important factors.
[Terahertz Spectroscopic Identification with Deep Belief Network].
Ma, Shuai; Shen, Tao; Wang, Rui-qi; Lai, Hua; Yu, Zheng-tao
2015-12-01
Feature extraction and classification are the key issues of terahertz spectroscopy identification. Because many materials have no apparent absorption peaks in the terahertz band, it is difficult to extract theirs terahertz spectroscopy feature and identify. To this end, a novel of identify terahertz spectroscopy approach with Deep Belief Network (DBN) was studied in this paper, which combines the advantages of DBN and K-Nearest Neighbors (KNN) classifier. Firstly, cubic spline interpolation and S-G filter were used to normalize the eight kinds of substances (ATP, Acetylcholine Bromide, Bifenthrin, Buprofezin, Carbazole, Bleomycin, Buckminster and Cylotriphosphazene) terahertz transmission spectra in the range of 0.9-6 THz. Secondly, the DBN model was built by two restricted Boltzmann machine (RBM) and then trained layer by layer using unsupervised approach. Instead of using handmade features, the DBN was employed to learn suitable features automatically with raw input data. Finally, a KNN classifier was applied to identify the terahertz spectrum. Experimental results show that using the feature learned by DBN can identify the terahertz spectrum of different substances with the recognition rate of over 90%, which demonstrates that the proposed method can automatically extract the effective features of terahertz spectrum. Furthermore, this KNN classifier was compared with others (BP neural network, SOM neural network and RBF neural network). Comparisons showed that the recognition rate of KNN classifier is better than the other three classifiers. Using the approach that automatic extract terahertz spectrum features by DBN can greatly reduce the workload of feature extraction. This proposed method shows a promising future in the application of identifying the mass terahertz spectroscopy.
Automated classification of neurological disorders of gait using spatio-temporal gait parameters.
Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman
2015-04-01
Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as the evaluation of the treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. 8 measurements with 136 variables using a GAITRite(®) sensor carpet were obtained from each subject. Subjects were randomly divided into two groups (training cases and validation cases). Sensitivity and specificity of k-nearest neighbor (KNN), naive-bayes classifier (NB), artificial neural network (ANN), and support vector machine (SVM) in classifying the validation cases were calculated. ANN and SVM had the highest overall sensitivity with 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%); while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish clinical diagnosis with good accuracy. SVM and ANN in particular differentiate gait patterns of several distinct oto-neurological disorders of gait with high sensitivity and specificity compared to KNN and NB. Both SVM and ANN appear to be a reliable diagnostic and management tool for disorders of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.
Goshvarpour, Ateke; Goshvarpour, Atefeh
2018-04-30
Heart rate variability (HRV) analysis has become a widely used tool for monitoring pathological and psychological states in medical applications. In a typical classification problem, information fusion is a process whereby the effective combination of the data can achieve a more accurate system. The purpose of this article was to provide an accurate algorithm for classifying HRV signals in various psychological states. Therefore, a novel feature level fusion approach was proposed. First, using the theory of information, two similarity indicators of the signal were extracted, including correntropy and Cauchy-Schwarz divergence. Applying probabilistic neural network (PNN) and k-nearest neighbor (kNN), the performance of each index in the classification of meditators and non-meditators HRV signals was appraised. Then, three fusion rules, including division, product, and weighted sum rules were used to combine the information of both similarity measures. For the first time, we propose an algorithm to define the weights of each feature based on the statistical p-values. The performance of HRV classification using combined features was compared with the non-combined features. Totally, the accuracy of 100% was obtained for discriminating all states. The results showed the strong ability and proficiency of division and weighted sum rules in the improvement of the classifier accuracies.
Liu, Kai-Chun; Chan, Chia-Tai
2017-01-01
The proportion of the aging population is rapidly increasing around the world, which will cause stress on society and healthcare systems. In recent years, advances in technology have created new opportunities for automatic activities of daily living (ADL) monitoring to improve the quality of life and provide adequate medical service for the elderly. Such automatic ADL monitoring requires reliable ADL information on a fine-grained level, especially for the status of interaction between body gestures and the environment in the real-world. In this work, we propose a significant change spotting mechanism for periodic human motion segmentation during cleaning task performance. A novel approach is proposed based on the search for a significant change of gestures, which can manage critical technical issues in activity recognition, such as continuous data segmentation, individual variance, and category ambiguity. Three typical machine learning classification algorithms are utilized for the identification of the significant change candidate, including a Support Vector Machine (SVM), k-Nearest Neighbors (kNN), and Naive Bayesian (NB) algorithm. Overall, the proposed approach achieves 96.41% in the F1-score by using the SVM classifier. The results show that the proposed approach can fulfill the requirement of fine-grained human motion segmentation for automatic ADL monitoring. PMID:28106853
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns.
Haghnegahdar, A A; Kolahi, S; Khojastepour, L; Tajeripour, F
2018-03-01
Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cut prepared from each condyle, although images were limited to head of mandibular condyle. In order to extract features of images, first we use LBP and then histogram of oriented gradients. To reduce dimensionality, the linear algebra Singular Value Decomposition (SVD) is applied to the feature vectors matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. K nearest neighbor classifier achieves a very good accuracy (0.9242), moreover, it has desirable sensitivity (0.9470) and specificity (0.9015) results, when other classifiers have lower accuracy, sensitivity and specificity. We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN has been the best classifier for our experiments in detecting patients from healthy individuals, by 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages.
Machine Learning and Computer Vision System for Phenotype Data Acquisition and Analysis in Plants.
Navarro, Pedro J; Pérez, Fernando; Weiss, Julia; Egea-Cortines, Marcos
2016-05-05
Phenomics is a technology-driven approach with promising future to obtain unbiased data of biological systems. Image acquisition is relatively simple. However data handling and analysis are not as developed compared to the sampling capacities. We present a system based on machine learning (ML) algorithms and computer vision intended to solve the automatic phenotype data analysis in plant material. We developed a growth-chamber able to accommodate species of various sizes. Night image acquisition requires near infrared lightning. For the ML process, we tested three different algorithms: k-nearest neighbour (kNN), Naive Bayes Classifier (NBC), and Support Vector Machine. Each ML algorithm was executed with different kernel functions and they were trained with raw data and two types of data normalisation. Different metrics were computed to determine the optimal configuration of the machine learning algorithms. We obtained a performance of 99.31% in kNN for RGB images and a 99.34% in SVM for NIR. Our results show that ML techniques can speed up phenomic data analysis. Furthermore, both RGB and NIR images can be segmented successfully but may require different ML algorithms for segmentation.
Development of machine learning models for diagnosis of glaucoma.
Kim, Seong Jae; Cho, Kyong Jin; Oh, Sejong
2017-01-01
The study aimed to develop machine learning models that have strong prediction power and interpretability for diagnosis of glaucoma based on retinal nerve fiber layer (RNFL) thickness and visual field (VF). We collected various candidate features from the examination of retinal nerve fiber layer (RNFL) thickness and visual field (VF). We also developed synthesized features from original features. We then selected the best features proper for classification (diagnosis) through feature evaluation. We used 100 cases of data as a test dataset and 399 cases of data as a training and validation dataset. To develop the glaucoma prediction model, we considered four machine learning algorithms: C5.0, random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). We repeatedly composed a learning model using the training dataset and evaluated it by using the validation dataset. Finally, we got the best learning model that produces the highest validation accuracy. We analyzed quality of the models using several measures. The random forest model shows best performance and C5.0, SVM, and KNN models show similar accuracy. In the random forest model, the classification accuracy is 0.98, sensitivity is 0.983, specificity is 0.975, and AUC is 0.979. The developed prediction models show high accuracy, sensitivity, specificity, and AUC in classifying among glaucoma and healthy eyes. It will be used for predicting glaucoma against unknown examination records. Clinicians may reference the prediction results and be able to make better decisions. We may combine multiple learning models to increase prediction accuracy. The C5.0 model includes decision rules for prediction. It can be used to explain the reasons for specific predictions.
Zheng, Bin; Lu, Amy; Hardesty, Lara A; Sumkin, Jules H; Hakim, Christiane M; Ganott, Marie A; Gur, David
2006-01-01
The purpose of this study was to develop and test a method for selecting "visually similar" regions of interest depicting breast masses from a reference library to be used in an interactive computer-aided diagnosis (CAD) environment. A reference library including 1000 malignant mass regions and 2000 benign and CAD-generated false-positive regions was established. When a suspicious mass region is identified, the scheme segments the region and searches for similar regions from the reference library using a multifeature based k-nearest neighbor (KNN) algorithm. To improve selection of reference images, we added an interactive step. All actual masses in the reference library were subjectively rated on a scale from 1 to 9 as to their "visual margins speculations". When an observer identifies a suspected mass region during a case interpretation he/she first rates the margins and the computerized search is then limited only to regions rated as having similar levels of spiculation (within +/-1 scale difference). In an observer preference study including 85 test regions, two sets of the six "similar" reference regions selected by the KNN with and without the interactive step were displayed side by side with each test region. Four radiologists and five nonclinician observers selected the more appropriate ("similar") reference set in a two alternative forced choice preference experiment. All four radiologists and five nonclinician observers preferred the sets of regions selected by the interactive method with an average frequency of 76.8% and 74.6%, respectively. The overall preference for the interactive method was highly significant (p < 0.001). The study demonstrated that a simple interactive approach that includes subjectively perceived ratings of one feature alone namely, a rating of margin "spiculation," could substantially improve the selection of "visually similar" reference images.
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models are comparable.
Reuse of imputed data in microarray analysis increases imputation efficiency
Kim, Ki-Yeol; Kim, Byoung-Jin; Yi, Gwan-Su
2004-01-01
Background The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analysis require a complete data set. A few imputation methods for DNA microarray data have been introduced, but the efficiency of the methods was low and the validity of imputed values in these methods had not been fully checked. Results We developed a new cluster-based imputation method called sequential K-nearest neighbor (SKNN) method. This imputes the missing values sequentially from the gene having least missing values, and uses the imputed values for the later imputation. Although it uses the imputed values, the efficiency of this new method is greatly improved in its accuracy and computational complexity over the conventional KNN-based method and other methods based on maximum likelihood estimation. The performance of SKNN was in particular higher than other imputation methods for the data with high missing rates and large number of experiments. Application of Expectation Maximization (EM) to the SKNN method improved the accuracy, but increased computational time proportional to the number of iterations. The Multiple Imputation (MI) method, which is well known but not applied previously to microarray data, showed a similarly high accuracy as the SKNN method, with slightly higher dependency on the types of data sets. Conclusions Sequential reuse of imputed data in KNN-based imputation greatly increases the efficiency of imputation. The SKNN method should be practically useful to save the data of some microarray experiments which have high amounts of missing entries. The SKNN method generates reliable imputed values which can be used for further cluster-based analysis of microarray data. PMID:15504240
NASA Astrophysics Data System (ADS)
Adeniyi, D. A.; Wei, Z.; Yang, Y.
2017-10-01
Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.
On prognostic models, artificial intelligence and censored observations.
Anand, S S; Hamilton, P W; Hughes, J G; Bell, D A
2001-03-01
The development of prognostic models for assisting medical practitioners with decision making is not a trivial task. Models need to possess a number of desirable characteristics and few, if any, current modelling approaches based on statistical or artificial intelligence can produce models that display all these characteristics. The inability of modelling techniques to provide truly useful models has led to interest in these models being purely academic in nature. This in turn has resulted in only a very small percentage of models that have been developed being deployed in practice. On the other hand, new modelling paradigms are being proposed continuously within the machine learning and statistical community and claims, often based on inadequate evaluation, being made on their superiority over traditional modelling methods. We believe that for new modelling approaches to deliver true net benefits over traditional techniques, an evaluation centric approach to their development is essential. In this paper we present such an evaluation centric approach to developing extensions to the basic k-nearest neighbour (k-NN) paradigm. We use standard statistical techniques to enhance the distance metric used and a framework based on evidence theory to obtain a prediction for the target example from the outcome of the retrieved exemplars. We refer to this new k-NN algorithm as Censored k-NN (Ck-NN). This reflects the enhancements made to k-NN that are aimed at providing a means for handling censored observations within k-NN.
Supervised filters for EEG signal in naturally occurring epilepsy forecasting.
Muñoz-Almaraz, Francisco Javier; Zamora-Martínez, Francisco; Botella-Rocamora, Paloma; Pardo, Juan
2017-01-01
Nearly 1% of the global population has Epilepsy. Forecasting epileptic seizures with an acceptable confidence level, could improve the disease treatment and thus the lifestyle of the people who suffer it. To do that the electroencephalogram (EEG) signal is usually studied through spectral power band filtering, but this paper proposes an alternative novel method of preprocessing the EEG signal based on supervised filters. Such filters have been employed in a machine learning algorithm, such as the K-Nearest Neighbor (KNN), to improve the prediction of seizures. The proposed solution extends with this novel approach an algorithm that was submitted to win the third prize of an international Data Science challenge promoted by Kaggle contest platform and the American Epilepsy Society, the Epilepsy Foundation, National Institutes of Health (NIH) and Mayo Clinic. A formal description of these preprocessing methods is presented and a detailed analysis in terms of Receiver Operating Characteristics (ROC) curve and Area Under ROC curve is performed. The obtained results show statistical significant improvements when compared with the spectral power band filtering (PBF) typical baseline. A trend between performance and the dataset size is observed, suggesting that the supervised filters bring better information, compared to the conventional PBF filters, as the dataset grows in terms of monitored variables (sensors) and time length. The paper demonstrates a better accuracy in forecasting when new filters are employed and its main contribution is in the field of machine learning algorithms to develop more accurate predictive systems.
Supervised filters for EEG signal in naturally occurring epilepsy forecasting
2017-01-01
Nearly 1% of the global population has Epilepsy. Forecasting epileptic seizures with an acceptable confidence level, could improve the disease treatment and thus the lifestyle of the people who suffer it. To do that the electroencephalogram (EEG) signal is usually studied through spectral power band filtering, but this paper proposes an alternative novel method of preprocessing the EEG signal based on supervised filters. Such filters have been employed in a machine learning algorithm, such as the K-Nearest Neighbor (KNN), to improve the prediction of seizures. The proposed solution extends with this novel approach an algorithm that was submitted to win the third prize of an international Data Science challenge promoted by Kaggle contest platform and the American Epilepsy Society, the Epilepsy Foundation, National Institutes of Health (NIH) and Mayo Clinic. A formal description of these preprocessing methods is presented and a detailed analysis in terms of Receiver Operating Characteristics (ROC) curve and Area Under ROC curve is performed. The obtained results show statistical significant improvements when compared with the spectral power band filtering (PBF) typical baseline. A trend between performance and the dataset size is observed, suggesting that the supervised filters bring better information, compared to the conventional PBF filters, as the dataset grows in terms of monitored variables (sensors) and time length. The paper demonstrates a better accuracy in forecasting when new filters are employed and its main contribution is in the field of machine learning algorithms to develop more accurate predictive systems. PMID:28632737
An Integrated Ransac and Graph Based Mismatch Elimination Approach for Wide-Baseline Image Matching
NASA Astrophysics Data System (ADS)
Hasheminasab, M.; Ebadi, H.; Sedaghat, A.
2015-12-01
In this paper we propose an integrated approach in order to increase the precision of feature point matching. Many different algorithms have been developed as to optimizing the short-baseline image matching while because of illumination differences and viewpoints changes, wide-baseline image matching is so difficult to handle. Fortunately, the recent developments in the automatic extraction of local invariant features make wide-baseline image matching possible. The matching algorithms which are based on local feature similarity principle, using feature descriptor as to establish correspondence between feature point sets. To date, the most remarkable descriptor is the scale-invariant feature transform (SIFT) descriptor , which is invariant to image rotation and scale, and it remains robust across a substantial range of affine distortion, presence of noise, and changes in illumination. The epipolar constraint based on RANSAC (random sample consensus) method is a conventional model for mismatch elimination, particularly in computer vision. Because only the distance from the epipolar line is considered, there are a few false matches in the selected matching results based on epipolar geometry and RANSAC. Aguilariu et al. proposed Graph Transformation Matching (GTM) algorithm to remove outliers which has some difficulties when the mismatched points surrounded by the same local neighbor structure. In this study to overcome these limitations, which mentioned above, a new three step matching scheme is presented where the SIFT algorithm is used to obtain initial corresponding point sets. In the second step, in order to reduce the outliers, RANSAC algorithm is applied. Finally, to remove the remained mismatches, based on the adjacent K-NN graph, the GTM is implemented. Four different close range image datasets with changes in viewpoint are utilized to evaluate the performance of the proposed method and the experimental results indicate its robustness and capability.
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...
2015-10-09
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Belekar, Vilas; Lingineni, Karthik; Garg, Prabha
2015-01-01
The breast cancer resistant protein (BCRP) is an important transporter and its inhibitors play an important role in cancer treatment by improving the oral bioavailability as well as blood brain barrier (BBB) permeability of anticancer drugs. In this work, a computational model was developed to predict the compounds as BCRP inhibitors or non-inhibitors. Various machine learning approaches like, support vector machine (SVM), k-nearest neighbor (k-NN) and artificial neural network (ANN) were used to develop the models. The Matthews correlation coefficients (MCC) of developed models using ANN, k-NN and SVM are 0.67, 0.71 and 0.77, and prediction accuracies are 85.2%, 88.3% and 90.8% respectively. The developed models were tested with a test set of 99 compounds and further validated with external set of 98 compounds. Distribution plot analysis and various machine learning models were also developed based on druglikeness descriptors. Applicability domain is used to check the prediction reliability of the new molecules.
Zhang, Y N
2017-01-01
Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking test, handwriting test, and MRI diagnostic. In this paper, we propose a machine learning based PD telediagnosis method for smartphone. Classification of PD using speech records is a challenging task owing to the fact that the classification accuracy is still lower than doctor-level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and K nearest neighbor (KNN) classifier. KNN classifier can produce promising classification results from useful representations which were learned by SAE. Empirical results show that the proposed method achieves better performance with all tested cases across classification tasks, demonstrating machine learning capable of classifying PD with a level of competence comparable to doctor. It concludes that a smartphone can therefore potentially provide low-cost PD diagnostic care. This paper also gives an implementation on browser/server system and reports the running time cost. Both advantages and disadvantages of the proposed telediagnosis system are discussed.
2017-01-01
Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking test, handwriting test, and MRI diagnostic. In this paper, we propose a machine learning based PD telediagnosis method for smartphone. Classification of PD using speech records is a challenging task owing to the fact that the classification accuracy is still lower than doctor-level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and K nearest neighbor (KNN) classifier. KNN classifier can produce promising classification results from useful representations which were learned by SAE. Empirical results show that the proposed method achieves better performance with all tested cases across classification tasks, demonstrating machine learning capable of classifying PD with a level of competence comparable to doctor. It concludes that a smartphone can therefore potentially provide low-cost PD diagnostic care. This paper also gives an implementation on browser/server system and reports the running time cost. Both advantages and disadvantages of the proposed telediagnosis system are discussed. PMID:29075547
Borghi, Giacomo; Tabacchini, Valerio; Schaart, Dennis R
2016-07-07
Gamma-ray detectors based on thick monolithic scintillator crystals can achieve spatial resolutions <2 mm full-width-at-half-maximum (FWHM) and coincidence resolving times (CRTs) better than 200 ps FWHM. Moreover, they provide high sensitivity and depth-of-interaction (DOI) information. While these are excellent characteristics for clinical time-of-flight (TOF) positron emission tomography (PET), the application of monolithic scintillators has so far been hampered by the lengthy and complex procedures needed for position- and time-of-interaction estimation. Here, the algorithms previously developed in our group are revised to make the calibration and operation of a large number of monolithic scintillator detectors in a TOF-PET system practical. In particular, the k-nearest neighbor (k-NN) classification method for x,y-position estimation is accelerated with an algorithm that quickly preselects only the most useful reference events, reducing the computation time for position estimation by a factor of ~200 compared to the previously published k-NN 1D method. Also, the procedures for estimating the DOI and time of interaction are revised to enable full detector calibration by means of fan-beam or flood irradiations only. Moreover, a new technique is presented to allow the use of events in which some of the photosensor pixel values and/or timestamps are missing (e.g. due to dead time), so as to further increase system sensitivity. The accelerated methods were tested on a monolithic scintillator detector specifically developed for clinical PET applications, consisting of a 32 mm × 32 mm × 22 mm LYSO : Ce crystal coupled to a digital photon counter (DPC) array. This resulted in a spatial resolution of 1.7 mm FWHM, an average DOI resolution of 3.7 mm FWHM, and a CRT of 214 ps. Moreover, the possibility of using events missing the information of up to 16 out of 64 photosensor pixels is shown. This results in only a small deterioration of the detector performance.
Latent Dirichlet Allocation (LDA) Model and kNN Algorithm to Classify Research Project Selection
NASA Astrophysics Data System (ADS)
Safi’ie, M. A.; Utami, E.; Fatta, H. A.
2018-03-01
Universitas Sebelas Maret has a teaching staff more than 1500 people, and one of its tasks is to carry out research. In the other side, the funding support for research and service is limited, so there is need to be evaluated to determine the Research proposal submission and devotion on society (P2M). At the selection stage, research proposal documents are collected as unstructured data and the data stored is very large. To extract information contained in the documents therein required text mining technology. This technology applied to gain knowledge to the documents by automating the information extraction. In this articles we use Latent Dirichlet Allocation (LDA) to the documents as a model in feature extraction process, to get terms that represent its documents. Hereafter we use k-Nearest Neighbour (kNN) algorithm to classify the documents based on its terms.
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
NASA Astrophysics Data System (ADS)
Chen, Yi-Ying; Chu, Chia-Ren; Li, Ming-Hsu
2012-10-01
SummaryIn this paper we present a semi-parametric multivariate gap-filling model for tower-based measurement of latent heat flux (LE). Two statistical techniques, the principal component analysis (PCA) and a nonlinear interpolation approach were integrated into this LE gap-filling model. The PCA was first used to resolve the multicollinearity relationships among various environmental variables, including radiation, soil moisture deficit, leaf area index, wind speed, etc. Two nonlinear interpolation methods, multiple regressions (MRS) and the K-nearest neighbors (KNNs) were examined with random selected flux gaps for both clear sky and nighttime/cloudy data to incorporate into this LE gap-filling model. Experimental results indicated that the KNN interpolation approach is able to provide consistent LE estimations while MRS presents over estimations during nighttime/cloudy. Rather than using empirical regression parameters, the KNN approach resolves the nonlinear relationship between the gap-filled LE flux and principal components with adaptive K values under different atmospheric states. The developed LE gap-filling model (PCA with KNN) works with a RMSE of 2.4 W m-2 (˜0.09 mm day-1) at a weekly time scale by adding 40% artificial flux gaps into original dataset. Annual evapotranspiration at this study site were estimated at 736 mm (1803 MJ) and 728 mm (1785 MJ) for year 2008 and 2009, respectively.
Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R
2018-01-01
Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominate and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and the non- k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The Imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, furthers studies need to be realized to improve the accuracy prediction of TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns
Haghnegahdar, A.A.; Kolahi, S.; Khojastepour, L.; Tajeripour, F.
2018-01-01
Background: Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. Material and Methods: CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cut prepared from each condyle, although images were limited to head of mandibular condyle. In order to extract features of images, first we use LBP and then histogram of oriented gradients. To reduce dimensionality, the linear algebra Singular Value Decomposition (SVD) is applied to the feature vectors matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. Results: K nearest neighbor classifier achieves a very good accuracy (0.9242), moreover, it has desirable sensitivity (0.9470) and specificity (0.9015) results, when other classifiers have lower accuracy, sensitivity and specificity. Conclusion: We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN has been the best classifier for our experiments in detecting patients from healthy individuals, by 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages. PMID:29732343
SubCellProt: predicting protein subcellular localization using machine learning approaches.
Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan
2009-01-01
High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
Capurso, Daniel; Bengtsson, Henrik; Segal, Mark R.
2016-01-01
The spatial organization of the genome influences cellular function, notably gene regulation. Recent studies have assessed the three-dimensional (3D) co-localization of functional annotations (e.g. centromeres, long terminal repeats) using 3D genome reconstructions from Hi-C (genome-wide chromosome conformation capture) data; however, corresponding assessments for continuous functional genomic data (e.g. chromatin immunoprecipitation-sequencing (ChIP-seq) peak height) are lacking. Here, we demonstrate that applying bump hunting via the patient rule induction method (PRIM) to ChIP-seq data superposed on a Saccharomyces cerevisiae 3D genome reconstruction can discover ‘functional 3D hotspots’, regions in 3-space for which the mean ChIP-seq peak height is significantly elevated. For the transcription factor Swi6, the top hotspot by P-value contains MSB2 and ERG11 – known Swi6 target genes on different chromosomes. We verify this finding in a number of ways. First, this top hotspot is relatively stable under PRIM across parameter settings. Second, this hotspot is among the top hotspots by mean outcome identified by an alternative algorithm, k-Nearest Neighbor (k-NN) regression. Third, the distance between MSB2 and ERG11 is smaller than expected (by resampling) in two other 3D reconstructions generated via different normalization and reconstruction algorithms. This analytic approach can discover functional 3D hotspots and potentially reveal novel regulatory interactions. PMID:26869583
Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J; Bao, Forrest Sheng
2016-04-01
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species.
Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J.; Bao, Forrest Sheng
2016-01-01
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species. PMID:27092947
Khondoker, Mizanur; Dobson, Richard; Skirrow, Caroline; Simmons, Andrew; Stahl, Daniel
2016-10-01
Recent literature on the comparison of machine learning methods has raised questions about the neutrality, unbiasedness and utility of many comparative studies. Reporting of results on favourable datasets and sampling error in the estimated performance measures based on single samples are thought to be the major sources of bias in such comparisons. Better performance in one or a few instances does not necessarily imply so on an average or on a population level and simulation studies may be a better alternative for objectively comparing the performances of machine learning algorithms. We compare the classification performance of a number of important and widely used machine learning algorithms, namely the Random Forests (RF), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA) and k-Nearest Neighbour (kNN). Using massively parallel processing on high-performance supercomputers, we compare the generalisation errors at various combinations of levels of several factors: number of features, training sample size, biological variation, experimental variation, effect size, replication and correlation between features. For smaller number of correlated features, number of features not exceeding approximately half the sample size, LDA was found to be the method of choice in terms of average generalisation errors as well as stability (precision) of error estimates. SVM (with RBF kernel) outperforms LDA as well as RF and kNN by a clear margin as the feature set gets larger provided the sample size is not too small (at least 20). The performance of kNN also improves as the number of features grows and outplays that of LDA and RF unless the data variability is too high and/or effect sizes are too small. RF was found to outperform only kNN in some instances where the data are more variable and have smaller effect sizes, in which cases it also provide more stable error estimates than kNN and LDA. Applications to a number of real datasets supported the findings from the simulation study. © The Author(s) 2013.
EEG-based mild depressive detection using feature selection methods and classifiers.
Li, Xiaowei; Hu, Bin; Sun, Shuting; Cai, Hanshu
2016-11-01
Depression has become a major health burden worldwide, and effectively detection of such disorder is a great challenge which requires latest technological tool, such as Electroencephalography (EEG). This EEG-based research seeks to find prominent frequency band and brain regions that are most related to mild depression, as well as an optimal combination of classification algorithms and feature selection methods which can be used in future mild depression detection. An experiment based on facial expression viewing task (Emo_block and Neu_block) was conducted, and EEG data of 37 university students were collected using a 128 channel HydroCel Geodesic Sensor Net (HCGSN). For discriminating mild depressive patients and normal controls, BayesNet (BN), Support Vector Machine (SVM), Logistic Regression (LR), k-nearest neighbor (KNN) and RandomForest (RF) classifiers were used. And BestFirst (BF), GreedyStepwise (GSW), GeneticSearch (GS), LinearForwordSelection (LFS) and RankSearch (RS) based on Correlation Features Selection (CFS) were applied for linear and non-linear EEG features selection. Independent Samples T-test with Bonferroni correction was used to find the significantly discriminant electrodes and features. Data mining results indicate that optimal performance is achieved using a combination of feature selection method GSW based on CFS and classifier KNN for beta frequency band. Accuracies achieved 92.00% and 98.00%, and AUC achieved 0.957 and 0.997, for Emo_block and Neu_block beta band data respectively. T-test results validate the effectiveness of selected features by search method GSW. Simplified EEG system with only FP1, FP2, F3, O2, T3 electrodes was also explored with linear features, which yielded accuracies of 91.70% and 96.00%, AUC of 0.952 and 0.972, for Emo_block and Neu_block respectively. Classification results obtained by GSW + KNN are encouraging and better than previously published results. In the spatial distribution of features, we find that left parietotemporal lobe in beta EEG frequency band has greater effect on mild depression detection. And fewer EEG channels (FP1, FP2, F3, O2 and T3) combined with linear features may be good candidates for usage in portable systems for mild depression detection. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Han, Te; Jiang, Dongxiang; Zhang, Xiaochen; Sun, Yankui
2017-03-27
Rotating machinery is widely used in industrial applications. With the trend towards more precise and more critical operating conditions, mechanical failures may easily occur. Condition monitoring and fault diagnosis (CMFD) technology is an effective tool to enhance the reliability and security of rotating machinery. In this paper, an intelligent fault diagnosis method based on dictionary learning and singular value decomposition (SVD) is proposed. First, the dictionary learning scheme is capable of generating an adaptive dictionary whose atoms reveal the underlying structure of raw signals. Essentially, dictionary learning is employed as an adaptive feature extraction method regardless of any prior knowledge. Second, the singular value sequence of learned dictionary matrix is served to extract feature vector. Generally, since the vector is of high dimensionality, a simple and practical principal component analysis (PCA) is applied to reduce dimensionality. Finally, the K -nearest neighbor (KNN) algorithm is adopted for identification and classification of fault patterns automatically. Two experimental case studies are investigated to corroborate the effectiveness of the proposed method in intelligent diagnosis of rotating machinery faults. The comparison analysis validates that the dictionary learning-based matrix construction approach outperforms the mode decomposition-based methods in terms of capacity and adaptability for feature extraction.
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-06-19
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
Chen, Xue; Li, Xiaohui; Yang, Sibo; Yu, Xin; Liu, Aichun
2018-01-01
Lymphoma is a significant cancer that affects the human lymphatic and hematopoietic systems. In this work, discrimination of lymphoma using laser-induced breakdown spectroscopy (LIBS) conducted on whole blood samples is presented. The whole blood samples collected from lymphoma patients and healthy controls are deposited onto standard quantitative filter papers and ablated with a 1064 nm Q-switched Nd:YAG laser. 16 atomic and ionic emission lines of calcium (Ca), iron (Fe), magnesium (Mg), potassium (K) and sodium (Na) are selected to discriminate the cancer disease. Chemometric methods, including principal component analysis (PCA), linear discriminant analysis (LDA) classification, and k nearest neighbor (kNN) classification are used to build the discrimination models. Both LDA and kNN models have achieved very good discrimination performances for lymphoma, with an accuracy of over 99.7%, a sensitivity of over 0.996, and a specificity of over 0.997. These results demonstrate that the whole-blood-based LIBS technique in combination with chemometric methods can serve as a fast, less invasive, and accurate method for detection and discrimination of human malignancies. PMID:29541503
Al-Qazzaz, Noor Kamal; Ali, Sawal; Ahmad, Siti Anom; Escudero, Javier
2017-07-01
The aim of the present study was to discriminate the electroencephalogram (EEG) of 5 patients with vascular dementia (VaD), 15 patients with stroke-related mild cognitive impairment (MCI), and 15 control normal subjects during a working memory (WM) task. We used independent component analysis (ICA) and wavelet transform (WT) as a hybrid preprocessing approach for EEG artifact removal. Three different features were extracted from the cleaned EEG signals: spectral entropy (SpecEn), permutation entropy (PerEn) and Tsallis entropy (TsEn). Two classification schemes were applied - support vector machine (SVM) and k-nearest neighbors (kNN) - with fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) as a dimensionality reduction technique. The FNPAQR dimensionality reduction technique increased the SVM classification accuracy from 82.22% to 90.37% and from 82.6% to 86.67% for kNN. These results suggest that FNPAQR consistently improves the discrimination of VaD, MCI patients and control normal subjects and it could be a useful feature selection to help the identification of patients with VaD and MCI.
Como, F; Carnesecchi, E; Volani, S; Dorne, J L; Richardson, J; Bassan, A; Pavan, M; Benfenati, E
2017-01-01
Ecological risk assessment of plant protection products (PPPs) requires an understanding of both the toxicity and the extent of exposure to assess risks for a range of taxa of ecological importance including target and non-target species. Non-target species such as honey bees (Apis mellifera), solitary bees and bumble bees are of utmost importance because of their vital ecological services as pollinators of wild plants and crops. To improve risk assessment of PPPs in bee species, computational models predicting the acute and chronic toxicity of a range of PPPs and contaminants can play a major role in providing structural and physico-chemical properties for the prioritisation of compounds of concern and future risk assessments. Over the last three decades, scientific advisory bodies and the research community have developed toxicological databases and quantitative structure-activity relationship (QSAR) models that are proving invaluable to predict toxicity using historical data and reduce animal testing. This paper describes the development and validation of a k-Nearest Neighbor (k-NN) model using in-house software for the prediction of acute contact toxicity of pesticides on honey bees. Acute contact toxicity data were collected from different sources for 256 pesticides, which were divided into training and test sets. The k-NN models were validated with good prediction, with an accuracy of 70% for all compounds and of 65% for highly toxic compounds, suggesting that they might reliably predict the toxicity of structurally diverse pesticides and could be used to screen and prioritise new pesticides. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ni, Li-Jun; Luan, Shao-Rong; Zhang, Li-Guo
2016-10-01
Because of the numerous varieties of herbal species and active ingredients in the traditional Chinese medicine(TCM),the traditional methods employed could hardly satisfy the current determination requirements of TCM.The present work proposed an idea to realize rapid determination of the quality of TCM based on near infrared(NIR)spectroscopy and internet sharing mode. Low cost and portable multi-source composite spectrometer was invented by our group for in-site fast measurement of spectra of TCM samples. The database could be set up by sharing spectra and quality detection data of TCM samples among TCM enterprises based on the internet platform.A novel method called as keeping same relationship between X and Y space based on K nearest neighbors(KNN-KSR for short)was applied to predict the contents of effective compounds of the samples. In addition,a comparative study between KNN-KSR and partial least squares(PLS)was conducted. Two datasets were applied to validate above idea:one was about 58 Ginkgo Folium samples samples measured with four near-infrared spectroscopy instruments and two multi-source composite spectrometers,another one was about 80 corn samples available online measured with three NIR instruments. The results show that the KNN-KSR method could obtain more reliable outcomes without correcting spectrum.However transforming the PLS models to other instruments could hardly acquire better predictive results until spectral calibration is performed. Meanwhile,the similar analysis results of total flavonoids and total lactones of Ginkgo Folium samples are achieved on the multi-source composite spectrometers and near-infrared spectroscopy instruments,and the prediction results of KNN-KSR are better than PLS. The idea proposed in present study is in urgent need of more samples spectra, and then to be verified by more case studies. Copyright© by the Chinese Pharmaceutical Association.
Predicting missing links in complex networks based on common neighbors and distance
Yang, Jinxuan; Zhang, Xiao-Dong
2016-01-01
The algorithms based on common neighbors metric to predict missing links in complex networks are very popular, but most of these algorithms do not account for missing links between nodes with no common neighbors. It is not accurate enough to reconstruct networks by using these methods in some cases especially when between nodes have less common neighbors. We proposed in this paper a new algorithm based on common neighbors and distance to improve accuracy of link prediction. Our proposed algorithm makes remarkable effect in predicting the missing links between nodes with no common neighbors and performs better than most existing currently used methods for a variety of real-world networks without increasing complexity. PMID:27905526
Scalable Nearest Neighbor Algorithms for High Dimensional Data.
Muja, Marius; Lowe, David G
2014-11-01
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
Assessment of various supervised learning algorithms using different performance metrics
NASA Astrophysics Data System (ADS)
Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.
2017-11-01
Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.
Near-Neighbor Algorithms for Processing Bearing Data
1989-05-10
neighbor algorithms need not be universally more cost -effective than brute force methods. While the data access time of near-neighbor techniques scales with...the number of objects N better than brute force, the cost of setting up the data structure could scale worse than (Continues) 20...for the near neighbors NN2 1 (i). Depending on the particular NN algorithm, the cost of accessing near neighbors for each ai E S1 scales as either N
An Approach for Forest Inventory in Canada's Northern Boreal region, Northwest Territories
NASA Astrophysics Data System (ADS)
Mahoney, C.; Hopkinson, C.; Hall, R.; Filiatrault, M.
2017-12-01
The northern extent of Canada's northern boreal forest is largely inaccessible resulting in logistical, financial, and human challenges with respect to obtaining concise and accurate forest resource inventory (FRI) attributes such as stand height, aboveground biomass and forest carbon stocks. This challenge is further exacerbated by mandated government resource management and reporting of key attributes with respect to assessing impacts of natural disturbances, monitoring wildlife habitat and establishing policies to mitigate effects of climate change. This study presents a framework methodology utilized to inventory canopy height and crown closure over a 420,000 km2 area in Canada's Northwest Territories (NWT) by integrating field, LiDAR and satellite remote sensing data. Attributes are propagated from available field to coincident airborne LiDAR thru to satellite laser altimetry footprints. A quality controlled form of the latter are then submitted to a k-nearest neighbor (kNN) imputation algorithm to produce a continuous map of each attribute on a 30 m grid. The resultant kNN stand height (r=0.62, p=0.00) and crown closure (r=0.64, p=0.00) products were identified as statistically similar to a comprehensive independent airborne LiDAR source. Regional uncertainty can be produced with each attribute to identify areas of potential improvement through future strategic data acquisitions or the fine tuning of model parameters. This study's framework concept was developed to inform Natural Resources Canada - Canadian Forest Service's Multisource Vegetation Inventory and update vast regions of Canada's northern forest inventories, however, its applicability can be generalized to any environment. Not only can such a framework approach incorporate other data sources (such as Synthetic Aperture Radar) to potentially better characterize forest attributes, but it can also utilize future Earth observation mission data (for example ICESat-2) to monitor forest dynamics and the status, health and sustainability of Canada's northern boreal regions as areas where detailed inventory information is typically not available.
Si, Jian-min; Luo, A-li; Wu, Fu-zhao; Wu, Yi-hong
2015-03-01
There are many valuable rare and unusual objects in spectra dataset of Sloan Digital Sky Survey (SDSS) Data Release eight (DR8), such as special white dwarfs (DZ, DQ, DC), carbon stars, white dwarf main-sequence binaries (WDMS), cataclysmic variable (CV) stars and so on, so it is extremely significant to search for rare and unusual celestial objects from massive spectra dataset. A novel algorithm based on Kernel dense estimation and K-nearest neighborhoods (KNN) has been presented, and applied to search for rare and unusual celestial objects from 546 383 stellar spectra of SDSS DR8. Their densities are estimated using Gaussian kernel density estimation, the top 5 000 spectra in descend order by their densities are selected as rare objects, and the top 300 000 spectra in ascend order by their densities are selected as normal objects. Then, KNN were used to classify the rest objects, and simultaneously K nearest neighbors of the 5 000 rare spectra are also selected as rare objects. As a result, there are totally 21 193 spectra selected as initial rare spectra, which include error spectra caused by deletion, redden, bad calibration, spectra consisting of different physically irrelevant components, planetary nebulas, QSOs, special white dwarfs (DZ, DQ, DC), carbon stars, white dwarf main-sequence binaries (WDMS), cataclysmic variable (CV) stars and so on. By cross identification with SIMBAD, NED, ADS and major literature, it is found that three DZ white dwarfs, one WDMS, two CVs with company of G-type star, three CVs candidates, six DC white dwarfs, one DC white dwarf candidate and one BL Lacertae (BL lac) candidate are our new findings. We also have found one special DA white dwarf with emission lines of Ca II triple and Mg I, and one unknown object whose spectrum looks like a late M star with emission lines and its image looks like a galaxy or nebula.
Mirjankar, Nikhil S; Fraga, Carlos G; Carman, April J; Moran, James J
2016-02-02
Chemical attribution signatures (CAS) for chemical threat agents (CTAs), such as cyanides, are being investigated to provide an evidentiary link between CTAs and specific sources to support criminal investigations and prosecutions. Herein, stocks of KCN and NaCN were analyzed for trace anions by high performance ion chromatography (HPIC), carbon stable isotope ratio (δ(13)C) by isotope ratio mass spectrometry (IRMS), and trace elements by inductively coupled plasma optical emission spectroscopy (ICP-OES). The collected analytical data were evaluated using hierarchical cluster analysis (HCA), Fisher-ratio (F-ratio), interval partial least-squares (iPLS), genetic algorithm-based partial least-squares (GAPLS), partial least-squares discriminant analysis (PLSDA), K nearest neighbors (KNN), and support vector machines discriminant analysis (SVMDA). HCA of anion impurity profiles from multiple cyanide stocks from six reported countries of origin resulted in cyanide samples clustering into three groups, independent of the associated alkali metal (K or Na). The three groups were independently corroborated by HCA of cyanide elemental profiles and corresponded to countries each having one known solid cyanide factory: Czech Republic, Germany, and United States. Carbon stable isotope measurements resulted in two clusters: Germany and United States (the single Czech stock grouped with United States stocks). Classification errors for two validation studies using anion impurity profiles collected over five years on different instruments were as low as zero for KNN and SVMDA, demonstrating the excellent reliability associated with using anion impurities for matching a cyanide sample to its factory using our current cyanide stocks. Variable selection methods reduced errors for those classification methods having errors greater than zero; iPLS-forward selection and F-ratio typically provided the lowest errors. Finally, using anion profiles to classify cyanides to a specific stock or stock group for a subset of United States stocks resulted in cross-validation errors ranging from 0 to 5.3%.
Nukala, Bhargava Teja; Nakano, Taro; Rodriguez, Amanda; Tsay, Jerry; Lopez, Jerry; Nguyen, Tam Q; Zupancic, Steven; Lie, Donald Y C
2016-11-29
Gait analysis using wearable wireless sensors can be an economical, convenient and effective way to provide diagnostic and clinical information for various health-related issues. In this work, our custom designed low-cost wireless gait analysis sensor that contains a basic inertial measurement unit (IMU) was used to collect the gait data for four patients diagnosed with balance disorders and additionally three normal subjects, each performing the Dynamic Gait Index (DGI) tests while wearing the custom wireless gait analysis sensor (WGAS). The small WGAS includes a tri-axial accelerometer integrated circuit (IC), two gyroscopes ICs and a Texas Instruments (TI) MSP430 microcontroller and is worn by each subject at the T4 position during the DGI tests. The raw gait data are wirelessly transmitted from the WGAS to a near-by PC for real-time gait data collection and analysis. In order to perform successful classification of patients vs. normal subjects, we used several different classification algorithms, such as the back propagation artificial neural network (BP-ANN), support vector machine (SVM), k -nearest neighbors (KNN) and binary decision trees (BDT), based on features extracted from the raw gait data of the gyroscopes and accelerometers. When the range was used as the input feature, the overall classification accuracy obtained is 100% with BP-ANN, 98% with SVM, 96% with KNN and 94% using BDT. Similar high classification accuracy results were also achieved when the standard deviation or other values were used as input features to these classifiers. These results show that gait data collected from our very low-cost wearable wireless gait sensor can effectively differentiate patients with balance disorders from normal subjects in real time using various classifiers, the success of which may eventually lead to accurate and objective diagnosis of abnormal human gaits and their underlying etiologies in the future, as more patient data are being collected.
2014-01-01
Background Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. Results The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database. PMID:24970564
Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian
2014-06-27
Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database.
NASA Astrophysics Data System (ADS)
Stevens, Jeffrey
The past decade has seen the emergence of many hyperspectral image (HSI) analysis algorithms based on graph theory and derived manifold-coordinates. Yet, despite the growing number of algorithms, there has been limited study of the graphs constructed from spectral data themselves. Which graphs are appropriate for various HSI analyses--and why? This research aims to begin addressing these questions as the performance of graph-based techniques is inextricably tied to the graphical model constructed from the spectral data. We begin with a literature review providing a survey of spectral graph construction techniques currently used by the hyperspectral community, starting with simple constructs demonstrating basic concepts and then incrementally adding components to derive more complex approaches. Throughout this development, we discuss algorithm advantages and disadvantages for different types of hyperspectral analysis. A focus is provided on techniques influenced by spectral density through which the concept of community structure arises. Through the use of simulated and real HSI data, we demonstrate density-based edge allocation produces more uniform nearest neighbor lists than non-density based techniques through increasing the number of intracluster edges, facilitating higher k-nearest neighbor (k-NN) classification performance. Imposing the common mutuality constraint to symmetrify adjacency matrices is demonstrated to be beneficial in most circumstances, especially in rural (less cluttered) scenes. Many complex adaptive edge-reweighting techniques are shown to slightly degrade nearest-neighbor list characteristics. Analysis suggests this condition is possibly attributable to the validity of characterizing spectral density by a single variable representing data scale for each pixel. Additionally, it is shown that imposing mutuality hurts the performance of adaptive edge-allocation techniques or any technique that aims to assign a low number of edges (<10) to any pixel. A simple k bias addresses this problem. Many of the adaptive edge-reweighting techniques are based on the concept of codensity, so we explore codensity properties as they relate to density-based edge reweighting. We find that codensity may not be the best estimator of local scale due to variations in cluster density, so we introduce and compare two inherently density-weighted graph construction techniques from the data mining literature: shared nearest neighbors (SNN) and mutual proximity (MP). MP and SNN are not reliant upon a codensity measure, hence are not susceptible to its shortcomings. Neither has been used for hyperspectral analyses, so this presents the first study of these techniques on HSI data. We demonstrate MP and SNN can offer better performance, but in general none of the reweighting techniques improve the quality of these spectral graphs in our neighborhood structure tests. As such, these complex adaptive edge-reweighting techniques may need to be modified to increase their effectiveness. During this investigation, we probe deeper into properties of high-dimensional data and introduce the concept of concentration of measure (CoM)--the degradation in the efficacy of many common distance measures with increasing dimensionality--as it relates to spectral graph construction. CoM exists in pairwise distances between HSI pixels, but not to the degree experienced in random data of the same extrinsic dimension; a characteristic we demonstrate is due to the rich correlation and cluster structure present in HSI data. CoM can lead to hubness--a condition wherein some nodes have short distances (high similarities) to an exceptionally large number of nodes. We study hub presence in 49 HSI datasets of varying resolutions, altitudes, and spectral bands to demonstrate hubness effects are negligible in a k-NN classification example (generalized counting scenarios), but we note its impact on methods that use edge weights to derive manifold coordinates or splitting clusters based on spectral graph theory requires more investigation. Many of these new graph-related quantities can be exploited to demonstrate new techniques for HSI classification and anomaly detection. We present an initial exploration into this relatively new and exciting field based on an enhanced Schroedinger Eigenmap classification example and compare results to the current state-of-the-art approach. We produce equivalent results, but demonstrate different types of misclassifications, opening the door to combine the best of both approaches to achieve truly superior performance. A separate less mature hubness-assisted anomaly detector (HAAD) is also presented.
Introducing two Random Forest based methods for cloud detection in remote sensing images
NASA Astrophysics Data System (ADS)
Ghasemian, Nafiseh; Akhoondzadeh, Mehdi
2018-07-01
Cloud detection is a necessary phase in satellite images processing to retrieve the atmospheric and lithospheric parameters. Currently, some cloud detection methods based on Random Forest (RF) model have been proposed but they do not consider both spectral and textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF) to incorporate visible, infrared (IR) and thermal spectral and textural features (FLFRF) including Gray Level Co-occurrence Matrix (GLCM) and Robust Extended Local Binary Pattern (RELBP_CI) or visible, IR and thermal classifiers (DLFRF) for highly accurate cloud detection on remote sensing images. FLFRF first fuses visible, IR and thermal features. Thereafter, it uses the RF model to classify pixels to cloud, snow/ice and background or thick cloud, thin cloud and background. DLFRF considers visible, IR and thermal features (both spectral and textural) separately and inserts each set of features to RF model. Then, it holds vote matrix of each run of the model. Finally, it fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used in this paper. Quantitative analyses are based on manually selected ground truth data. Results show that after adding RELBP_CI to input feature set cloud detection accuracy improves. Also, the average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are higher than other machine learning methods, Linear Discriminate Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than other traditional methods. The quantitative values on Landsat 8 images show similar trend. Consequently, while SVM and K-nearest neighbor show overestimation in predicting cloud and snow/ice pixels, our Random Forest (RF) based models can achieve higher cloud, snow/ice kappa values on MODIS and thin cloud, thick cloud and snow/ice kappa values on Landsat 8 images. Our algorithms predict both thin and thick cloud on Landsat 8 images while the existing cloud detection algorithm, Fmask cannot discriminate them. Compared to the state-of-the-art methods, our algorithms have acquired higher average cloud and snow/ice kappa values for different spatial resolutions.
Fast Nonparametric Machine Learning Algorithms for High-Dimensional Massive Data and Applications
2006-03-01
know the probability of that from Lemma 2. Using the union bound, we know that for any query q, the probability that i-am-feeling-lucky search algorithm...and each point in a d-dimensional space, a naive k-NN search needs to do a linear scan of T for every single query q, and thus the computational time...algorithm based on partition trees with priority search , and give an expected query time O((1/)d log n). But the constant in the O((1/)d log n
Capurso, Daniel; Bengtsson, Henrik; Segal, Mark R
2016-03-18
The spatial organization of the genome influences cellular function, notably gene regulation. Recent studies have assessed the three-dimensional (3D) co-localization of functional annotations (e.g. centromeres, long terminal repeats) using 3D genome reconstructions from Hi-C (genome-wide chromosome conformation capture) data; however, corresponding assessments for continuous functional genomic data (e.g. chromatin immunoprecipitation-sequencing (ChIP-seq) peak height) are lacking. Here, we demonstrate that applying bump hunting via the patient rule induction method (PRIM) to ChIP-seq data superposed on a Saccharomyces cerevisiae 3D genome reconstruction can discover 'functional 3D hotspots', regions in 3-space for which the mean ChIP-seq peak height is significantly elevated. For the transcription factor Swi6, the top hotspot by P-value contains MSB2 and ERG11 - known Swi6 target genes on different chromosomes. We verify this finding in a number of ways. First, this top hotspot is relatively stable under PRIM across parameter settings. Second, this hotspot is among the top hotspots by mean outcome identified by an alternative algorithm, k-Nearest Neighbor (k-NN) regression. Third, the distance between MSB2 and ERG11 is smaller than expected (by resampling) in two other 3D reconstructions generated via different normalization and reconstruction algorithms. This analytic approach can discover functional 3D hotspots and potentially reveal novel regulatory interactions. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Modeling occupancy distribution in large spaces with multi-feature classification algorithm
Wang, Wei; Chen, Jiayu; Hong, Tianzhen
2018-04-07
We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolutionmore » occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.« less
Modeling occupancy distribution in large spaces with multi-feature classification algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Wei; Chen, Jiayu; Hong, Tianzhen
We present that occupancy information enables robust and flexible control of heating, ventilation, and air-conditioning (HVAC) systems in buildings. In large spaces, multiple HVAC terminals are typically installed to provide cooperative services for different thermal zones, and the occupancy information determines the cooperation among terminals. However, a person count at room-level does not adequately optimize HVAC system operation due to the movement of occupants within the room that creates uneven load distribution. Without accurate knowledge of the occupants’ spatial distribution, the uneven distribution of occupants often results in under-cooling/heating or over-cooling/heating in some thermal zones. Therefore, the lack of high-resolutionmore » occupancy distribution is often perceived as a bottleneck for future improvements to HVAC operation efficiency. To fill this gap, this study proposes a multi-feature k-Nearest-Neighbors (k-NN) classification algorithm to extract occupancy distribution through reliable, low-cost Bluetooth Low Energy (BLE) networks. An on-site experiment was conducted in a typical office of an institutional building to demonstrate the proposed methods, and the experiment outcomes of three case studies were examined to validate detection accuracy. One method based on City Block Distance (CBD) was used to measure the distance between detected occupancy distribution and ground truth and assess the results of occupancy distribution. Finally, the results show the accuracy when CBD = 1 is over 71.4% and the accuracy when CBD = 2 can reach up to 92.9%.« less
NASA Astrophysics Data System (ADS)
Runyan, T. E.; Wood, W. T.; Palmsten, M. L.; Zhang, R.
2016-12-01
Gas hydrates, specifically methane hydrates, are sparsely sampled on a global scale, and their accumulation is difficult to predict geospatially. Several attempts have been made at estimating global inventories, and to some extent geospatial distribution, using geospatial extrapoltions guided with geophysical and geochemical methods. Our objective is to quantitatively predict the geospatial likelihood of encountering methane hydrates, with uncertainty. Predictions could be incorporated into analyses of drilling hazards as well as climate change. We use global data sets (including water depth, temperature, pressure, TOC, sediment thickness, and heat flow) as parameters to train a k-nearest neighbor (KNN) machine learning technique. The KNN is unsupervised and non-parametric, we do not provide any interpretive influence on prior probability distribution, so our results are strictly data driven. We have selected as test sites several locations where gas hydrates have been well studied, each with significantly different geologic settings.These include: The Blake Ridge (U.S. East Coast), Hydrate Ridge (U.S. West Coast), and the Gulf of Mexico. We then use KNN to quantify similarities between these sites, and determine, via the distance in parameter space, what is the likelihood and uncertainty of encountering gas hydrate anywhere in the world. Here we are operating under the assumption that the distance in parameter space is proportional to the probability of the occurrence of gas hydrate. We then compare these global similarity maps made from our several test sites to identify the geologic (geophyisical, bio-geochemical) parameters best suited for predicting gas hydrate occurrence.
Detection of Cardiac Abnormalities from Multilead ECG using Multiscale Phase Alternation Features.
Tripathy, R K; Dandapat, S
2016-06-01
The cardiac activities such as the depolarization and the relaxation of atria and ventricles are observed in electrocardiogram (ECG). The changes in the morphological features of ECG are the symptoms of particular heart pathology. It is a cumbersome task for medical experts to visually identify any subtle changes in the morphological features during 24 hours of ECG recording. Therefore, the automated analysis of ECG signal is a need for accurate detection of cardiac abnormalities. In this paper, a novel method for automated detection of cardiac abnormalities from multilead ECG is proposed. The method uses multiscale phase alternation (PA) features of multilead ECG and two classifiers, k-nearest neighbor (KNN) and fuzzy KNN for classification of bundle branch block (BBB), myocardial infarction (MI), heart muscle defect (HMD) and healthy control (HC). The dual tree complex wavelet transform (DTCWT) is used to decompose the ECG signal of each lead into complex wavelet coefficients at different scales. The phase of the complex wavelet coefficients is computed and the PA values at each wavelet scale are used as features for detection and classification of cardiac abnormalities. A publicly available multilead ECG database (PTB database) is used for testing of the proposed method. The experimental results show that, the proposed multiscale PA features and the fuzzy KNN classifier have better performance for detection of cardiac abnormalities with sensitivity values of 78.12 %, 80.90 % and 94.31 % for BBB, HMD and MI classes. The sensitivity value of proposed method for MI class is compared with the state-of-art techniques from multilead ECG.
Yang, Jiaojiao; Guo, Qian; Li, Wenjie; Wang, Suhong; Zou, Ling
2016-04-01
This paper aims to assist the individual clinical diagnosis of children with attention-deficit/hyperactivity disorder using electroencephalogram signal detection method.Firstly,in our experiments,we obtained and studied the electroencephalogram signals from fourteen attention-deficit/hyperactivity disorder children and sixteen typically developing children during the classic interference control task of Simon-spatial Stroop,and we completed electroencephalogram data preprocessing including filtering,segmentation,removal of artifacts and so on.Secondly,we selected the subset electroencephalogram electrodes using principal component analysis(PCA)method,and we collected the common channels of the optimal electrodes which occurrence rates were more than 90%in each kind of stimulation.We then extracted the latency(200~450ms)mean amplitude features of the common electrodes.Finally,we used the k-nearest neighbor(KNN)classifier based on Euclidean distance and the support vector machine(SVM)classifier based on radial basis kernel function to classify.From the experiment,at the same kind of interference control task,the attention-deficit/hyperactivity disorder children showed lower correct response rates and longer reaction time.The N2 emerged in prefrontal cortex while P2 presented in the inferior parietal area when all kinds of stimuli demonstrated.Meanwhile,the children with attention-deficit/hyperactivity disorder exhibited markedly reduced N2 and P2amplitude compared to typically developing children.KNN resulted in better classification accuracy than SVM classifier,and the best classification rate was 89.29%in StI task.The results showed that the electroencephalogram signals were different in the brain regions of prefrontal cortex and inferior parietal cortex between attention-deficit/hyperactivity disorder and typically developing children during the interference control task,which provided a scientific basis for the clinical diagnosis of attention-deficit/hyperactivity disorder individuals.
NASA Astrophysics Data System (ADS)
Lee, T. R.; Wood, W. T.; Dale, J.
2017-12-01
Empirical and theoretical models of sub-seafloor organic matter transformation, degradation and methanogenesis require estimates of initial seafloor total organic carbon (TOC). This subsurface methane, under the appropriate geophysical and geochemical conditions may manifest as methane hydrate deposits. Despite the importance of seafloor TOC, actual observations of TOC in the world's oceans are sparse and large regions of the seafloor yet remain unmeasured. To provide an estimate in areas where observations are limited or non-existent, we have implemented interpolation techniques that rely on existing data sets. Recent geospatial analyses have provided accurate accounts of global geophysical and geochemical properties (e.g. crustal heat flow, seafloor biomass, porosity) through machine learning interpolation techniques. These techniques find correlations between the desired quantity (in this case TOC) and other quantities (predictors, e.g. bathymetry, distance from coast, etc.) that are more widely known. Predictions (with uncertainties) of seafloor TOC in regions lacking direct observations are made based on the correlations. Global distribution of seafloor TOC at 1 x 1 arc-degree resolution was estimated from a dataset of seafloor TOC compiled by Seiter et al. [2004] and a non-parametric (i.e. data-driven) machine learning algorithm, specifically k-nearest neighbors (KNN). Built-in predictor selection and a ten-fold validation technique generated statistically optimal estimates of seafloor TOC and uncertainties. In addition, inexperience was estimated. Inexperience is effectively the distance in parameter space to the single nearest neighbor, and it indicates geographic locations where future data collection would most benefit prediction accuracy. These improved geospatial estimates of TOC in data deficient areas will provide new constraints on methane production and subsequent methane hydrate accumulation.
Özdemir, Merve Erkınay; Telatar, Ziya; Eroğul, Osman; Tunca, Yusuf
2018-05-01
Dysmorphic syndromes have different facial malformations. These malformations are significant to an early diagnosis of dysmorphic syndromes and contain distinctive information for face recognition. In this study we define the certain features of each syndrome by considering facial malformations and classify Fragile X, Hurler, Prader Willi, Down, Wolf Hirschhorn syndromes and healthy groups automatically. The reference points are marked on the face images and ratios between the points' distances are taken into consideration as features. We suggest a neural network based hierarchical decision tree structure in order to classify the syndrome types. We also implement k-nearest neighbor (k-NN) and artificial neural network (ANN) classifiers to compare classification accuracy with our hierarchical decision tree. The classification accuracy is 50, 73 and 86.7% with k-NN, ANN and hierarchical decision tree methods, respectively. Then, the same images are shown to a clinical expert who achieve a recognition rate of 46.7%. We develop an efficient system to recognize different syndrome types automatically in a simple, non-invasive imaging data, which is independent from the patient's age, sex and race at high accuracy. The promising results indicate that our method can be used for pre-diagnosis of the dysmorphic syndromes by clinical experts.
Breathing Pattern Interpretation as an Alternative and Effective Voice Communication Solution.
Elsahar, Yasmin; Bouazza-Marouf, Kaddour; Kerr, David; Gaur, Atul; Kaushik, Vipul; Hu, Sijung
2018-05-15
Augmentative and alternative communication (AAC) systems tend to rely on the interpretation of purposeful gestures for interaction. Existing AAC methods could be cumbersome and limit the solutions in terms of versatility. The study aims to interpret breathing patterns (BPs) to converse with the outside world by means of a unidirectional microphone and researches breathing-pattern interpretation (BPI) to encode messages in an interactive manner with minimal training. We present BP processing work with (1) output synthesized machine-spoken words (SMSW) along with single-channel Weiner filtering (WF) for signal de-noising, and (2) k -nearest neighbor ( k-NN ) classification of BPs associated with embedded dynamic time warping (DTW). An approved protocol to collect analogue modulated BP sets belonging to 4 distinct classes with 10 training BPs per class and 5 live BPs per class was implemented with 23 healthy subjects. An 86% accuracy of k-NN classification was obtained with decreasing error rates of 17%, 14%, and 11% for the live classifications of classes 2, 3, and 4, respectively. The results express a systematic reliability of 89% with increased familiarity. The outcomes from the current AAC setup recommend a durable engineering solution directly beneficial to the sufferers.
Exploring Sampling in the Detection of Multicategory EEG Signals
Siuly, Siuly; Kabir, Enamul; Wang, Hua; Zhang, Yanchun
2015-01-01
The paper presents a structure based on samplings and machine leaning techniques for the detection of multicategory EEG signals where random sampling (RS) and optimal allocation sampling (OS) are explored. In the proposed framework, before using the RS and OS scheme, the entire EEG signals of each class are partitioned into several groups based on a particular time period. The RS and OS schemes are used in order to have representative observations from each group of each category of EEG data. Then all of the selected samples by the RS from the groups of each category are combined in a one set named RS set. In the similar way, for the OS scheme, an OS set is obtained. Then eleven statistical features are extracted from the RS and OS set, separately. Finally this study employs three well-known classifiers: k-nearest neighbor (k-NN), multinomial logistic regression with a ridge estimator (MLR), and support vector machine (SVM) to evaluate the performance for the RS and OS feature set. The experimental outcomes demonstrate that the RS scheme well represents the EEG signals and the k-NN with the RS is the optimum choice for detection of multicategory EEG signals. PMID:25977705
Aided diagnosis methods of breast cancer based on machine learning
NASA Astrophysics Data System (ADS)
Zhao, Yue; Wang, Nian; Cui, Xiaoyu
2017-08-01
In the field of medicine, quickly and accurately determining whether the patient is malignant or benign is the key to treatment. In this paper, K-Nearest Neighbor, Linear Discriminant Analysis, Logistic Regression were applied to predict the classification of thyroid,Her-2,PR,ER,Ki67,metastasis and lymph nodes in breast cancer, in order to recognize the benign and malignant breast tumors and achieve the purpose of aided diagnosis of breast cancer. The results showed that the highest classification accuracy of LDA was 88.56%, while the classification effect of KNN and Logistic Regression were better than that of LDA, the best accuracy reached 96.30%.
Analysis of the hand vein pattern for people recognition
NASA Astrophysics Data System (ADS)
Castro-Ortega, R.; Toxqui-Quitl, C.; Cristóbal, G.; Marcos, J. Victor; Padilla-Vivanco, A.; Hurtado Pérez, R.
2015-09-01
The shape of the hand vascular pattern contains useful and unique features that can be used for identifying and authenticating people, with applications in access control, medicine and financial services. In this work, an optical system for the image acquisition of the hand vascular pattern is implemented. It consists of a CCD camera with sensitivity in the IR and a light source with emission in the 880 nm. The IR radiation interacts with the desoxyhemoglobin, hemoglobin and water present in the blood of the veins, making possible to see the vein pattern underneath skin. The segmentation of the Region Of Interest (ROI) is achieved using geometrical moments locating the centroid of an image. For enhancement of the vein pattern we use the technique of Histogram Equalization and Contrast Limited Adaptive Histogram Equalization (CLAHE). In order to remove unnecessary information such as body hair and skinfolds, a low pass filter is implemented. A method based on geometric moments is used to obtain the invariant descriptors of the input images. The classification task is achieved using Artificial Neural Networks (ANN) and K-Nearest Neighbors (K-nn) algorithms. Experimental results using our database show a percentage of correct classification, higher of 86.36% with ANN for 912 images of 38 people with 12 versions each one.
Feature extraction and classification of clouds in high resolution panchromatic satellite imagery
NASA Astrophysics Data System (ADS)
Sharghi, Elan
The development of sophisticated remote sensing sensors is rapidly increasing, and the vast amount of satellite imagery collected is too much to be analyzed manually by a human image analyst. It has become necessary for a tool to be developed to automate the job of an image analyst. This tool would need to intelligently detect and classify objects of interest through computer vision algorithms. Existing software called the Rapid Image Exploitation Resource (RAPIER®) was designed by engineers at Space and Naval Warfare Systems Center Pacific (SSC PAC) to perform exactly this function. This software automatically searches for anomalies in the ocean and reports the detections as a possible ship object. However, if the image contains a high percentage of cloud coverage, a high number of false positives are triggered by the clouds. The focus of this thesis is to explore various feature extraction and classification methods to accurately distinguish clouds from ship objects. An examination of a texture analysis method, line detection using the Hough transform, and edge detection using wavelets are explored as possible feature extraction methods. The features are then supplied to a K-Nearest Neighbors (KNN) or Support Vector Machine (SVM) classifier. Parameter options for these classifiers are explored and the optimal parameters are determined.
Predicting hepatotoxicity using ToxCast in vitro bioactivity and ...
Background: The U.S. EPA ToxCastTM program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors then used supervised machine learning to predict their hepatotoxic effects.Results: A set of 677 chemicals were represented by 711 in vitro bioactivity descriptors (from ToxCast assays), 4,376 chemical structure descriptors (from QikProp, OpenBabel, PADEL, and PubChem), and three hepatotoxicity categories (from animal studies). Hepatotoxicants were defined by rat liver histopathology observed after chronic chemical testing and grouped into hypertrophy (161), injury (101) and proliferative lesions (99). Classifiers were built using six machine learning algorithms: linear discriminant analysis (LDA), Naïve Bayes (NB), support vector classification (SVM), classification and regression trees (CART), k-nearest neighbors (KNN) and an ensemble of classifiers (ENSMB). Classifiers of hepatotoxicity were built using chemical structure, ToxCast bioactivity, and a hybrid representation. Predictive performance was evaluated using 10-fold cross-validation testing and in-loop, filter-based, feature subset selection. Hybrid classifiers had the best balanced accuracy for predicting hypertrophy (0.78±0.08), injury (0.73±0.10) and proliferative lesions (0.72±0.09). Though chemical and bioactivity class
A novel mobile-cloud system for capturing and analyzing wheelchair maneuvering data: A pilot study.
Fu, Jicheng; Jones, Maria; Liu, Tao; Hao, Wei; Yan, Yuqing; Qian, Gang; Jan, Yih-Kuen
2016-01-01
The purpose of this pilot study was to provide a new approach for capturing and analyzing wheelchair maneuvering data, which are critical for evaluating wheelchair users' activity levels. We proposed a mobile-cloud (MC) system, which incorporated the emerging mobile and cloud computing technologies. The MC system employed smartphone sensors to collect wheelchair maneuvering data and transmit them to the cloud for storage and analysis. A k-nearest neighbor (KNN) machine-learning algorithm was developed to mitigate the impact of sensor noise and recognize wheelchair maneuvering patterns. We conducted 30 trials in an indoor setting, where each trial contained 10 bouts (i.e., periods of continuous wheelchair movement). We also verified our approach in a different building. Different from existing approaches that require sensors to be attached to wheelchairs' wheels, we placed the smartphone into a smartphone holder attached to the wheelchair. Experimental results illustrate that our approach correctly identified all 300 bouts. Compared to existing approaches, our approach was easier to use while achieving similar accuracy in analyzing the accumulated movement time and maximum period of continuous movement (p > 0.8). Overall, the MC system provided a feasible way to ease the data collection process and generated accurate analysis results for evaluating activity levels.
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-01-01
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202
Han, Te; Jiang, Dongxiang; Zhang, Xiaochen; Sun, Yankui
2017-01-01
Rotating machinery is widely used in industrial applications. With the trend towards more precise and more critical operating conditions, mechanical failures may easily occur. Condition monitoring and fault diagnosis (CMFD) technology is an effective tool to enhance the reliability and security of rotating machinery. In this paper, an intelligent fault diagnosis method based on dictionary learning and singular value decomposition (SVD) is proposed. First, the dictionary learning scheme is capable of generating an adaptive dictionary whose atoms reveal the underlying structure of raw signals. Essentially, dictionary learning is employed as an adaptive feature extraction method regardless of any prior knowledge. Second, the singular value sequence of learned dictionary matrix is served to extract feature vector. Generally, since the vector is of high dimensionality, a simple and practical principal component analysis (PCA) is applied to reduce dimensionality. Finally, the K-nearest neighbor (KNN) algorithm is adopted for identification and classification of fault patterns automatically. Two experimental case studies are investigated to corroborate the effectiveness of the proposed method in intelligent diagnosis of rotating machinery faults. The comparison analysis validates that the dictionary learning-based matrix construction approach outperforms the mode decomposition-based methods in terms of capacity and adaptability for feature extraction. PMID:28346385
Toxicity challenges in environmental chemicals: Prediction of ...
Physiologically based pharmacokinetic (PBPK) models bridge the gap between in vitro assays and in vivo effects by accounting for the adsorption, distribution, metabolism, and excretion of xenobiotics, which is especially useful in the assessment of human toxicity. Quantitative structure-activity relationships (QSAR) serve as a vital tool for the high-throughput prediction of chemical-specific PBPK parameters, such as the fraction of a chemical unbound by plasma protein (Fub). The presented work explores the merit of utilizing experimental pharmaceutical Fub data for the construction of a universal QSAR model, in order to compensate for the limited range of high-quality experimental Fub data for environmentally relevant chemicals, such as pollutants, pesticides, and consumer products. Independent QSAR models were constructed with three machine-learning algorithms, k nearest neighbors (kNN), random forest (RF), and support vector machine (SVM) regression, from a large pharmaceutical training set (~1000) and assessed with independent test sets of pharmaceuticals (~200) and environmentally relevant chemicals in the ToxCast program (~400). Small descriptor sets yielded the optimal balance of model complexity and performance, providing insight into the biochemical factors of plasma protein binding, while preventing over fitting to the training set. Overlaps in chemical space between pharmaceutical and environmental compounds were considered through applicability of do
Informing the Human Plasma Protein Binding of ...
The free fraction of a xenobiotic in plasma (Fub) is an important determinant of chemical adsorption, distribution, metabolism, elimination, and toxicity, yet experimental plasma protein binding data is scarce for environmentally relevant chemicals. The presented work explores the merit of utilizing available pharmaceutical data to predict Fub for environmentally relevant chemicals via machine learning techniques. Quantitative structure-activity relationship (QSAR) models were constructed with k nearest neighbors (kNN), support vector machines (SVM), and random forest (RF) machine learning algorithms from a training set of 1045 pharmaceuticals. The models were then evaluated with independent test sets of pharmaceuticals (200 compounds) and environmentally relevant ToxCast chemicals (406 total, in two groups of 238 and 168 compounds). The selection of a minimal feature set of 10-15 2D molecular descriptors allowed for both informative feature interpretation and practical applicability domain assessment via a bounded box of descriptor ranges and principal component analysis. The diverse pharmaceutical and environmental chemical sets exhibit similarities in terms of chemical space (99-82% overlap), as well as comparable bias and variance in constructed learning curves. All the models exhibit significant predictability with mean absolute errors (MAE) in the range of 0.10-0.18 Fub. The models performed best for highly bound chemicals (MAE 0.07-0.12), neutrals (MAE 0
A Novel Mobile-Cloud System for Capturing and Analyzing Wheelchair Maneuvering Data: A Pilot Study
Fu, Jicheng; Jones, Maria; Liu, Tao; Hao, Wei; Yan, Yuqing; Qian, Gang; Jan, Yih-Kuen
2016-01-01
The purpose of this pilot study was to provide a new approach for capturing and analyzing wheelchair maneuvering data, which are critical for evaluating wheelchair users’ activity levels. We proposed a mobile-cloud (MC) system, which incorporated the emerging mobile and cloud computing technologies. The MC system employed smartphone sensors to collect wheelchair maneuvering data and transmit them to the cloud for storage and analysis. A K-Nearest-Neighbor (KNN) machine-learning algorithm was developed to mitigate the impact of sensor noise and recognize wheelchair maneuvering patterns. We conducted 30 trials in an indoor setting, where each trial contained 10 bouts (i.e., periods of continuous wheelchair movement). We also verified our approach in a different building. Different from existing approaches that require sensors to be attached to wheelchairs’ wheels, we placed the smartphone into a smartphone holder attached to the wheelchair. Experimental results illustrate that our approach correctly identified all 300 bouts. Compared to existing approaches, our approach was easier to use while achieving similar accuracy in analyzing the accumulated movement time and maximum period of continuous movement (p > 0.8). Overall, the MC system provided a feasible way to ease the data collection process, and generated accurate analysis results for evaluating activity levels. PMID:26479684
Al-Qazzaz, Noor Kamal; Ali, Sawal Hamid Bin Mohd; Ahmad, Siti Anom; Islam, Mohd Shabiul; Escudero, Javier
2018-01-01
Stroke survivors are more prone to developing cognitive impairment and dementia. Dementia detection is a challenge for supporting personalized healthcare. This study analyzes the electroencephalogram (EEG) background activity of 5 vascular dementia (VaD) patients, 15 stroke-related patients with mild cognitive impairment (MCI), and 15 control healthy subjects during a working memory (WM) task. The objective of this study is twofold. First, it aims to enhance the discrimination of VaD, stroke-related MCI patients, and control subjects using fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR); second, it aims to extract and investigate the spectral features that characterize the post-stroke dementia patients compared to the control subjects. Nineteen channels were recorded and analyzed using the independent component analysis and wavelet analysis (ICA-WT) denoising technique. Using ANOVA, linear spectral power including relative powers (RP) and power ratio were calculated to test whether the EEG dominant frequencies were slowed down in VaD and stroke-related MCI patients. Non-linear features including permutation entropy (PerEn) and fractal dimension (FD) were used to test the degree of irregularity and complexity, which was significantly lower in patients with VaD and stroke-related MCI than that in control subjects (ANOVA; p ˂ 0.05). This study is the first to use fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) dimensionality reduction technique with EEG background activity of dementia patients. The impairment of post-stroke patients was detected using support vector machine (SVM) and k-nearest neighbors (kNN) classifiers. A comparative study has been performed to check the effectiveness of using FNPAQR dimensionality reduction technique with the SVM and kNN classifiers. FNPAQR with SVM and kNN obtained 91.48 and 89.63% accuracy, respectively, whereas without using the FNPAQR exhibited 70 and 67.78% accuracy for SVM and kNN, respectively, in classifying VaD, stroke-related MCI, and control patients, respectively. Therefore, EEG could be a reliable index for inspecting concise markers that are sensitive to VaD and stroke-related MCI patients compared to control healthy subjects.
Wang, Huazhen; Liu, Xin; Lv, Bing; Yang, Fan; Hong, Yanzhu
2014-01-01
Objective Chronic Fatigue (CF) still remains unclear about its etiology, pathophysiology, nomenclature and diagnostic criteria in the medical community. Traditional Chinese medicine (TCM) adopts a unique diagnostic method, namely ‘bian zheng lun zhi’ or syndrome differentiation, to diagnose the CF with a set of syndrome factors, which can be regarded as the Multi-Label Learning (MLL) problem in the machine learning literature. To obtain an effective and reliable diagnostic tool, we use Conformal Predictor (CP), Random Forest (RF) and Problem Transformation method (PT) for the syndrome differentiation of CF. Methods and Materials In this work, using PT method, CP-RF is extended to handle MLL problem. CP-RF applies RF to measure the confidence level (p-value) of each label being the true label, and then selects multiple labels whose p-values are larger than the pre-defined significance level as the region prediction. In this paper, we compare the proposed CP-RF with typical CP-NBC(Naïve Bayes Classifier), CP-KNN(K-Nearest Neighbors) and ML-KNN on CF dataset, which consists of 736 cases. Specifically, 95 symptoms are used to identify CF, and four syndrome factors are employed in the syndrome differentiation, including ‘spleen deficiency’, ‘heart deficiency’, ‘liver stagnation’ and ‘qi deficiency’. The Results CP-RF demonstrates an outstanding performance beyond CP-NBC, CP-KNN and ML-KNN under the general metrics of subset accuracy, hamming loss, one-error, coverage, ranking loss and average precision. Furthermore, the performance of CP-RF remains steady at the large scale of confidence levels from 80% to 100%, which indicates its robustness to the threshold determination. In addition, the confidence evaluation provided by CP is valid and well-calibrated. Conclusion CP-RF not only offers outstanding performance but also provides valid confidence evaluation for the CF syndrome differentiation. It would be well applicable to TCM practitioners and facilitate the utilities of objective, effective and reliable computer-based diagnosis tool. PMID:24918430
Classification of EEG Signals Based on Pattern Recognition Approach.
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
Classification of EEG Signals Based on Pattern Recognition Approach
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID:29209190
Classification of diabetic retinopathy using fractal dimension analysis of eye fundus image
NASA Astrophysics Data System (ADS)
Safitri, Diah Wahyu; Juniati, Dwi
2017-08-01
Diabetes Mellitus (DM) is a metabolic disorder when pancreas produce inadequate insulin or a condition when body resist insulin action, so the blood glucose level is high. One of the most common complications of diabetes mellitus is diabetic retinopathy which can lead to a vision problem. Diabetic retinopathy can be recognized by an abnormality in eye fundus. Those abnormalities are characterized by microaneurysms, hemorrhage, hard exudate, cotton wool spots, and venous's changes. The diabetic retinopathy is classified depends on the conditions of abnormality in eye fundus, that is grade 1 if there is a microaneurysm only in the eye fundus; grade 2, if there are a microaneurysm and a hemorrhage in eye fundus; and grade 3: if there are microaneurysm, hemorrhage, and neovascularization in the eye fundus. This study proposed a method and a process of eye fundus image to classify of diabetic retinopathy using fractal analysis and K-Nearest Neighbor (KNN). The first phase was image segmentation process using green channel, CLAHE, morphological opening, matched filter, masking, and morphological opening binary image. After segmentation process, its fractal dimension was calculated using box-counting method and the values of fractal dimension were analyzed to make a classification of diabetic retinopathy. Tests carried out by used k-fold cross validation method with k=5. In each test used 10 different grade K of KNN. The accuracy of the result of this method is 89,17% with K=3 or K=4, it was the best results than others K value. Based on this results, it can be concluded that the classification of diabetic retinopathy using fractal analysis and KNN had a good performance.
Fall Detection System for the Elderly Based on the Classification of Shimmer Sensor Prototype Data
Ahmed, Moiz; Mehmood, Nadeem; Mehmood, Amir; Rizwan, Kashif
2017-01-01
Objectives Falling in the elderly is considered a major cause of death. In recent years, ambient and wireless sensor platforms have been extensively used in developed countries for the detection of falls in the elderly. However, we believe extra efforts are required to address this issue in developing countries, such as Pakistan, where most deaths due to falls are not even reported. Considering this, in this paper, we propose a fall detection system prototype that s based on the classification on real time shimmer sensor data. Methods We first developed a data set, ‘SMotion’ of certain postures that could lead to falls in the elderly by using a body area network of Shimmer sensors and categorized the items in this data set into age and weight groups. We developed a feature selection and classification system using three classifiers, namely, support vector machine (SVM), K-nearest neighbor (KNN), and neural network (NN). Finally, a prototype was fabricated to generate alerts to caregivers, health experts, or emergency services in case of fall. Results To evaluate the proposed system, SVM, KNN, and NN were used. The results of this study identified KNN as the most accurate classifier with maximum accuracy of 96% for age groups and 93% for weight groups. Conclusions In this paper, a classification-based fall detection system is proposed. For this purpose, the SMotion data set was developed and categorized into two groups (age and weight groups). The proposed fall detection system for the elderly is implemented through a body area sensor network using third-generation sensors. The evaluation results demonstrate the reasonable performance of the proposed fall detection prototype system in the tested scenarios. PMID:28875049
Automated detection and recognition of wildlife using thermal cameras.
Christiansen, Peter; Steen, Kim Arild; Jørgensen, Rasmus Nyholm; Karstoft, Henrik
2014-07-30
In agricultural mowing operations, thousands of animals are injured or killed each year, due to the increased working widths and speeds of agricultural machinery. Detection and recognition of wildlife within the agricultural fields is important to reduce wildlife mortality and, thereby, promote wildlife-friendly farming. The work presented in this paper contributes to the automated detection and classification of animals in thermal imaging. The methods and results are based on top-view images taken manually from a lift to motivate work towards unmanned aerial vehicle-based detection and recognition. Hot objects are detected based on a threshold dynamically adjusted to each frame. For the classification of animals, we propose a novel thermal feature extraction algorithm. For each detected object, a thermal signature is calculated using morphological operations. The thermal signature describes heat characteristics of objects and is partly invariant to translation, rotation, scale and posture. The discrete cosine transform (DCT) is used to parameterize the thermal signature and, thereby, calculate a feature vector, which is used for subsequent classification. Using a k-nearest-neighbor (kNN) classifier, animals are discriminated from non-animals with a balanced classification accuracy of 84.7% in an altitude range of 3-10 m and an accuracy of 75.2% for an altitude range of 10-20 m. To incorporate temporal information in the classification, a tracking algorithm is proposed. Using temporal information improves the balanced classification accuracy to 93.3% in an altitude range 3-10 of meters and 77.7% in an altitude range of 10-20 m.
Supervised pixel classification for segmenting geographic atrophy in fundus autofluorescene images
NASA Astrophysics Data System (ADS)
Hu, Zhihong; Medioni, Gerard G.; Hernandez, Matthias; Sadda, SriniVas R.
2014-03-01
Age-related macular degeneration (AMD) is the leading cause of blindness in people over the age of 65. Geographic atrophy (GA) is a manifestation of the advanced or late-stage of the AMD, which may result in severe vision loss and blindness. Techniques to rapidly and precisely detect and quantify GA lesions would appear to be of important value in advancing the understanding of the pathogenesis of GA and the management of GA progression. The purpose of this study is to develop an automated supervised pixel classification approach for segmenting GA including uni-focal and multi-focal patches in fundus autofluorescene (FAF) images. The image features include region wise intensity (mean and variance) measures, gray level co-occurrence matrix measures (angular second moment, entropy, and inverse difference moment), and Gaussian filter banks. A k-nearest-neighbor (k-NN) pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. A voting binary iterative hole filling filter is then applied to fill in the small holes. Sixteen randomly chosen FAF images were obtained from sixteen subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by certified graders. Two-fold cross-validation is applied for the evaluation of the classification performance. The mean Dice similarity coefficients (DSC) between the algorithm- and manually-defined GA regions are 0.84 +/- 0.06 for one test and 0.83 +/- 0.07 for the other test and the area correlations between them are 0.99 (p < 0.05) and 0.94 (p < 0.05) respectively.
The Development of Mobile Application to Introduce Historical Monuments in Manado
NASA Astrophysics Data System (ADS)
Rupilu, Moshe Markhasi; Suyoto; Santoso, Albertus Joko
2018-02-01
Learning the historical value of a monument is important because it preserves cultural and historical values, as well as expanding our personal insight. In Indonesia, particularly in Manado, North Sulawesi, there are many monuments. The monuments are erected for history, religion, culture and past war, however these aren't written in detail in the monuments. To get information on specific monument, manual search was required, i.e. asking related people or sources. Based on the problem, the development of an application which can utilize LBS (Location Based Service) method and some algorithmic methods specifically designed for mobile devices such as Smartphone, was required so that information on every monument in Manado can be displayed in detail using GPS coordinate. The application was developed by KNN method with K-means algorithm and collaborative filtering to recommend monument information to tourist. Tourists will get recommended options filtered by distance. Then, this method was also used to look for the closest monument from user. KNN algorithm determines the closest location by making comparisons according to calculation of longitude and latitude of several monuments tourist wants to visit. With this application, tourists who want to know and find information on monuments in Manado can do them easily and quickly because monument information is recommended directly to user without having to make selection. Moreover, tourist can see recommended monument information and search several monuments in Manado in real time.
NASA Astrophysics Data System (ADS)
Taha, Z.; Razman, M. A. M.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Adnan, F. A.; Sallehudin, M. F.; Mukai, Y.
2018-04-01
Fish Hunger behaviour is essential in determining the fish feeding routine, particularly for fish farmers. The inability to provide accurate feeding routines (under-feeding or over-feeding) may lead the death of the fish and consequently inhibits the quantity of the fish produced. Moreover, the excessive food that is not consumed by the fish will be dissolved in the water and accordingly reduce the water quality through the reduction of oxygen quantity. This problem also leads the death of the fish or even spur fish diseases. In the present study, a correlation of Barramundi fish-school behaviour with hunger condition through the hybrid data integration of image processing technique is established. The behaviour is clustered with respect to the position of the school size as well as the school density of the fish before feeding, during feeding and after feeding. The clustered fish behaviour is then classified through k-Nearest Neighbour (k-NN) learning algorithm. Three different variations of the algorithm namely cosine, cubic and weighted are assessed on its ability to classify the aforementioned fish hunger behaviour. It was found from the study that the weighted k-NN variation provides the best classification with an accuracy of 86.5%. Therefore, it could be concluded that the proposed integration technique may assist fish farmers in ascertaining fish feeding routine.
A multilevel-skin neighbor list algorithm for molecular dynamics simulation
NASA Astrophysics Data System (ADS)
Zhang, Chenglong; Zhao, Mingcan; Hou, Chaofeng; Ge, Wei
2018-01-01
Searching of the interaction pairs and organization of the interaction processes are important steps in molecular dynamics (MD) algorithms and are critical to the overall efficiency of the simulation. Neighbor lists are widely used for these steps, where thicker skin can reduce the frequency of list updating but is discounted by more computation in distance check for the particle pairs. In this paper, we propose a new neighbor-list-based algorithm with a precisely designed multilevel skin which can reduce unnecessary computation on inter-particle distances. The performance advantages over traditional methods are then analyzed against the main simulation parameters on Intel CPUs and MICs (many integrated cores), and are clearly demonstrated. The algorithm can be generalized for various discrete simulations using neighbor lists.
Electromechanical properties of engineered lead free potassium sodium niobate based materials =
NASA Astrophysics Data System (ADS)
Rafiq, Muhammad Asif
K0.5Na0.5NbO3 (KNN), is the most promising lead free material for substituting lead zirconate titanate (PZT) which is still the market leader used for sensors and actuators. To make KNN a real competitor, it is necessary to understand and to improve its properties. This goal is pursued in the present work via different approaches aiming to study KNN intrinsic properties and then to identify appropriate strategies like doping and texturing for designing better KNN materials for an intended application. Hence, polycrystalline KNN ceramics (undoped, non-stoichiometric; NST and doped), high-quality KNN single crystals and textured KNN based ceramics were successfully synthesized and characterized in this work. Polycrystalline undoped, non-stoichiometric (NST) and Mn doped KNN ceramics were prepared by conventional ceramic processing. Structure, microstructure and electrical properties were measured. It was observed that the window for mono-phasic compositions was very narrow for both NST ceramics and Mn doped ceramics. For NST ceramics the variation of A/B ratio influenced the polarization (P-E) hysteresis loop and better piezoelectric and dielectric responses could be found for small stoichiometry deviations (A/B = 0.97). Regarding Mn doping, as compared to undoped KNN which showed leaky polarization (P-E) hysteresis loops, B-site Mn doped ceramics showed a well saturated, less-leaky hysteresis loop and a significant properties improvement. Impedance spectroscopy was used to assess the role of Mn and a relation between charge transport - defects and ferroelectric response in K0.5Na0.5NbO3 (KNN) and Mn doped KNN ceramics could be established. At room temperature the conduction in KNN which is associated with holes transport is suppressed by Mn doping. Hence Mn addition increases the resistivity of the ceramic, which proved to be very helpful for improving the saturation of the P-E loop. At high temperatures the conduction is dominated by the motion of ionized oxygen vacancies whose concentration increases with Mn doping. Single crystals of potassium sodium niobate (KNN) were grown by a modified high temperature flux method. A boron-modified flux was used to obtain the crystals at a relatively low temperature. XRD, EDS and ICP analysis proved the chemical and crystallographic quality of the crystals. The grown KNN crystals exhibit higher dielectric permittivity (29,100) at the tetragonal-to-cubic phase transition temperature, higher remnant polarization (19.4 ?C/cm2) and piezoelectric coefficient (160 pC/N) when compared with the standard KNN ceramics. KNN single crystals domain structure was characterized for the first time by piezoforce response microscopy. It could be observed that - oriented potassium sodium niobate (KNN) single crystals reveal a long range ordered domain pattern of parallel 180° domains with zig-zag 90° domains. From the comparison of KNN Single crystals to ceramics, It is argued that the presence in KNN single crystal (and absence in KNN ceramics) of such a long range order specific domain pattern that is its fingerprint accounts for the improved properties of single crystals. These results have broad implications for the expanded use of KNN materials, by establishing a relation between the domain patterns and the dielectric and ferroelectric response of single crystals and ceramics and by indicating ways of achieving maximised properties in KNN materials. (Abstract shortened by ProQuest.).
Classification accuracies of physical activities using smartphone motion sensors.
Wu, Wanmin; Dasgupta, Sanjoy; Ramirez, Ernesto E; Peterson, Carlyn; Norman, Gregory J
2012-10-05
Over the past few years, the world has witnessed an unprecedented growth in smartphone use. With sensors such as accelerometers and gyroscopes on board, smartphones have the potential to enhance our understanding of health behavior, in particular physical activity or the lack thereof. However, reliable and valid activity measurement using only a smartphone in situ has not been realized. To examine the validity of the iPod Touch (Apple, Inc.) and particularly to understand the value of using gyroscopes for classifying types of physical activity, with the goal of creating a measurement and feedback system that easily integrates into individuals' daily living. We collected accelerometer and gyroscope data for 16 participants on 13 activities with an iPod Touch, a device that has essentially the same sensors and computing platform as an iPhone. The 13 activities were sitting, walking, jogging, and going upstairs and downstairs at different paces. We extracted time and frequency features, including mean and variance of acceleration and gyroscope on each axis, vector magnitude of acceleration, and fast Fourier transform magnitude for each axis of acceleration. Different classifiers were compared using the Waikato Environment for Knowledge Analysis (WEKA) toolkit, including C4.5 (J48) decision tree, multilayer perception, naive Bayes, logistic, k-nearest neighbor (kNN), and meta-algorithms such as boosting and bagging. The 10-fold cross-validation protocol was used. Overall, the kNN classifier achieved the best accuracies: 52.3%-79.4% for up and down stair walking, 91.7% for jogging, 90.1%-94.1% for walking on a level ground, and 100% for sitting. A 2-second sliding window size with a 1-second overlap worked the best. Adding gyroscope measurements proved to be more beneficial than relying solely on accelerometer readings for all activities (with improvement ranging from 3.1% to 13.4%). Common categories of physical activity and sedentary behavior (walking, jogging, and sitting) can be recognized with high accuracies using both the accelerometer and gyroscope onboard the iPod touch or iPhone. This suggests the potential of developing just-in-time classification and feedback tools on smartphones.
Development of a sonar-based object recognition system
NASA Astrophysics Data System (ADS)
Ecemis, Mustafa Ihsan
2001-02-01
Sonars are used extensively in mobile robotics for obstacle detection, ranging and avoidance. However, these range-finding applications do not exploit the full range of information carried in sonar echoes. In addition, mobile robots need robust object recognition systems. Therefore, a simple and robust object recognition system using ultrasonic sensors may have a wide range of applications in robotics. This dissertation develops and analyzes an object recognition system that uses ultrasonic sensors of the type commonly found on mobile robots. Three principal experiments are used to test the sonar recognition system: object recognition at various distances, object recognition during unconstrained motion, and softness discrimination. The hardware setup, consisting of an inexpensive Polaroid sonar and a data acquisition board, is described first. The software for ultrasound signal generation, echo detection, data collection, and data processing is then presented. Next, the dissertation describes two methods to extract information from the echoes, one in the frequency domain and the other in the time domain. The system uses the fuzzy ARTMAP neural network to recognize objects on the basis of the information content of their echoes. In order to demonstrate that the performance of the system does not depend on the specific classification method being used, the K- Nearest Neighbors (KNN) Algorithm is also implemented. KNN yields a test accuracy similar to fuzzy ARTMAP in all experiments. Finally, the dissertation describes a method for extracting features from the envelope function in order to reduce the dimension of the input vector used by the classifiers. Decreasing the size of the input vectors reduces the memory requirements of the system and makes it run faster. It is shown that this method does not affect the performance of the system dramatically and is more appropriate for some tasks. The results of these experiments demonstrate that sonar can be used to develop a low-cost, low-computation system for real-time object recognition tasks on mobile robots. This system differs from all previous approaches in that it is relatively simple, robust, fast, and inexpensive.
Classification of vegetation types in military region
NASA Astrophysics Data System (ADS)
Gonçalves, Miguel; Silva, Jose Silvestre; Bioucas-Dias, Jose
2015-10-01
In decision-making process regarding planning and execution of military operations, the terrain is a determining factor. Aerial photographs are a source of vital information for the success of an operation in hostile region, namely when the cartographic information behind enemy lines is scarce or non-existent. The objective of present work is the development of a tool capable of processing aerial photos. The methodology implemented starts with feature extraction, followed by the application of an automatic selector of features. The next step, using the k-fold cross validation technique, estimates the input parameters for the following classifiers: Sparse Multinomial Logist Regression (SMLR), K Nearest Neighbor (KNN), Linear Classifier using Principal Component Expansion on the Joint Data (PCLDC) and Multi-Class Support Vector Machine (MSVM). These classifiers were used in two different studies with distinct objectives: discrimination of vegetation's density and identification of vegetation's main components. It was found that the best classifier on the first approach is the Sparse Logistic Multinomial Regression (SMLR). On the second approach, the implemented methodology applied to high resolution images showed that the better performance was achieved by KNN classifier and PCLDC. Comparing the two approaches there is a multiscale issue, in which for different resolutions, the best solution to the problem requires different classifiers and the extraction of different features.
Application of texture analysis method for mammogram density classification
NASA Astrophysics Data System (ADS)
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
Automated diagnosis of dry eye using infrared thermography images
NASA Astrophysics Data System (ADS)
Acharya, U. Rajendra; Tan, Jen Hong; Koh, Joel E. W.; Sudarshan, Vidya K.; Yeo, Sharon; Too, Cheah Loon; Chua, Chua Kuang; Ng, E. Y. K.; Tong, Louis
2015-07-01
Dry Eye (DE) is a condition of either decreased tear production or increased tear film evaporation. Prolonged DE damages the cornea causing the corneal scarring, thinning and perforation. There is no single uniform diagnosis test available to date; combinations of diagnostic tests are to be performed to diagnose DE. The current diagnostic methods available are subjective, uncomfortable and invasive. Hence in this paper, we have developed an efficient, fast and non-invasive technique for the automated identification of normal and DE classes using infrared thermography images. The features are extracted from nonlinear method called Higher Order Spectra (HOS). Features are ranked using t-test ranking strategy. These ranked features are fed to various classifiers namely, K-Nearest Neighbor (KNN), Nave Bayesian Classifier (NBC), Decision Tree (DT), Probabilistic Neural Network (PNN), and Support Vector Machine (SVM) to select the best classifier using minimum number of features. Our proposed system is able to identify the DE and normal classes automatically with classification accuracy of 99.8%, sensitivity of 99.8%, and specificity if 99.8% for left eye using PNN and KNN classifiers. And we have reported classification accuracy of 99.8%, sensitivity of 99.9%, and specificity if 99.4% for right eye using SVM classifier with polynomial order 2 kernel.
Deng, Changjian; Lv, Kun; Shi, Debo; Yang, Bo; Yu, Song; He, Zhiyi; Yan, Jia
2018-06-12
In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. Firstly, we put forward an efficient feature selection method based on the separability and the dissimilarity to determine the feature selection order for each type of feature when increasing the dimension of selected feature subsets. Secondly, the K-nearest neighbor (KNN) classifier is applied to determine the dimensions of the optimal feature subsets for different types of features. Finally, in the process of establishing features fusion, we come up with a classification dominance feature fusion strategy which conducts an effective basic feature. Experimental results on two datasets show that the recognition rates of Database I and Database II achieve 97.5% and 80.11%, respectively, when k = 1 for KNN classifier and the distance metric is correlation distance (COR), which demonstrates the superiority of the proposed feature selection and fusion framework in representing signal features. The novel feature selection method proposed in this paper can effectively select feature subsets that are conducive to the classification, while the feature fusion framework can fuse various features which describe the different characteristics of sensor signals, for enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing drift effect.
Aquino, Francisco W B; Pereira-Filho, Edenir R
2015-03-01
Because of their short life span and high production and consumption rates, mobile phones are one of the contributors to WEEE (waste electrical and electronic equipment) growth in many countries. If incorrectly managed, the hazardous materials used in the assembly of these devices can pollute the environment and pose dangers for workers involved in the recycling of these materials. In this study, 144 polymer fragments originating from 50 broken or obsolete mobile phones were analyzed via laser-induced breakdown spectroscopy (LIBS) without previous treatment. The coated polymers were mainly characterized by the presence of Ag, whereas the uncoated polymers were related to the presence of Al, K, Na, Si and Ti. Classification models were proposed using black and white polymers separately in order to identify the manufacturer and origin using KNN (K-nearest neighbor), SIMCA (Soft Independent Modeling of Class Analogy) and PLS-DA (Partial Least Squares for Discriminant Analysis). For the black polymers the percentage of correct predictions was, in average, 58% taking into consideration the models for manufacturer and origin identification. In the case of white polymers, the percentage of correct predictions ranged from 72.8% (PLS-DA) to 100% (KNN). Copyright © 2014 Elsevier B.V. All rights reserved.
Aesthetic preference recognition of 3D shapes using EEG.
Chew, Lin Hou; Teo, Jason; Mountstephens, James
2016-04-01
Recognition and identification of aesthetic preference is indispensable in industrial design. Humans tend to pursue products with aesthetic values and make buying decisions based on their aesthetic preferences. The existence of neuromarketing is to understand consumer responses toward marketing stimuli by using imaging techniques and recognition of physiological parameters. Numerous studies have been done to understand the relationship between human, art and aesthetics. In this paper, we present a novel preference-based measurement of user aesthetics using electroencephalogram (EEG) signals for virtual 3D shapes with motion. The 3D shapes are designed to appear like bracelets, which is generated by using the Gielis superformula. EEG signals were collected by using a medical grade device, the B-Alert X10 from advance brain monitoring, with a sampling frequency of 256 Hz and resolution of 16 bits. The signals obtained when viewing 3D bracelet shapes were decomposed into alpha, beta, theta, gamma and delta rhythm by using time-frequency analysis, then classified into two classes, namely like and dislike by using support vector machines and K-nearest neighbors (KNN) classifiers respectively. Classification accuracy of up to 80 % was obtained by using KNN with the alpha, theta and delta rhythms as the features extracted from frontal channels, Fz, F3 and F4 to classify two classes, like and dislike.
Primary mass discrimination of high energy cosmic rays using PNN and k-NN methods
NASA Astrophysics Data System (ADS)
Rastegarzadeh, G.; Nemati, M.
2018-02-01
Probabilistic neural network (PNN) and k-Nearest Neighbors (k-NN) methods are widely used data classification techniques. In this paper, these two methods have been used to classify the Extensive Air Shower (EAS) data sets which were simulated using the CORSIKA code for three primary cosmic rays. The primaries are proton, oxygen and iron nuclei at energies of 100 TeV-10 PeV. This study is performed in the following of the investigations into the primary cosmic ray mass sensitive observables. We propose a new approach for measuring the mass sensitive observables of EAS in order to improve the primary mass separation. In this work, the EAS observables measurement has performed locally instead of total measurements. Also the relationships between the included number of observables in the classification methods and the prediction accuracy have been investigated. We have shown that the local measurements and inclusion of more mass sensitive observables in the classification processes can improve the classifying quality and also we have shown that muons and electrons energy density can be considered as primary mass sensitive observables in primary mass classification. Also it must be noted that this study is performed for Tehran observation level without considering the details of any certain EAS detection array.
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.
Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott
2009-03-01
Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
Neighbor Discovery Algorithm in Wireless Local Area Networks Using Multi-beam Directional Antennas
NASA Astrophysics Data System (ADS)
Wang, Jin; Peng, Wei; Liu, Song
2017-10-01
Neighbor discovery is an important step for Wireless Local Area Networks (WLAN) and the use of multi-beam directional antennas can greatly improve the network performance. However, most neighbor discovery algorithms in WLAN, based on multi-beam directional antennas, can only work effectively in synchronous system but not in asynchro-nous system. And collisions at AP remain a bottleneck for neighbor discovery. In this paper, we propose two asynchrono-us neighbor discovery algorithms: asynchronous hierarchical scanning (AHS) and asynchronous directional scanning (ADS) algorithm. Both of them are based on three-way handshaking mechanism. AHS and ADS reduce collisions at AP to have a good performance in a hierarchical way and directional way respectively. In the end, the performance of the AHS and ADS are tested on OMNeT++. Moreover, it is analyzed that different application scenarios and the factors how to affect the performance of these algorithms. The simulation results show that AHS is suitable for the densely populated scenes around AP while ADS is suitable for that most of the neighborhood nodes are far from AP.
Classification of fMRI resting-state maps using machine learning techniques: A comparative study
NASA Astrophysics Data System (ADS)
Gallos, Ioannis; Siettos, Constantinos
2017-11-01
We compare the efficiency of Principal Component Analysis (PCA) and nonlinear learning manifold algorithms (ISOMAP and Diffusion maps) for classifying brain maps between groups of schizophrenia patients and healthy from fMRI scans during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent component analysis (ICA) to reduce (a) noise and (b) spatial-temporal dimensionality of fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support-vector-machines (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.
Application of artificial neural networks to chemostratigraphy
NASA Astrophysics Data System (ADS)
Malmgren, BjöRn A.; Nordlund, Ulf
1996-08-01
Artificial neural networks, a branch of artificial intelligence, are computer systems formed by a number of simple, highly interconnected processing units that have the ability to learn a set of target vectors from a set of associated input signals. Neural networks learn by self-adjusting a set of parameters, using some pertinent algorithm to minimize the error between the desired output and network output. We explore the potential of this approach in solving a problem involving classification of geochemical data. The data, taken from the literature, are derived from four late Quaternary zones of volcanic ash of basaltic and rhyolithic origin from the Norwegian Sea. These ash layers span the oxygen isotope zones 1, 5, 7, and 11, respectively (last 420,000 years). The data consist of nine geochemical variables (oxides) determined in each of 183 samples. We employed a three-layer back propagation neural network to assess its efficiency to optimally differentiate samples from the four ash zones on the basis of their geochemical composition. For comparison, three statistical pattern recognition techniques, linear discriminant analysis, the k-nearest neighbor (k-NN) technique, and SIMCA (soft independent modeling of class analogy), were applied to the same data. All of these showed considerably higher error rates than the artificial neural network, indicating that the back propagation network was indeed more powerful in correctly classifying the ash particles to the appropriate zone on the basis of their geochemical composition.
Acharya, U Rajendra; Bhat, Shreya; Koh, Joel E W; Bhandary, Sulatha V; Adeli, Hojjat
2017-09-01
Glaucoma is an optic neuropathy defined by characteristic damage to the optic nerve and accompanying visual field deficits. Early diagnosis and treatment are critical to prevent irreversible vision loss and ultimate blindness. Current techniques for computer-aided analysis of the optic nerve and retinal nerve fiber layer (RNFL) are expensive and require keen interpretation by trained specialists. Hence, an automated system is highly desirable for a cost-effective and accurate screening for the diagnosis of glaucoma. This paper presents a new methodology and a computerized diagnostic system. Adaptive histogram equalization is used to convert color images to grayscale images followed by convolution of these images with Leung-Malik (LM), Schmid (S), and maximum response (MR4 and MR8) filter banks. The basic microstructures in typical images are called textons. The convolution process produces textons. Local configuration pattern (LCP) features are extracted from these textons. The significant features are selected using a sequential floating forward search (SFFS) method and ranked using the statistical t-test. Finally, various classifiers are used for classification of images into normal and glaucomatous classes. A high classification accuracy of 95.8% is achieved using six features obtained from the LM filter bank and the k-nearest neighbor (kNN) classifier. A glaucoma integrative index (GRI) is also formulated to obtain a reliable and effective system. Copyright © 2017 Elsevier Ltd. All rights reserved.
Developing robust arsenic awareness prediction models using machine learning algorithms.
Singh, Sushant K; Taylor, Robert W; Rahman, Mohammad Mahmudur; Pradhan, Biswajeet
2018-04-01
Arsenic awareness plays a vital role in ensuring the sustainability of arsenic mitigation technologies. Thus far, however, few studies have dealt with the sustainability of such technologies and its associated socioeconomic dimensions. As a result, arsenic awareness prediction has not yet been fully conceptualized. Accordingly, this study evaluated arsenic awareness among arsenic-affected communities in rural India, using a structured questionnaire to record socioeconomic, demographic, and other sociobehavioral factors with an eye to assessing their association with and influence on arsenic awareness. First a logistic regression model was applied and its results compared with those produced by six state-of-the-art machine-learning algorithms (Support Vector Machine [SVM], Kernel-SVM, Decision Tree [DT], k-Nearest Neighbor [k-NN], Naïve Bayes [NB], and Random Forests [RF]) as measured by their accuracy at predicting arsenic awareness. Most (63%) of the surveyed population was found to be arsenic-aware. Significant arsenic awareness predictors were divided into three types: (1) socioeconomic factors: caste, education level, and occupation; (2) water and sanitation behavior factors: number of family members involved in water collection, distance traveled and time spent for water collection, places for defecation, and materials used for handwashing after defecation; and (3) social capital and trust factors: presence of anganwadi and people's trust in other community members, NGOs, and private agencies. Moreover, individuals' having higher social network positively contributed to arsenic awareness in the communities. Results indicated that both the SVM and the RF algorithms outperformed at overall prediction of arsenic awareness-a nonlinear classification problem. Lower-caste, less educated, and unemployed members of the population were found to be the most vulnerable, requiring immediate arsenic mitigation. To this end, local social institutions and NGOs could play a crucial role in arsenic awareness and outreach programs. Use of SVM or RF or a combination of the two, together with use of a larger sample size, could enhance the accuracy of arsenic awareness prediction. Copyright © 2018 Elsevier Ltd. All rights reserved.
Development of a Compton camera for prompt-gamma medical imaging
NASA Astrophysics Data System (ADS)
Aldawood, S.; Thirolf, P. G.; Miani, A.; Böhmer, M.; Dedes, G.; Gernhäuser, R.; Lang, C.; Liprandi, S.; Maier, L.; Marinšek, T.; Mayerhofer, M.; Schaart, D. R.; Lozano, I. Valencia; Parodi, K.
2017-11-01
A Compton camera-based detector system for photon detection from nuclear reactions induced by proton (or heavier ion) beams is under development at LMU Munich, targeting the online range verification of the particle beam in hadron therapy via prompt-gamma imaging. The detector is designed to be capable to reconstruct the photon source origin not only from the Compton scattering kinematics of the primary photon, but also to allow for tracking of the secondary Compton-scattered electrons, thus enabling a γ-source reconstruction also from incompletely absorbed photon events. The Compton camera consists of a monolithic LaBr3:Ce scintillation crystal, read out by a multi-anode PMT acting as absorber, preceded by a stacked array of 6 double-sided silicon strip detectors as scatterers. The detector components have been characterized both under offline and online conditions. The LaBr3:Ce crystal exhibits an excellent time and energy resolution. Using intense collimated 137Cs and 60Co sources, the monolithic scintillator was scanned on a fine 2D grid to generate a reference library of light amplitude distributions that allows for reconstructing the photon interaction position using a k-Nearest Neighbour (k-NN) algorithm. Systematic studies were performed to investigate the performance of the reconstruction algorithm, revealing an improvement of the spatial resolution with increasing photon energy to an optimum value of 3.7(1)mm at 1.33 MeV, achieved with the Categorical Average Pattern (CAP) modification of the k-NN algorithm.
Evaluation of normalization methods for cDNA microarray data by k-NN classification
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-01-01
Background Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Results Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Conclusion Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics. PMID:16045803
Evaluation of normalization methods for cDNA microarray data by k-NN classification.
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-07-26
Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen
This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, kNearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for themore » cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.« less
Zhao, Yu-Xiang; Chou, Chien-Hsing
2016-01-01
In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identifying crucial character regions for recognizing Chinese characters. PMID:27314346
Modeling ready biodegradability of fragrance materials.
Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola
2015-06-01
In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.
DL-ADR: a novel deep learning model for classifying genomic variants into adverse drug reactions.
Liang, Zhaohui; Huang, Jimmy Xiangji; Zeng, Xing; Zhang, Gang
2016-08-10
Genomic variations are associated with the metabolism and the occurrence of adverse reactions of many therapeutic agents. The polymorphisms on over 2000 locations of cytochrome P450 enzymes (CYP) due to many factors such as ethnicity, mutations, and inheritance attribute to the diversity of response and side effects of various drugs. The associations of the single nucleotide polymorphisms (SNPs), the internal pharmacokinetic patterns and the vulnerability of specific adverse reactions become one of the research interests of pharmacogenomics. The conventional genomewide association studies (GWAS) mainly focuses on the relation of single or multiple SNPs to a specific risk factors which are a one-to-many relation. However, there are no robust methods to establish a many-to-many network which can combine the direct and indirect associations between multiple SNPs and a serial of events (e.g. adverse reactions, metabolic patterns, prognostic factors etc.). In this paper, we present a novel deep learning model based on generative stochastic networks and hidden Markov chain to classify the observed samples with SNPs on five loci of two genes (CYP2D6 and CYP1A2) respectively to the vulnerable population of 14 types of adverse reactions. A supervised deep learning model is proposed in this study. The revised generative stochastic networks (GSN) model with transited by the hidden Markov chain is used. The data of the training set are collected from clinical observation. The training set is composed of 83 observations of blood samples with the genotypes respectively on CYP2D6*2, *10, *14 and CYP1A2*1C, *1 F. The samples are genotyped by the polymerase chain reaction (PCR) method. A hidden Markov chain is used as the transition operator to simulate the probabilistic distribution. The model can perform learning at lower cost compared to the conventional maximal likelihood method because the transition distribution is conditional on the previous state of the hidden Markov chain. A least square loss (LASSO) algorithm and a k-Nearest Neighbors (kNN) algorithm are used as the baselines for comparison and to evaluate the performance of our proposed deep learning model. There are 53 adverse reactions reported during the observation. They are assigned to 14 categories. In the comparison of classification accuracy, the deep learning model shows superiority over the LASSO and kNN model with a rate over 80 %. In the comparison of reliability, the deep learning model shows the best stability among the three models. Machine learning provides a new method to explore the complex associations among genomic variations and multiple events in pharmacogenomics studies. The new deep learning algorithm is capable of classifying various SNPs to the corresponding adverse reactions. We expect that as more genomic variations are added as features and more observations are made, the deep learning model can improve its performance and can act as a black-box but reliable verifier for other GWAS studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boutilier, Justin J., E-mail: j.boutilier@mail.utoronto.ca; Lee, Taewoo; Craig, Tim
Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and appliedmore » three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. Conclusions: The authors demonstrated that the KNN and MLR weight prediction methodologies perform comparably to the LR model and can produce clinical quality treatment plans by simultaneously predicting multiple weights that capture trade-offs associated with sparing multiple OARs.« less
Integrated Sensing Processor, Phase 2
2005-12-01
performance analysis for several baseline classifiers including neural nets, linear classifiers, and kNN classifiers. Use of CCDR as a preprocessing step...below the level of the benchmark non-linear classifier for this problem ( kNN ). Furthermore, the CCDR preconditioned kNN achieved a 10% improvement over...the benchmark kNN without CCDR. Finally, we found an important connection between intrinsic dimension estimation via entropic graphs and the optimal
Tang, Yunwei; Jing, Linhai; Li, Hui; Liu, Qingjie; Yan, Qi; Li, Xiuxia
2016-01-01
This study explores the ability of WorldView-2 (WV-2) imagery for bamboo mapping in a mountainous region in Sichuan Province, China. A large area of this place is covered by shadows in the image, and only a few sampled points derived were useful. In order to identify bamboos based on sparse training data, the sample size was expanded according to the reflectance of multispectral bands selected using the principal component analysis (PCA). Then, class separability based on the training data was calculated using a feature space optimization method to select the features for classification. Four regular object-based classification methods were applied based on both sets of training data. The results show that the k-nearest neighbor (k-NN) method produced the greatest accuracy. A geostatistically-weighted k-NN classifier, accounting for the spatial correlation between classes, was then applied to further increase the accuracy. It achieved 82.65% and 93.10% of the producer’s and user’s accuracies respectively for the bamboo class. The canopy densities were estimated to explain the result. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos given limited known samples, and the resulting bamboo distribution facilitates the assessments of the habitats of giant pandas. PMID:27879661
Dynamic analysis environment for nuclear forensic analyses
NASA Astrophysics Data System (ADS)
Stork, C. L.; Ummel, C. C.; Stuart, D. S.; Bodily, S.; Goldblum, B. L.
2017-01-01
A Dynamic Analysis Environment (DAE) software package is introduced to facilitate group inclusion/exclusion method testing, evaluation and comparison for pre-detonation nuclear forensics applications. Employing DAE, the multivariate signatures of a questioned material can be compared to the signatures for different, known groups, enabling the linking of the questioned material to its potential process, location, or fabrication facility. Advantages of using DAE for group inclusion/exclusion include built-in query tools for retrieving data of interest from a database, the recording and documentation of all analysis steps, a clear visualization of the analysis steps intelligible to a non-expert, and the ability to integrate analysis tools developed in different programming languages. Two group inclusion/exclusion methods are implemented in DAE: principal component analysis, a parametric feature extraction method, and k nearest neighbors, a nonparametric pattern recognition method. Spent Fuel Isotopic Composition (SFCOMPO), an open source international database of isotopic compositions for spent nuclear fuels (SNF) from 14 reactors, is used to construct PCA and KNN models for known reactor groups, and 20 simulated SNF samples are utilized in evaluating the performance of these group inclusion/exclusion models. For all 20 simulated samples, PCA in conjunction with the Q statistic correctly excludes a large percentage of reactor groups and correctly includes the true reactor of origination. Employing KNN, 14 of the 20 simulated samples are classified to their true reactor of origination.
Mutual proximity graphs for improved reachability in music recommendation.
Flexer, Arthur; Stevens, Jeff
2018-01-01
This paper is concerned with the impact of hubness, a general problem of machine learning in high-dimensional spaces, on a real-world music recommendation system based on visualisation of a k-nearest neighbour (knn) graph. Due to a problem of measuring distances in high dimensions, hub objects are recommended over and over again while anti-hubs are nonexistent in recommendation lists, resulting in poor reachability of the music catalogue. We present mutual proximity graphs, which are an alternative to knn and mutual knn graphs, and are able to avoid hub vertices having abnormally high connectivity. We show that mutual proximity graphs yield much better graph connectivity resulting in improved reachability compared to knn graphs, mutual knn graphs and mutual knn graphs enhanced with minimum spanning trees, while simultaneously reducing the negative effects of hubness.
Mutual proximity graphs for improved reachability in music recommendation
Flexer, Arthur; Stevens, Jeff
2018-01-01
This paper is concerned with the impact of hubness, a general problem of machine learning in high-dimensional spaces, on a real-world music recommendation system based on visualisation of a k-nearest neighbour (knn) graph. Due to a problem of measuring distances in high dimensions, hub objects are recommended over and over again while anti-hubs are nonexistent in recommendation lists, resulting in poor reachability of the music catalogue. We present mutual proximity graphs, which are an alternative to knn and mutual knn graphs, and are able to avoid hub vertices having abnormally high connectivity. We show that mutual proximity graphs yield much better graph connectivity resulting in improved reachability compared to knn graphs, mutual knn graphs and mutual knn graphs enhanced with minimum spanning trees, while simultaneously reducing the negative effects of hubness. PMID:29348779
Study of parameters of the nearest neighbour shared algorithm on clustering documents
NASA Astrophysics Data System (ADS)
Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni
2018-03-01
Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well with different parameter values.
Li, Qu; Yao, Min; Yang, Jianhua; Xu, Ning
2014-01-01
Online friend recommendation is a fast developing topic in web mining. In this paper, we used SVD matrix factorization to model user and item feature vector and used stochastic gradient descent to amend parameter and improve accuracy. To tackle cold start problem and data sparsity, we used KNN model to influence user feature vector. At the same time, we used graph theory to partition communities with fairly low time and space complexity. What is more, matrix factorization can combine online and offline recommendation. Experiments showed that the hybrid recommendation algorithm is able to recommend online friends with good accuracy.
Parametric classification of handvein patterns based on texture features
NASA Astrophysics Data System (ADS)
Al Mahafzah, Harbi; Imran, Mohammad; Supreetha Gowda H., D.
2018-04-01
In this paper, we have developed Biometric recognition system adopting hand based modality Handvein,which has the unique pattern for each individual and it is impossible to counterfeit and fabricate as it is an internal feature. We have opted in choosing feature extraction algorithms such as LBP-visual descriptor, LPQ-blur insensitive texture operator, Log-Gabor-Texture descriptor. We have chosen well known classifiers such as KNN and SVM for classification. We have experimented and tabulated results of single algorithm recognition rate for Handvein under different distance measures and kernel options. The feature level fusion is carried out which increased the performance level.
Multiobjective immune algorithm with nondominated neighbor-based selection.
Gong, Maoguo; Jiao, Licheng; Du, Haifeng; Bo, Liefeng
2008-01-01
Abstract Nondominated Neighbor Immune Algorithm (NNIA) is proposed for multiobjective optimization by using a novel nondominated neighbor-based selection technique, an immune inspired operator, two heuristic search operators, and elitism. The unique selection technique of NNIA only selects minority isolated nondominated individuals in the population. The selected individuals are then cloned proportionally to their crowding-distance values before heuristic search. By using the nondominated neighbor-based selection and proportional cloning, NNIA pays more attention to the less-crowded regions of the current trade-off front. We compare NNIA with NSGA-II, SPEA2, PESA-II, and MISA in solving five DTLZ problems, five ZDT problems, and three low-dimensional problems. The statistical analysis based on three performance metrics including the coverage of two sets, the convergence metric, and the spacing, show that the unique selection method is effective, and NNIA is an effective algorithm for solving multiobjective optimization problems. The empirical study on NNIA's scalability with respect to the number of objectives shows that the new algorithm scales well along the number of objectives.
Distance-Based Phylogenetic Methods Around a Polytomy.
Davidson, Ruth; Sullivant, Seth
2014-01-01
Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
NASA Astrophysics Data System (ADS)
Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi
2016-04-01
Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31900 km2). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix and in selected bivariate trait relationships. MICE yielded imputations which better preserved the variability and covariance structure of the data and provided an estimate of between-imputation uncertainty. We found that adding species identity as a predictor in MICE and kNN improved imputation for all traits, but adding climate did not lead to any appreciable improvement. However, forest structure and spatial structure did reduce imputation errors in maximum height and in leaf biomass to sapwood area ratios, respectively. Although species mean imputations showed the lowest error for 3 out the 5 studied traits, dataset-averaged errors were lowest for MICE imputations with all additional predictors, when missing data levels were 50% or lower. Species mean imputations always resulted in larger errors in the correlation matrix and appreciably altered the studied bivariate trait relationships. In conclusion, MICE imputations using species identity, climate, forest structure and spatial structure as predictors emerged as the most suitable method of the ones tested here, but it was also evident that imputation performance deteriorates at high levels of missing data (80%).
A Robust False Matching Points Detection Method for Remote Sensing Image Registration
NASA Astrophysics Data System (ADS)
Shan, X. J.; Tang, P.
2015-04-01
Given the influences of illumination, imaging angle, and geometric distortion, among others, false matching points still occur in all image registration algorithms. Therefore, false matching points detection is an important step in remote sensing image registration. Random Sample Consensus (RANSAC) is typically used to detect false matching points. However, RANSAC method cannot detect all false matching points in some remote sensing images. Therefore, a robust false matching points detection method based on Knearest- neighbour (K-NN) graph (KGD) is proposed in this method to obtain robust and high accuracy result. The KGD method starts with the construction of the K-NN graph in one image. K-NN graph can be first generated for each matching points and its K nearest matching points. Local transformation model for each matching point is then obtained by using its K nearest matching points. The error of each matching point is computed by using its transformation model. Last, L matching points with largest error are identified false matching points and removed. This process is iterative until all errors are smaller than the given threshold. In addition, KGD method can be used in combination with other methods, such as RANSAC. Several remote sensing images with different resolutions and terrains are used in the experiment. We evaluate the performance of KGD method, RANSAC + KGD method, RANSAC, and Graph Transformation Matching (GTM). The experimental results demonstrate the superior performance of the KGD and RANSAC + KGD methods.
Automatic Fibrosis Quantification By Using a k-NN Classificator
2001-10-25
Fluthrope, “Stages in fiber breakdown in duchenne muscular dystrophy ,” J. Neurol. Sci., vol. 24, pp. 179– 186, 1975. [6] F. Cornelio and I. Dones, “ Muscle ...an automatic algorithm to measure fibrosis in muscle sections of mdx mice, a mutant species used as a model of the Duchenne dystrophy . The al- gorithm...fiber degeneration and necro- sis in muscular dystrophy and other muscle diseases: cytochem- ical and immunocytochemical data,” Ann. Neurol., vol. 16
Forecasting municipal solid waste generation using artificial intelligence modelling approaches.
Abbasi, Maryam; El Hanandeh, Ali
2016-10-01
Municipal solid waste (MSW) management is a major concern to local governments to protect human health, the environment and to preserve natural resources. The design and operation of an effective MSW management system requires accurate estimation of future waste generation quantities. The main objective of this study was to develop a model for accurate forecasting of MSW generation that helps waste related organizations to better design and operate effective MSW management systems. Four intelligent system algorithms including support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN) and k-nearest neighbours (kNN) were tested for their ability to predict monthly waste generation in the Logan City Council region in Queensland, Australia. Results showed artificial intelligence models have good prediction performance and could be successfully applied to establish municipal solid waste forecasting models. Using machine learning algorithms can reliably predict monthly MSW generation by training with waste generation time series. In addition, results suggest that ANFIS system produced the most accurate forecasts of the peaks while kNN was successful in predicting the monthly averages of waste quantities. Based on the results, the total annual MSW generated in Logan City will reach 9.4×10(7)kg by 2020 while the peak monthly waste will reach 9.37×10(6)kg. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Human action recognition based on kinematic similarity in real time
Chen, Longting; Luo, Ailing; Zhang, Sicong
2017-01-01
Human action recognition using 3D pose data has gained a growing interest in the field of computer robotic interfaces and pattern recognition since the availability of hardware to capture human pose. In this paper, we propose a fast, simple, and powerful method of human action recognition based on human kinematic similarity. The key to this method is that the action descriptor consists of joints position, angular velocity and angular acceleration, which can meet the different individual sizes and eliminate the complex normalization. The angular parameters of joints within a short sliding time window (approximately 5 frames) around the current frame are used to express each pose frame of human action sequence. Moreover, three modified KNN (k-nearest-neighbors algorithm) classifiers are employed in our method: one for achieving the confidence of every frame in the training step, one for estimating the frame label of each descriptor, and one for classifying actions. Additional estimating of the frame’s time label makes it possible to address single input frames. This approach can be used on difficult, unsegmented sequences. The proposed method is efficient and can be run in real time. The research shows that many public datasets are irregularly segmented, and a simple method is provided to regularize the datasets. The approach is tested on some challenging datasets such as MSR-Action3D, MSRDailyActivity3D, and UTD-MHAD. The results indicate our method achieves a higher accuracy. PMID:29073131
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhang, Han; Chen, Xuefeng; Du, Zhaohui; Li, Xiang; Yan, Ruqiang
2016-04-01
Fault information of aero-engine bearings presents two particular phenomena, i.e., waveform distortion and impulsive feature frequency band dispersion, which leads to a challenging problem for current techniques of bearing fault diagnosis. Moreover, although many progresses of sparse representation theory have been made in feature extraction of fault information, the theory also confronts inevitable performance degradation due to the fact that relatively weak fault information has not sufficiently prominent and sparse representations. Therefore, a novel nonlocal sparse model (coined NLSM) and its algorithm framework has been proposed in this paper, which goes beyond simple sparsity by introducing more intrinsic structures of feature information. This work adequately exploits the underlying prior information that feature information exhibits nonlocal self-similarity through clustering similar signal fragments and stacking them together into groups. Within this framework, the prior information is transformed into a regularization term and a sparse optimization problem, which could be solved through block coordinate descent method (BCD), is formulated. Additionally, the adaptive structural clustering sparse dictionary learning technique, which utilizes k-Nearest-Neighbor (kNN) clustering and principal component analysis (PCA) learning, is adopted to further enable sufficient sparsity of feature information. Moreover, the selection rule of regularization parameter and computational complexity are described in detail. The performance of the proposed framework is evaluated through numerical experiment and its superiority with respect to the state-of-the-art method in the field is demonstrated through the vibration signals of experimental rig of aircraft engine bearings.
A traveling salesman approach for predicting protein functions.
Johnson, Olin; Liu, Jing
2006-10-12
Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm 1 on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems.
A traveling salesman approach for predicting protein functions
Johnson, Olin; Liu, Jing
2006-01-01
Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems. PMID:17147783
Evaluation of Data Processing Techniques for Unobtrusive Gait Authentication
2014-03-01
scatter plot depicting the performance of kNN , by TER, on all experimental mixtures...30 Table 9. Mean TER of SVM and kNN performance with different voting parameters...performance on XYZ-axis data. ...........................................................51 Table 19. kNN and SVM results in back pocket carrying
Fabrication of transparent lead-free KNN glass ceramics by incorporation method
2012-01-01
The incorporation method was employed to produce potassium sodium niobate [KNN] (K0.5Na0.5NbO3) glass ceramics from the KNN-SiO2 system. This incorporation method combines a simple mixed-oxide technique for producing KNN powder and a conventional melt-quenching technique to form the resulting glass. KNN was calcined at 800°C and subsequently mixed with SiO2 in the KNN:SiO2 ratio of 75:25 (mol%). The successfully produced optically transparent glass was then subjected to a heat treatment schedule at temperatures ranging from 525°C -575°C for crystallization. All glass ceramics of more than 40% transmittance crystallized into KNN nanocrystals that were rectangular in shape and dispersed well throughout the glass matrix. The crystal size and crystallinity were found to increase with increasing heat treatment temperature, which in turn plays an important role in controlling the properties of the glass ceramics, including physical, optical, and dielectric properties. The transparency of the glass samples decreased with increasing crystal size. The maximum room temperature dielectric constant (εr) was as high as 474 at 10 kHz with an acceptable low loss (tanδ) around 0.02 at 10 kHz. PMID:22340426
NASA Astrophysics Data System (ADS)
Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng
2013-10-01
Quick, accurate and reliable technique for discrimination of cocoa beans according to geographical origin is essential for quality control and traceability management. This current study presents the application of Near Infrared Spectroscopy technique and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data and this gave visible cluster trends. The performance of four multivariate classification methods: Linear discriminant analysis (LDA), K-nearest neighbors (KNN), Back propagation artificial neural network (BPANN) and Support vector machine (SVM) were compared. The performances of the models were optimized by cross validation. The results revealed that; SVM model was superior to all the mathematical methods with a discrimination rate of 100% in both the training and prediction set after preprocessing with Mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for prediction set. While LDA model had 96.15% and 90.63% for the training and prediction sets respectively. KNN model had 75.01% for the training set and 72.31% for prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR Spectroscopy coupled with SVM model could be used successfully to discriminate cocoa beans according to their geographical origins for effective quality assurance.
MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.
Liu, Ke; Peng, Shengwen; Wu, Junqiu; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng
2015-06-15
Medical Subject Headings (MeSHs) are used by National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple evidence for MeSH annotation. We propose a novel framework, MeSHLabeler, to integrate multiple evidence for accurate MeSH annotation by using 'learning to rank'. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. MeSHLabeler won the first place in Task 2A of 2014 BioASQ challenge, achieving the Micro F-measure of 0.6248 for 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than 0.5724, obtained by MTI. The software is available upon request. © The Author 2015. Published by Oxford University Press.
MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence
Liu, Ke; Peng, Shengwen; Wu, Junqiu; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng
2015-01-01
Motivation: Medical Subject Headings (MeSHs) are used by National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple evidence for MeSH annotation. Methods: We propose a novel framework, MeSHLabeler, to integrate multiple evidence for accurate MeSH annotation by using ‘learning to rank’. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. Results: MeSHLabeler won the first place in Task 2A of 2014 BioASQ challenge, achieving the Micro F-measure of 0.6248 for 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than 0.5724, obtained by MTI. Availability and implementation: The software is available upon request. Contact: zhusf@fudan.edu.cn PMID:26072501
Detecting falls with wearable sensors using machine learning techniques.
Özdemir, Ahmet Turan; Barshan, Billur
2014-06-18
Falls are a serious public health problem and possibly life threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded.
Karacavus, Seyhan; Yılmaz, Bülent; Tasdemir, Arzu; Kayaaltı, Ömer; Kaya, Eser; İçer, Semra; Ayyıldız, Oguzhan
2018-04-01
We investigated the association between the textural features obtained from 18 F-FDG images, metabolic parameters (SUVmax , SUVmean, MTV, TLG), and tumor histopathological characteristics (stage and Ki-67 proliferation index) in non-small cell lung cancer (NSCLC). The FDG-PET images of 67 patients with NSCLC were evaluated. MATLAB technical computing language was employed in the extraction of 137 features by using first order statistics (FOS), gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), and Laws' texture filters. Textural features and metabolic parameters were statistically analyzed in terms of good discrimination power between tumor stages, and selected features/parameters were used in the automatic classification by k-nearest neighbors (k-NN) and support vector machines (SVM). We showed that one textural feature (gray-level nonuniformity, GLN) obtained using GLRLM approach and nine textural features using Laws' approach were successful in discriminating all tumor stages, unlike metabolic parameters. There were significant correlations between Ki-67 index and some of the textural features computed using Laws' method (r = 0.6, p = 0.013). In terms of automatic classification of tumor stage, the accuracy was approximately 84% with k-NN classifier (k = 3) and SVM, using selected five features. Texture analysis of FDG-PET images has a potential to be an objective tool to assess tumor histopathological characteristics. The textural features obtained using Laws' approach could be useful in the discrimination of tumor stage.
Quantum chemical and statistical study of megazol-derived compounds with trypanocidal activity
NASA Astrophysics Data System (ADS)
Rosselli, F. P.; Albuquerque, C. N.; da Silva, A. B. F.
In this work we performed a structure-activity relationship (SAR) study with the aim to correlate molecular properties of the megazol compound and 10 of its analogs with the biological activity against Trypanosoma cruzi (trypanocidal or antichagasic activity) presented by these molecules. The biological activity indication was obtained from in vitro tests and the molecular properties (variables or descriptors) were obtained from the optimized chemical structures by using the PM3 semiempirical method. It was calculated ˜80 molecular properties selected among steric, constitutional, electronic, and lipophilicity properties. In order to reduce dimensionality and investigate which subset of variables (descriptors) would be more effective in classifying the compounds studied, according to their degree of trypanocidal activity, we employed statistical methodologies (pattern recognition and classification techniques) such as principal component analysis (PCA), hierarchical cluster analysis (HCA), K-nearest neighbor (KNN), and discriminant function analysis (DFA). These methods showed that the descriptors molecular mass (MM), energy of the second lowest unoccupied molecular orbital (LUMO+1), charge on the first nitrogen at substituent 2 (qN'), dihedral angles (D1 and D2), bond length between atom C4 and its substituent (L4), Moriguchi octanol-partition coefficient (MLogP), and length-to-breadth ratio (L/Bw) were the variables responsible for the separation between active and inactive compounds against T. cruzi. Afterwards, the PCA, KNN, and DFA models built in this work were used to perform trypanocidal activity predictions for eight new megazol analog compounds.
NASA Astrophysics Data System (ADS)
Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin
2018-03-01
Both conventional and deep machine learning has been used to develop decision-support tools applied in medical imaging informatics. In order to take advantages of both conventional and deep learning approach, this study aims to investigate feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, initially computed 44 image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LLP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applying to this dataset, LLP-generated feature vector reduced the number of features from 44 to 4. Using a leave-onecase-out validation method, area under ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p < 0.05) and odds ratio was 4.60 with a 95% confidence interval of [3.16, 6.70]. Study demonstrated that this new LPP-based feature regeneration approach enabled to produce an optimal feature vector and yield improved performance in assisting to predict risk of women having breast cancer detected in the next subsequent mammography screening.
Griffanti, Ludovica; Zamboni, Giovanna; Khan, Aamira; Li, Linxin; Bonifacio, Guendalina; Sundaresan, Vaanathi; Schulz, Ursula G; Kuker, Wilhelm; Battaglini, Marco; Rothwell, Peter M; Jenkinson, Mark
2016-11-01
Reliable quantification of white matter hyperintensities of presumed vascular origin (WMHs) is increasingly needed, given the presence of these MRI findings in patients with several neurological and vascular disorders, as well as in elderly healthy subjects. We present BIANCA (Brain Intensity AbNormality Classification Algorithm), a fully automated, supervised method for WMH detection, based on the k-nearest neighbour (k-NN) algorithm. Relative to previous k-NN based segmentation methods, BIANCA offers different options for weighting the spatial information, local spatial intensity averaging, and different options for the choice of the number and location of the training points. BIANCA is multimodal and highly flexible so that the user can adapt the tool to their protocol and specific needs. We optimised and validated BIANCA on two datasets with different MRI protocols and patient populations (a "predominantly neurodegenerative" and a "predominantly vascular" cohort). BIANCA was first optimised on a subset of images for each dataset in terms of overlap and volumetric agreement with a manually segmented WMH mask. The correlation between the volumes extracted with BIANCA (using the optimised set of options), the volumes extracted from the manual masks and visual ratings showed that BIANCA is a valid alternative to manual segmentation. The optimised set of options was then applied to the whole cohorts and the resulting WMH volume estimates showed good correlations with visual ratings and with age. Finally, we performed a reproducibility test, to evaluate the robustness of BIANCA, and compared BIANCA performance against existing methods. Our findings suggest that BIANCA, which will be freely available as part of the FSL package, is a reliable method for automated WMH segmentation in large cross-sectional cohort studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Voice based gender classification using machine learning
NASA Astrophysics Data System (ADS)
Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.
2017-11-01
Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.
Chen, Wei; Yu, Zunxiong; Pang, Jinshan; Yu, Peng; Tan, Guoxin; Ning, Chengyun
2017-01-01
The discovery of piezoelectricity in natural bone has attracted extensive research in emulating biological electricity for various tissue regeneration. Here, we carried out experiments to build biocompatible potassium sodium niobate (KNN) ceramics. Then, influence substrate surface charges on bovine serum albumin (BSA) protein adsorption and cell proliferation on KNN ceramics surfaces was investigated. KNN ceramics with piezoelectric constant of ~93 pC/N and relative density of ~93% were fabricated. The adsorption of protein on the positive surfaces (Ps) and negative surfaces (Ns) of KNN ceramics with piezoelectric constant of ~93 pC/N showed greater protein adsorption capacity than that on non-polarized surfaces (NPs). Biocompatibility of KNN ceramics was verified through cell culturing and live/dead cell staining of MC3T3. The cells experiment showed enhanced cell growth on the positive surfaces (Ps) and negative surfaces (Ns) compared to non-polarized surfaces (NPs). These results revealed that KNN ceramics had great potential to be used to understand the effect of surface potential on cells processes and would benefit future research in designing piezoelectric materials for tissue regeneration. PMID:28772704
Chen, Wei; Yu, Zunxiong; Pang, Jinshan; Yu, Peng; Tan, Guoxin; Ning, Chengyun
2017-03-26
The discovery of piezoelectricity in natural bone has attracted extensive research in emulating biological electricity for various tissue regeneration. Here, we carried out experiments to build biocompatible potassium sodium niobate (KNN) ceramics. Then, influence substrate surface charges on bovine serum albumin (BSA) protein adsorption and cell proliferation on KNN ceramics surfaces was investigated. KNN ceramics with piezoelectric constant of ~93 pC/N and relative density of ~93% were fabricated. The adsorption of protein on the positive surfaces (Ps) and negative surfaces (Ns) of KNN ceramics with piezoelectric constant of ~93 pC/N showed greater protein adsorption capacity than that on non-polarized surfaces (NPs). Biocompatibility of KNN ceramics was verified through cell culturing and live/dead cell staining of MC3T3. The cells experiment showed enhanced cell growth on the positive surfaces (Ps) and negative surfaces (Ns) compared to non-polarized surfaces (NPs). These results revealed that KNN ceramics had great potential to be used to understand the effect of surface potential on cells processes and would benefit future research in designing piezoelectric materials for tissue regeneration.
Li, Yanfei; Tian, Yun
2018-01-01
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model. PMID:29320579
Cao, Jianfang; Li, Yanfei; Tian, Yun
2018-01-01
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model.
Gunavathi, Chellamuthu; Premalatha, Kandasamy
2014-01-01
Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.
Sintering of Lead-Free Piezoelectric Sodium Potassium Niobate Ceramics
Malič, Barbara; Koruza, Jurij; Hreščak, Jitka; Bernard, Janez; Wang, Ke; Fisher, John G.; Benčan, Andreja
2015-01-01
The potassium sodium niobate, K0.5Na0.5NbO3, solid solution (KNN) is considered as one of the most promising, environment-friendly, lead-free candidates to replace highly efficient, lead-based piezoelectrics. Since the first reports of KNN, it has been recognized that obtaining phase-pure materials with a high density and a uniform, fine-grained microstructure is a major challenge. For this reason the present paper reviews the different methods for consolidating KNN ceramics. The difficulties involved in the solid-state synthesis of KNN powder, i.e., obtaining phase purity, the stoichiometry of the perovskite phase, and the chemical homogeneity, are discussed. The solid-state sintering of stoichiometric KNN is characterized by poor densification and an extremely narrow sintering-temperature range, which is close to the solidus temperature. A study of the initial sintering stage revealed that coarsening of the microstructure without densification contributes to a reduction of the driving force for sintering. The influences of the (K + Na)/Nb molar ratio, the presence of a liquid phase, chemical modifications (doping, complex solid solutions) and different atmospheres (i.e., defect chemistry) on the sintering are discussed. Special sintering techniques, such as pressure-assisted sintering and spark-plasma sintering, can be effective methods for enhancing the density of KNN ceramics. The sintering behavior of KNN is compared to that of a representative piezoelectric lead zirconate titanate (PZT). PMID:28793702
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Lingyan, E-mail: l.y.wang@mail.xjtu.edu.cn, E-mail: wren@mail.xjtu.edu.cn; Ren, Wei, E-mail: l.y.wang@mail.xjtu.edu.cn, E-mail: wren@mail.xjtu.edu.cn; Shi, Peng
Lead-free ferroelectric un-doped and doped K{sub 0.5}Na{sub 0.5}NbO{sub 3} (KNN) films with different amounts of manganese (Mn) were prepared by a chemical solution deposition method. The thicknesses of all films are about 1.6 μm. Their phase, microstructure, leakage current behavior, and electrical properties were investigated. With increasing the amounts of Mn, the crystallinity became worse. Fortunately, the electrical properties were improved due to the decreased leakage current density after Mn-doping. The study on leakage behaviors shows that the dominant conduction mechanism at low electric field in the un-doped KNN film is ohmic mode and that at high electric field is space-charge-limitedmore » and Pool-Frenkel emission. After Mn doping, the dominant conduction mechanism at high electric field of KNN films changed single space-charge-limited. However, the introduction of higher amount of Mn into the KNN film would lead to a changed conduction mechanism from space-charge-limited to ohmic mode. Consequently, there exists an optimal amount of Mn doping of 2.0 mol. %. The 2.0 mol. % Mn doped KNN film shows the lowest leakage current density and the best electrical properties. With the secondary ion mass spectroscopies and x-ray photoelectron spectroscopy analyses, the homogeneous distribution in the KNN films and entrance of Mn element in the lattice of KNN perovskite structure were also confirmed.« less
Integration of multimodal RNA-seq data for prediction of kidney cancer survival
Schwartzi, Matt; Parkl, Martin; Phanl, John H.; Wang., May D.
2016-01-01
Kidney cancer is of prominent concern in modern medicine. Predicting patient survival is critical to patient awareness and developing a proper treatment regimens. Previous prediction models built upon molecular feature analysis are limited to just gene expression data. In this study we investigate the difference in predicting five year survival between unimodal and multimodal analysis of RNA-seq data from gene, exon, junction, and isoform modalities. Our preliminary findings report higher predictive accuracy-as measured by area under the ROC curve (AUC)-for multimodal learning when compared to unimodal learning with both support vector machine (SVM) and k-nearest neighbor (KNN) methods. The results of this study justify further research on the use of multimodal RNA-seq data to predict survival for other cancer types using a larger sample size and additional machine learning methods. PMID:27532026
Local Subspace Classifier with Transform-Invariance for Image Classification
NASA Astrophysics Data System (ADS)
Hotta, Seiji
A family of linear subspace classifiers called local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers very high sensitivity to image transformations because it uses projection and the Euclidean distances for classification. In this paper, I present a combination of a local subspace classifier (LSC) and a tangent distance (TD) for improving accuracy of handwritten digit recognition. In this classification rule, we can deal with transform-invariance easily because we are able to use tangent vectors for approximation of transformations. However, we cannot use tangent vectors in other type of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with the experiments on handwritten digit and color image classification.
Fuzzy Temporal Logic Based Railway Passenger Flow Forecast Model
Dou, Fei; Jia, Limin; Wang, Li; Xu, Jie; Huang, Yakun
2014-01-01
Passenger flow forecast is of essential importance to the organization of railway transportation and is one of the most important basics for the decision-making on transportation pattern and train operation planning. Passenger flow of high-speed railway features the quasi-periodic variations in a short time and complex nonlinear fluctuation because of existence of many influencing factors. In this study, a fuzzy temporal logic based passenger flow forecast model (FTLPFFM) is presented based on fuzzy logic relationship recognition techniques that predicts the short-term passenger flow for high-speed railway, and the forecast accuracy is also significantly improved. An applied case that uses the real-world data illustrates the precision and accuracy of FTLPFFM. For this applied case, the proposed model performs better than the k-nearest neighbor (KNN) and autoregressive integrated moving average (ARIMA) models. PMID:25431586
A comparison of 12 algorithms for matching on the propensity score.
Austin, Peter C
2014-03-15
Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
A comparison of 12 algorithms for matching on the propensity score
Austin, Peter C
2014-01-01
Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24123228
CD-HPF: New habitability score via data analytic modeling
NASA Astrophysics Data System (ADS)
Bora, K.; Saha, S.; Agrawal, S.; Safonova, M.; Routh, S.; Narasimhamurthy, A.
2016-10-01
The search for life on the planets outside the Solar System can be broadly classified into the following: looking for Earth-like conditions or the planets similar to the Earth (Earth similarity), and looking for the possibility of life in a form known or unknown to us (habitability). The two frequently used indices, Earth Similarity Index (ESI) and Planetary Habitability Index (PHI), describe heuristic methods to score habitability in the efforts to categorize different exoplanets (or exomoons). ESI, in particular, considers Earth as the reference frame for habitability, and is a quick screening tool to categorize and measure physical similarity of any planetary body with the Earth. The PHI assesses the potential habitability of any given planet, and is based on the essential requirements of known life: presence of a stable and protected substrate, energy, appropriate chemistry and a liquid medium. We propose here a different metric, a Cobb-Douglas Habitability Score (CDHS), based on Cobb-Douglas habitability production function (CD-HPF), which computes the habitability score by using measured and estimated planetary input parameters. As an initial set, we used radius, density, escape velocity and surface temperature of a planet. The values of the input parameters are normalized to the Earth Units (EU). The proposed metric, with exponents accounting for metric elasticity, is endowed with analytical properties that ensure global optima, and scales up to accommodate finitely many input parameters. The model is elastic, and, as we discovered, the standard PHI turns out to be a special case of the CDHS. Computed CDHS scores are fed to K-NN (K-Nearest Neighbor) classification algorithm with probabilistic herding that facilitates the assignment of exoplanets to appropriate classes via supervised feature learning methods, producing granular clusters of habitability. The proposed work describes a decision-theoretical model using the power of convex optimization and algorithmic machine learning.
Completing the Results of the 2013 Boston Marathon
Hammerling, Dorit; Cefalu, Matthew; Cisewski, Jessi; Dominici, Francesca; Parmigiani, Giovanni; Paulson, Charles; Smith, Richard L.
2014-01-01
The 2013 Boston marathon was disrupted by two bombs placed near the finish line. The bombs resulted in three deaths and several hundred injuries. Of lesser concern, in the immediate aftermath, was the fact that nearly 6,000 runners failed to finish the race. We were approached by the marathon's organizers, the Boston Athletic Association (BAA), and asked to recommend a procedure for projecting finish times for the runners who could not complete the race. With assistance from the BAA, we created a dataset consisting of all the runners in the 2013 race who reached the halfway point but failed to finish, as well as all runners from the 2010 and 2011 Boston marathons. The data consist of split times from each of the 5 km sections of the course, as well as the final 2.2 km (from 40 km to the finish). The statistical objective is to predict the missing split times for the runners who failed to finish in 2013. We set this problem in the context of the matrix completion problem, examples of which include imputing missing data in DNA microarray experiments, and the Netflix prize problem. We propose five prediction methods and create a validation dataset to measure their performance by mean squared error and other measures. The best method used local regression based on a K-nearest-neighbors algorithm (KNN method), though several other methods produced results of similar quality. We show how the results were used to create projected times for the 2013 runners and discuss potential for future application of the same methodology. We present the whole project as an example of reproducible research, in that we are able to make the full data and all the algorithms we have used publicly available, which may facilitate future research extending the methods or proposing completely different approaches. PMID:24727904
Completing the results of the 2013 Boston marathon.
Hammerling, Dorit; Cefalu, Matthew; Cisewski, Jessi; Dominici, Francesca; Parmigiani, Giovanni; Paulson, Charles; Smith, Richard L
2014-01-01
The 2013 Boston marathon was disrupted by two bombs placed near the finish line. The bombs resulted in three deaths and several hundred injuries. Of lesser concern, in the immediate aftermath, was the fact that nearly 6,000 runners failed to finish the race. We were approached by the marathon's organizers, the Boston Athletic Association (BAA), and asked to recommend a procedure for projecting finish times for the runners who could not complete the race. With assistance from the BAA, we created a dataset consisting of all the runners in the 2013 race who reached the halfway point but failed to finish, as well as all runners from the 2010 and 2011 Boston marathons. The data consist of split times from each of the 5 km sections of the course, as well as the final 2.2 km (from 40 km to the finish). The statistical objective is to predict the missing split times for the runners who failed to finish in 2013. We set this problem in the context of the matrix completion problem, examples of which include imputing missing data in DNA microarray experiments, and the Netflix prize problem. We propose five prediction methods and create a validation dataset to measure their performance by mean squared error and other measures. The best method used local regression based on a K-nearest-neighbors algorithm (KNN method), though several other methods produced results of similar quality. We show how the results were used to create projected times for the 2013 runners and discuss potential for future application of the same methodology. We present the whole project as an example of reproducible research, in that we are able to make the full data and all the algorithms we have used publicly available, which may facilitate future research extending the methods or proposing completely different approaches.
Chen, Zhao; Cao, Yanfeng; He, Shuaibing; Qiao, Yanjiang
2018-01-01
Action (" gongxiao " in Chinese) of traditional Chinese medicine (TCM) is the high recapitulation for therapeutic and health-preserving effects under the guidance of TCM theory. TCM-defined herbal properties (" yaoxing " in Chinese) had been used in this research. TCM herbal property (TCM-HP) is the high generalization and summary for actions, both of which come from long-term effective clinical practice in two thousands of years in China. However, the specific relationship between TCM-HP and action of TCM is complex and unclear from a scientific perspective. The research about this is conducive to expound the connotation of TCM-HP theory and is of important significance for the development of the TCM-HP theory. One hundred and thirty-three herbs including 88 heat-clearing herbs (HCHs) and 45 blood-activating stasis-resolving herbs (BAHRHs) were collected from reputable TCM literatures, and their corresponding TCM-HPs/actions information were collected from Chinese pharmacopoeia (2015 edition). The Kennard-Stone (K-S) algorithm was used to split 133 herbs into 100 calibration samples and 33 validation samples. Then, machine learning methods including supported vector machine (SVM), k-nearest neighbor (kNN) and deep learning methods including deep belief network (DBN), convolutional neutral network (CNN) were adopted to develop action classification models based on TCM-HP theory, respectively. In order to ensure robustness, these four classification methods were evaluated by using the method of tenfold cross validation and 20 external validation samples for prediction. As results, 72.7-100% of 33 validation samples including 17 HCHs and 16 BASRHs were correctly predicted by these four types of methods. Both of the DBN and CNN methods gave out the best results and their sensitivity, specificity, precision, accuracy were all 100.00%. Especially, the predicted results of external validation set showed that the performance of deep learning methods (DBN, CNN) were better than traditional machine learning methods (kNN, SVM) in terms of their sensitivity, specificity, precision, accuracy. Moreover, the distribution patterns of TCM-HPs of HCHs and BASRHs were also analyzed to detect the featured TCM-HPs of these two types of herbs. The result showed that the featured TCM-HPs of HCHs were cold, bitter, liver and stomach meridians entered, while those of BASRHs were warm, bitter and pungent, liver meridian entered. The performance on validation set and external validation set of deep learning methods (DBN, CNN) were better than machine learning models (kNN, SVM) in sensitivity, specificity, precision, accuracy when predicting the actions of heat-clearing and blood-activating stasis-resolving based on TCM-HP theory. The deep learning classification methods owned better generalization ability and accuracy when predicting the actions of heat-clearing and blood-activating stasis-resolving based on TCM-HP theory. Besides, the methods of deep learning would help us to improve our understanding about the relationship between herbal property and action, as well as to enrich and develop the theory of TCM-HP scientifically.
A neighboring structure reconstructed matching algorithm based on LARK features
NASA Astrophysics Data System (ADS)
Xue, Taobei; Han, Jing; Zhang, Yi; Bai, Lianfa
2015-11-01
Aimed at the low contrast ratio and high noise of infrared images, and the randomness and ambient occlusion of its objects, this paper presents a neighboring structure reconstructed matching (NSRM) algorithm based on LARK features. The neighboring structure relationships of local window are considered based on a non-negative linear reconstruction method to build a neighboring structure relationship matrix. Then the LARK feature matrix and the NSRM matrix are processed separately to get two different similarity images. By fusing and analyzing the two similarity images, those infrared objects are detected and marked by the non-maximum suppression. The NSRM approach is extended to detect infrared objects with incompact structure. High performance is demonstrated on infrared body set, indicating a lower false detecting rate than conventional methods in complex natural scenes.
A Feature-based Approach to Big Data Analysis of Medical Images
Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M.
2015-01-01
This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches in O(log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct. PMID:26221685
A Feature-Based Approach to Big Data Analysis of Medical Images.
Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M
2015-01-01
This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches-in O (log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods.. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct.
Secure and Efficient k-NN Queries⋆
Asif, Hafiz; Vaidya, Jaideep; Shafiq, Basit; Adam, Nabil
2017-01-01
Given the morass of available data, ranking and best match queries are often used to find records of interest. As such, k-NN queries, which give the k closest matches to a query point, are of particular interest, and have many applications. We study this problem in the context of the financial sector, wherein an investment portfolio database is queried for matching portfolios. Given the sensitivity of the information involved, our key contribution is to develop a secure k-NN computation protocol that can enable the computation k-NN queries in a distributed multi-party environment while taking domain semantics into account. The experimental results show that the proposed protocols are extremely efficient. PMID:29218333
About neighborhood counting measure metric and minimum risk metric.
Argentini, Andrea; Blanzieri, Enrico
2010-04-01
In a 2006 TPAMI paper, Wang proposed the Neighborhood Counting Measure, a similarity measure for the k-NN algorithm. In his paper, Wang mentioned the Minimum Risk Metric (MRM), an early distance measure based on the minimization of the risk of misclassification. Wang did not compare NCM to MRM because of its allegedly excessive computational load. In this comment paper, we complete the comparison that was missing in Wang's paper and, from our empirical evaluation, we show that MRM outperforms NCM and that its running time is not prohibitive as Wang suggested.
NASA Astrophysics Data System (ADS)
Qi, Yong; Lei, Kai; Zhang, Lizeqing; Xing, Ximing; Gou, Wenyue
2018-06-01
This paper introduced the development of a self-serving medical data assisted diagnosis software of cervical cancer on the basis of artificial neural network (SVN, FNN, KNN). The system is developed based on the idea of self-service platform, supported by the application and innovation of neural network algorithm in medical data identification. Furthermore, it combined the advanced methods in various fields to effectively solve the complicated and inaccurate problem of cervical canceration data in the traditional manual treatment.
Implementation of a Localization System for Sensor Networks
2006-05-18
its N-point DFT is mathematically formulated as X[k] = N−1∑ n=0 x[n] W nkN , k = 0, 1, . . . , N − 1 (7.1) W knN = e −j(2π/N)kn (7.2) There are two...distributed ad-hoc wireless sensor networks. In Int. Conf. on Acoustics, Speech , and Signal Proc. (ICASSP), pages 2037 – 2040, Salt Lake City, UT. [18] J...Stability of recursive qrd-ls algorithms using finite- precision systolic array implementation. IEEE Trans. on Acoustics, Speech , and Signal Proc., 37(5
Finite Element Modeling Used to Study Stress Distribution on the Foot
NASA Technical Reports Server (NTRS)
Morales, Nelson; Davis, Brian; Tajaddini, Azita
2004-01-01
A method to study the stress distribution inside the forefoot during walking was developed at the Cleveland Clinic Foundation by a researcher from the NASA Glenn Research Center. In this method, a semiautomated process was outlined to create a three-dimensional, patient-specific, finite element model (FEM) of the forefoot using magnetic resonance images (MRI). The images were processed in Matlab using the k-nearest neighbor (k-NN) classification algorithm and Sobel edge detection to separate the different tissue types: bone, skin, fat, and muscle. This information was used to create curves and surfaces that were exported to an FEM preprocessor known as Truegrid. In Truegrid, eight-noded or brick elements were created by using surface mapping. The FEM was processed and postprocessed in Abaqus. Material properties of the models were obtained from past experiments such as fat pad confined compression, skin axial and biaxial tests, muscle in vivo compressive tests, and reference literature (bone properties). Nonlinear (hyperelastic) material models were used for the skin (epidermis and dermis), fat, and muscles; and a linear elastic model was used for the bones. Muscle activation during walking yielded uncertainties in the muscle material model since contracted muscles are stiffer than relaxed muscles. These uncertainties were resolved by performing a sensitivity analysis of the muscle material properties. The original properties were multiplied by arbitrary factors of 2, 3, 0.5, and 0.33. The strain and stress distributions, as well as the locations of peak values, were similar in all cases. The peak contact pressure P obtained for each case varied with respect to the applied factor f as follows:
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qin, SB; Cady, ST; Dominguez-Garcia, AD
This paper presents the theory and implementation of a distributed algorithm for controlling differential power processing converters in photovoltaic (PV) applications. This distributed algorithm achieves true maximum power point tracking of series-connected PV submodules by relying only on local voltage measurements and neighbor-to-neighbor communication between the differential power converters. Compared to previous solutions, the proposed algorithm achieves reduced number of perturbations at each step and potentially faster tracking without adding extra hardware; all these features make this algorithm well-suited for long submodule strings. The formulation of the algorithm, discussion of its properties, as well as three case studies are presented.more » The performance of the distributed tracking algorithm has been verified via experiments, which yielded quantifiable improvements over other techniques that have been implemented in practice. Both simulations and hardware experiments have confirmed the effectiveness of the proposed distributed algorithm.« less
Özdemir, Ahmet Turan
2016-01-01
Wearable devices for fall detection have received attention in academia and industry, because falls are very dangerous, especially for elderly people, and if immediate aid is not provided, it may result in death. However, some predictive devices are not easily worn by elderly people. In this work, a huge dataset, including 2520 tests, is employed to determine the best sensor placement location on the body and to reduce the number of sensor nodes for device ergonomics. During the tests, the volunteer’s movements are recorded with six groups of sensors each with a triaxial (accelerometer, gyroscope and magnetometer) sensor, which is placed tightly on different parts of the body with special straps: head, chest, waist, right-wrist, right-thigh and right-ankle. The accuracy of individual sensor groups with their location is investigated with six machine learning techniques, namely the k-nearest neighbor (k-NN) classifier, Bayesian decision making (BDM), support vector machines (SVM), least squares method (LSM), dynamic time warping (DTW) and artificial neural networks (ANNs). Each technique is applied to single, double, triple, quadruple, quintuple and sextuple sensor configurations. These configurations create 63 different combinations, and for six machine learning techniques, a total of 63 × 6 = 378 combinations is investigated. As a result, the waist region is found to be the most suitable location for sensor placement on the body with 99.96% fall detection sensitivity by using the k-NN classifier, whereas the best sensitivity achieved by the wrist sensor is 97.37%, despite this location being highly preferred for today’s wearable applications. PMID:27463719
Chen, Zewei; Zhang, Xin; Zhang, Zhuoyong
2016-12-01
Timely risk assessment of chronic kidney disease (CKD) and proper community-based CKD monitoring are important to prevent patients with potential risk from further kidney injuries. As many symptoms are associated with the progressive development of CKD, evaluating risk of CKD through a set of clinical data of symptoms coupled with multivariate models can be considered as an available method for prevention of CKD and would be useful for community-based CKD monitoring. Three common used multivariate models, i.e., K-nearest neighbor (KNN), support vector machine (SVM), and soft independent modeling of class analogy (SIMCA), were used to evaluate risk of 386 patients based on a series of clinical data taken from UCI machine learning repository. Different types of composite data, in which proportional disturbances were added to simulate measurement deviations caused by environment and instrument noises, were also utilized to evaluate the feasibility and robustness of these models in risk assessment of CKD. For the original data set, three mentioned multivariate models can differentiate patients with CKD and non-CKD with the overall accuracies over 93 %. KNN and SVM have better performances than SIMCA has in this study. For the composite data set, SVM model has the best ability to tolerate noise disturbance and thus are more robust than the other two models. Using clinical data set on symptoms coupled with multivariate models has been proved to be feasible approach for assessment of patient with potential CKD risk. SVM model can be used as useful and robust tool in this study.
Yu, Huan; Caldwell, Curtis; Mah, Katherine; Mozeg, Daniel
2009-03-01
Coregistered fluoro-deoxy-glucose (FDG) positron emission tomography/computed tomography (PET/CT) has shown potential to improve the accuracy of radiation targeting of head and neck cancer (HNC) when compared to the use of CT simulation alone. The objective of this study was to identify textural features useful in distinguishing tumor from normal tissue in head and neck via quantitative texture analysis of coregistered 18F-FDG PET and CT images. Abnormal and typical normal tissues were manually segmented from PET/CT images of 20 patients with HNC and 20 patients with lung cancer. Texture features including some derived from spatial grey-level dependence matrices (SGLDM) and neighborhood gray-tone-difference matrices (NGTDM) were selected for characterization of these segmented regions of interest (ROIs). Both K nearest neighbors (KNNs) and decision tree (DT)-based KNN classifiers were employed to discriminate images of abnormal and normal tissues. The area under the curve (AZ) of receiver operating characteristics (ROC) was used to evaluate the discrimination performance of features in comparison to an expert observer. The leave-one-out and bootstrap techniques were used to validate the results. The AZ of DT-based KNN classifier was 0.95. Sensitivity and specificity for normal and abnormal tissue classification were 89% and 99%, respectively. In summary, NGTDM features such as PET Coarseness, PET Contrast, and CT Coarseness extracted from FDG PET/CT images provided good discrimination performance. The clinical use of such features may lead to improvement in the accuracy of radiation targeting of HNC.
NASA Astrophysics Data System (ADS)
Chen, Xue; Li, Xiaohui; Yu, Xin; Chen, Deying; Liu, Aichun
2018-01-01
Diagnosis of malignancies is a challenging clinical issue. In this work, we present quick and robust diagnosis and discrimination of lymphoma and multiple myeloma (MM) using laser-induced breakdown spectroscopy (LIBS) conducted on human serum samples, in combination with chemometric methods. The serum samples collected from lymphoma and MM cancer patients and healthy controls were deposited on filter papers and ablated with a pulsed 1064 nm Nd:YAG laser. 24 atomic lines of Ca, Na, K, H, O, and N were selected for malignancy diagnosis. Principal component analysis (PCA), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k nearest neighbors (kNN) classification were applied to build the malignancy diagnosis and discrimination models. The performances of the models were evaluated using 10-fold cross validation. The discrimination accuracy, confusion matrix and receiver operating characteristic (ROC) curves were obtained. The values of area under the ROC curve (AUC), sensitivity and specificity at the cut-points were determined. The kNN model exhibits the best performances with overall discrimination accuracy of 96.0%. Distinct discrimination between malignancies and healthy controls has been achieved with AUC, sensitivity and specificity for healthy controls all approaching 1. For lymphoma, the best discrimination performance values are AUC = 0.990, sensitivity = 0.970 and specificity = 0.956. For MM, the corresponding values are AUC = 0.986, sensitivity = 0.892 and specificity = 0.994. The results show that the serum-LIBS technique can serve as a quick, less invasive and robust method for diagnosis and discrimination of human malignancies.
Estimating local scaling properties for the classification of interstitial lung disease patterns
NASA Astrophysics Data System (ADS)
Huber, Markus B.; Nagarajan, Mahesh B.; Leinsinger, Gerda; Ray, Lawrence A.; Wismueller, Axel
2011-03-01
Local scaling properties of texture regions were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative for the presence of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honeycombing, a stack of 70 axial, lung kernel reconstructed images were acquired from HRCT chest exams. 241 regions of interest of both healthy and pathological (89) lung tissue were identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and the estimation of local scaling properties with Scaling Index Method (SIM). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions including the Bonferroni correction. The best classification results were obtained by the set of SIM features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers with the highest accuracy (94.1%, 93.7%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced texture features using local scaling properties can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.
Özdemir, Ahmet Turan
2016-07-25
Wearable devices for fall detection have received attention in academia and industry, because falls are very dangerous, especially for elderly people, and if immediate aid is not provided, it may result in death. However, some predictive devices are not easily worn by elderly people. In this work, a huge dataset, including 2520 tests, is employed to determine the best sensor placement location on the body and to reduce the number of sensor nodes for device ergonomics. During the tests, the volunteer's movements are recorded with six groups of sensors each with a triaxial (accelerometer, gyroscope and magnetometer) sensor, which is placed tightly on different parts of the body with special straps: head, chest, waist, right-wrist, right-thigh and right-ankle. The accuracy of individual sensor groups with their location is investigated with six machine learning techniques, namely the k-nearest neighbor (k-NN) classifier, Bayesian decision making (BDM), support vector machines (SVM), least squares method (LSM), dynamic time warping (DTW) and artificial neural networks (ANNs). Each technique is applied to single, double, triple, quadruple, quintuple and sextuple sensor configurations. These configurations create 63 different combinations, and for six machine learning techniques, a total of 63 × 6 = 378 combinations is investigated. As a result, the waist region is found to be the most suitable location for sensor placement on the body with 99.96% fall detection sensitivity by using the k-NN classifier, whereas the best sensitivity achieved by the wrist sensor is 97.37%, despite this location being highly preferred for today's wearable applications.
Development of a Lead-free Piezoelectric (K,Na)NbO3 Thin Film Deposited on Nickel-based Electrodes
NASA Astrophysics Data System (ADS)
Bani Milhim, Alaeddin
It is desirable to replace noble metals used as electrode materials for piezoelectric thin film with base metals. This will reduce the piezoelectric thin film fabrication cost. A nickel?based layer in conjunction with other protective layers is proposed as a bottom electrode for lead-free piezoelectric KNN thin film. The obtained results do not indicate the oxidation of the nickel?based bottom electrode after the deposition of KNN at 600 °C for 10 hours in the presence of oxygen and/or after annealing the sample at 400 °C for an hour in air. The fabricated KNN thin film was fully characterized in this work. The effective piezoelectric coefficients d33 and d31 were estimated to be 37 pm/V and 17.2 pm/V, respectively, at 100 kV/cm. The piezoelectric properties of the fabricated KNN/Ni/Ti/SiO2/Si are affected by the crystal orientation of the KNN layer, which was preferentially oriented in the (110) direction. Optimization of the deposition parameters of the fabricated KNN/Ni/Ti/SiO2/Si film is expected to further enhance the piezoelectric properties. Two novel systems utilizing the developed KNN piezoelectric thin film are proposed and their performance simulated based on the achieved KNN thin film parameters. The first is a precision automated nanomanipulation system using an AFM as a sensor and piezo-actuated manipulators. Real-time feedback of the particle being manipulated can be achieved using the proposed system. The length of the manipulators needs to be at least 2 mm to be incorporated with a commercial AFM system. To fabricate the required manipulators, a three-step electrochemical etching technique was developed. Tungsten tips combining well-defined conical shape, a length as large as 2 mm, and sharpness with a radius of curvature of around 20 nm were fabricated using the proposed technique. By depositing the KNN thin film on the fabricated manipulator, nanomanipulators with out-of-plane actuation can be produced. Ultrasonic piezoelectric fan array, the second system, is proposed for GPU cooling applications. The developed KNN thin film is proposed as the piezo layer in the piezo fan structure. The novel solution can offer large air flow rate and low power consumption. Since the operating frequency is beyond the human audible frequencies, non?audible noise fans are expected by using the proposed ultrasonic piezo fan system. Moreover, fabrication of these ultrasonic piezo fans can be part of the GPU fabrication process itself.
Fall Detection Using Smartphone Audio Features.
Cheffena, Michael
2016-07-01
An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel frequency cepstral coefficents (MFCCs), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers: k-nearest neighbor classifier (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN) are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with ANN classifier with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirement for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.
Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R; Guerra-Hernandez, Erick I; Almanza-Ojeda, Dora L; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J; Ibarra-Manzano, Mario A
2016-03-05
Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states.
Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R.; Guerra-Hernandez, Erick I.; Almanza-Ojeda, Dora L.; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J.; Ibarra-Manzano, Mario A.
2016-01-01
Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states. PMID:26959029
Kringel, D; Ultsch, A; Zimmermann, M; Jansen, J-P; Ilias, W; Freynhagen, R; Griessinger, N; Kopf, A; Stein, C; Doehring, A; Resch, E; Lötsch, J
2017-01-01
Next-generation sequencing (NGS) provides unrestricted access to the genome, but it produces ‘big data’ exceeding in amount and complexity the classical analytical approaches. We introduce a bioinformatics-based classifying biomarker that uses emergent properties in genetics to separate pain patients requiring extremely high opioid doses from controls. Following precisely calculated selection of the 34 most informative markers in the OPRM1, OPRK1, OPRD1 and SIGMAR1 genes, pattern of genotypes belonging to either patient group could be derived using a k-nearest neighbor (kNN) classifier that provided a diagnostic accuracy of 80.6±4%. This outperformed alternative classifiers such as reportedly functional opioid receptor gene variants or complex biomarkers obtained via multiple regression or decision tree analysis. The accumulation of several genetic variants with only minor functional influences may result in a qualitative consequence affecting complex phenotypes, pointing at emergent properties in genetics. PMID:27139154
Kringel, D; Ultsch, A; Zimmermann, M; Jansen, J-P; Ilias, W; Freynhagen, R; Griessinger, N; Kopf, A; Stein, C; Doehring, A; Resch, E; Lötsch, J
2017-10-01
Next-generation sequencing (NGS) provides unrestricted access to the genome, but it produces 'big data' exceeding in amount and complexity the classical analytical approaches. We introduce a bioinformatics-based classifying biomarker that uses emergent properties in genetics to separate pain patients requiring extremely high opioid doses from controls. Following precisely calculated selection of the 34 most informative markers in the OPRM1, OPRK1, OPRD1 and SIGMAR1 genes, pattern of genotypes belonging to either patient group could be derived using a k-nearest neighbor (kNN) classifier that provided a diagnostic accuracy of 80.6±4%. This outperformed alternative classifiers such as reportedly functional opioid receptor gene variants or complex biomarkers obtained via multiple regression or decision tree analysis. The accumulation of several genetic variants with only minor functional influences may result in a qualitative consequence affecting complex phenotypes, pointing at emergent properties in genetics.
E-Nose Vapor Identification Based on Dempster-Shafer Fusion of Multiple Classifiers
NASA Technical Reports Server (NTRS)
Li, Winston; Leung, Henry; Kwan, Chiman; Linnell, Bruce R.
2005-01-01
Electronic nose (e-nose) vapor identification is an efficient approach to monitor air contaminants in space stations and shuttles in order to ensure the health and safety of astronauts. Data preprocessing (measurement denoising and feature extraction) and pattern classification are important components of an e-nose system. In this paper, a wavelet-based denoising method is applied to filter the noisy sensor measurements. Transient-state features are then extracted from the denoised sensor measurements, and are used to train multiple classifiers such as multi-layer perceptions (MLP), support vector machines (SVM), k nearest neighbor (KNN), and Parzen classifier. The Dempster-Shafer (DS) technique is used at the end to fuse the results of the multiple classifiers to get the final classification. Experimental analysis based on real vapor data shows that the wavelet denoising method can remove both random noise and outliers successfully, and the classification rate can be improved by using classifier fusion.
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1993-01-01
The classification of multispectral image data obtained from satellites has become an important tool for generating ground cover maps. This study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A new neural network, the Binary Diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The Binary Diamond is a multilayer, feed-forward neural network, which learns from examples in unsupervised, 'one-shot' mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done by using a realistic data base, consisting of approximately 90,000 Landsat 4 Thematic Mapper pixels. The Binary Diamond and the nearest neighbor performances were close, with some advantages to the Binary Diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways for improving the performances, such as merging categories, and analyzing nonboundary pixels, are addressed and evaluated.
Greedy Gossip With Eavesdropping
NASA Astrophysics Data System (ADS)
Ustebay, Deniz; Oreshkin, Boris N.; Coates, Mark J.; Rabbat, Michael G.
2010-07-01
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.
A novel topology control approach to maintain the node degree in dynamic wireless sensor networks.
Huang, Yuanjiang; Martínez, José-Fernán; Díaz, Vicente Hernández; Sendra, Juana
2014-03-07
Topology control is an important technique to improve the connectivity and the reliability of Wireless Sensor Networks (WSNs) by means of adjusting the communication range of wireless sensor nodes. In this paper, a novel Fuzzy-logic Topology Control (FTC) is proposed to achieve any desired average node degree by adaptively changing communication range, thus improving the network connectivity, which is the main target of FTC. FTC is a fully localized control algorithm, and does not rely on location information of neighbors. Instead of designing membership functions and if-then rules for fuzzy-logic controller, FTC is constructed from the training data set to facilitate the design process. FTC is proved to be accurate, stable and has short settling time. In order to compare it with other representative localized algorithms (NONE, FLSS, k-Neighbor and LTRT), FTC is evaluated through extensive simulations. The simulation results show that: firstly, similar to k-Neighbor algorithm, FTC is the best to achieve the desired average node degree as node density varies; secondly, FTC is comparable to FLSS and k-Neighbor in terms of energy-efficiency, but is better than LTRT and NONE; thirdly, FTC has the lowest average maximum communication range than other algorithms, which indicates that the most energy-consuming node in the network consumes the lowest power.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mirjankar, Nikhil S.; Fraga, Carlos G.; Carman, April J.
Chemical attribution signatures (CAS) for chemical threat agents (CTAs) are being investigated to provide an evidentiary link between CTAs and specific sources to support criminal investigations and prosecutions. In a previous study, anionic impurity profiles developed using high performance ion chromatography (HPIC) were demonstrated as CAS for matching samples from eight potassium cyanide (KCN) stocks to their reported countries of origin. Herein, a larger number of solid KCN stocks (n = 13) and, for the first time, solid sodium cyanide (NaCN) stocks (n = 15) were examined to determine what additional sourcing information can be obtained through anion, carbon stablemore » isotope, and elemental analyses of cyanide stocks by HPIC, isotope ratio mass spectrometry (IRMS), and inductively coupled plasma optical emission spectroscopy (ICP-OES), respectively. The HPIC anion data was evaluated using the variable selection methods of Fisher-ratio (F-ratio), interval partial least squares (iPLS), and genetic algorithm-based partial least squares (GAPLS) and the classification methods of partial least squares discriminate analysis (PLSDA), K nearest neighbors (KNN), and support vector machines discriminate analysis (SVMDA). In summary, hierarchical cluster analysis (HCA) of anion impurity profiles from multiple cyanide stocks from six reported country of origins resulted in cyanide samples clustering into three groups: Czech Republic, Germany, and United States, independent of the associated alkali metal (K or Na). The three country groups were independently corroborated by HCA of cyanide elemental profiles and corresponded to countries with known solid cyanide factories. Both the anion and elemental CAS are believed to originate from the aqueous alkali hydroxides used in cyanide manufacture. Carbon stable isotope measurements resulted in two clusters: Germany and United States (the single Czech stock grouped with United States stocks). The carbon isotope CAS is believed to originate from the carbon source and process used to make the HCN utilized in cyanide synthesis. Classification errors for two validation studies using anion impurity profiles collected over five years on different instruments were as low as zero for KNN and SVMDA, demonstrating the excellent reliability (so far) of using anion impurities for matching a cyanide sample to its country of manufacture (i.e., factory). Variable selection reduced errors for those classification methods having errors greater than zero with iPLS-forward selection, and F-ratio typically providing the lowest errors. Finally, using anion profiles to match cyanides to a specific stock or stock group resulted in cross-validation errors ranging from zero to 5.3%.« less
NASA Astrophysics Data System (ADS)
Jin, Wenchao; Wang, Zhao; Li, Meng; He, Yahua; Hu, Xiaokang; Li, Luying; Gao, Yihua; Hu, Yongming; Gu, Haoshuang; Wang, Xiaolin
2018-04-01
Lead-free (K,Na)NbO3 (KNN) nanorod arrays were synthesized with the assistance of a Nb: SrTiO3 single-crystal substrate through the hydrothermal process. The evolutions of the morphology, composition, and structure of the as-synthesized KNN nanorods with the increase in reaction time were investigated. The results confirmed that the increase in reaction time up to 3 h led to the increase in the length and aspect ratio of the well-aligned KNN nanorods. All samples have K-rich orthorhombic crystal structures, while the diffraction peaks shifted towards a higher degree. The peak shifts should be attributed to the increase in the Na content in the KNN lattice, which could decrease the lattice parameters owing to the small ionic radius of Na+ than that of K+. Moreover, the increase in reaction time also resulted in the suppression of oxygen vacancies on the surface of the KNN nanorods. These evolutions of the composition and crystal structure, as well as the decrease in the defect content, lead to great enhancement of the nanorod's piezoelectric response, as their d33 value was increased from 19 to 64 pm/V. These results demonstrated the significant impact of reaction time on the hydrothermal growth of high-performance lead-free KNN one-dimensional nanomaterials.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Atamanik, E.; Thangadurai, V.
We report a comparative study of the dielectric properties of solid-state (ceramic method) synthesized NaNbO{sub 3} (NN), Na{sub 0.75}K{sub 0.25}NbO{sub 3} (K25NN), K{sub 0.5}Na{sub 0.5}NbO{sub 3} (KNN) and some composite materials containing In{sub 2}O{sub 3} and NN or KNN using an AC impedance method. Powder X-ray diffraction (PXRD) was employed to investigate the phase purity. No significant amount of impurity phase was observed for NN, K25NN, and KNN. Substitutions of 10, 15 and 25 mol% In{sup 3+} for Nb{sup 5+} in KNN and NN using solid-state reactions at 1150 deg. C resulted in composite materials. AC impedance studies of NN,more » KNN and K25NN in the temperature range of 500-800 deg. C showed a single semicircle (attributed to the bulk property) in the high-frequency range of 10{sup 3} to 10{sup 6} Hz. The individual contributions from the bulk and grain boundary on the dielectric properties were resolved and quantified from the impedance data. The calculated dielectric values for NN were consistent with previously reported in the literature. 10% Indium based KNN composite materials had the lowest dielectric loss 0.585 and the dielectric constant of 233 at 100 kHz at the temperature of 650 deg. C.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yongsiri, Ploypailin; Sirisoonthorn, Somnuk; Pengpat, Kamonpan, E-mail: kamonpan.p@cmu.ac.th
Highlights: • The KNN–SiO{sub 2} doped Er{sub 2}O{sub 3} glass-ceramics was prepared by incorporation method. • High dielectric constant (458.41 at 100 kHz) and low loss (0.0005) could be obtained. • TEM and SEM confirmed the existence of KNN crystals embedded in glass matrix. • The Er{sub 2}O{sub 3} dopant causes insignificant effect on modifying E{sub g} value. - Abstract: In this study, transparent glass-ceramics from potassium sodium niobate (KNN)-silicate glass system doped with erbium oxide (Er{sub 2}O{sub 3}) were successfully prepared by incorporation method. KNN was added in glass batches as heterogeneous nucleating agent. The KNN powder was mixedmore » with SiO{sub 2} and Er{sub 2}O{sub 3} dopant with KNN and Er{sub 2}O{sub 3} content varied between 70–80 and 0.5–1.0 mol%, respectively. Each batch was subsequently melted at 1300 °C for 15 min in a platinum crucible using an electric furnace. The quenched glasses were then subjected to heat treatment at various temperatures for 4 h. XRD results showed that the prepared glass ceramics contained crystals of KNN solid solution. In contrary, dielectric constant (ϵ{sub r}) and dielectric loss (tan δ) were found to increase with increasing heat treatment temperature. Additionally, optical properties such as absorbance and energy band gap have been investigated.« less
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Ronald E. McRoberts
2009-01-01
Nearest neighbors techniques have been shown to be useful for predicting multiple forest attributes from forest inventory and Landsat satellite image data. However, in regions lacking good digital land cover information, nearest neighbors selected to predict continuous variables such as tree volume must be selected without regard to relevant categorical variables such...
Probability machines: consistent probability estimation using nonparametric learning machines.
Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A
2012-01-01
Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
A Novel Topology Control Approach to Maintain the Node Degree in Dynamic Wireless Sensor Networks
Huang, Yuanjiang; Martínez, José-Fernán; Díaz, Vicente Hernández; Sendra, Juana
2014-01-01
Topology control is an important technique to improve the connectivity and the reliability of Wireless Sensor Networks (WSNs) by means of adjusting the communication range of wireless sensor nodes. In this paper, a novel Fuzzy-logic Topology Control (FTC) is proposed to achieve any desired average node degree by adaptively changing communication range, thus improving the network connectivity, which is the main target of FTC. FTC is a fully localized control algorithm, and does not rely on location information of neighbors. Instead of designing membership functions and if-then rules for fuzzy-logic controller, FTC is constructed from the training data set to facilitate the design process. FTC is proved to be accurate, stable and has short settling time. In order to compare it with other representative localized algorithms (NONE, FLSS, k-Neighbor and LTRT), FTC is evaluated through extensive simulations. The simulation results show that: firstly, similar to k-Neighbor algorithm, FTC is the best to achieve the desired average node degree as node density varies; secondly, FTC is comparable to FLSS and k-Neighbor in terms of energy-efficiency, but is better than LTRT and NONE; thirdly, FTC has the lowest average maximum communication range than other algorithms, which indicates that the most energy-consuming node in the network consumes the lowest power. PMID:24608008
A collaborative filtering recommendation algorithm based on weighted SimRank and social trust
NASA Astrophysics Data System (ADS)
Su, Chang; Zhang, Butao
2017-05-01
Collaborative filtering is one of the most widely used recommendation technologies, but the data sparsity and cold start problem of collaborative filtering algorithms are difficult to solve effectively. In order to alleviate the problem of data sparsity in collaborative filtering algorithm, firstly, a weighted improved SimRank algorithm is proposed to compute the rating similarity between users in rating data set. The improved SimRank can find more nearest neighbors for target users according to the transmissibility of rating similarity. Then, we build trust network and introduce the calculation of trust degree in the trust relationship data set. Finally, we combine rating similarity and trust to build a comprehensive similarity in order to find more appropriate nearest neighbors for target user. Experimental results show that the algorithm proposed in this paper improves the recommendation precision of the Collaborative algorithm effectively.
An interactive control algorithm used for equilateral triangle formation with robotic sensors.
Li, Xiang; Chen, Hongcai
2014-04-22
This paper describes an interactive control algorithm, called Triangle Formation Algorithm (TFA), used for three neighboring robotic sensors which are distributed randomly to self-organize into and equilateral triangle (E) formation. The algorithm is proposed based on the triangular geometry and considering the actual sensors used in robotics. In particular, the stability of the TFA, which can be executed by robotic sensors independently and asynchronously for E formation, is analyzed in details based on Lyapunov stability theory. Computer simulations are carried out for verifying the effectiveness of the TFA. The analytical results and simulation studies indicate that three neighboring robots employing conventional sensors can self-organize into E formations successfully regardless of their initial distribution using the same TFAs.
An Interactive Control Algorithm Used for Equilateral Triangle Formation with Robotic Sensors
Li, Xiang; Chen, Hongcai
2014-01-01
This paper describes an interactive control algorithm, called Triangle Formation Algorithm (TFA), used for three neighboring robotic sensors which are distributed randomly to self-organize into and equilateral triangle (E) formation. The algorithm is proposed based on the triangular geometry and considering the actual sensors used in robotics. In particular, the stability of the TFA, which can be executed by robotic sensors independently and asynchronously for E formation, is analyzed in details based on Lyapunov stability theory. Computer simulations are carried out for verifying the effectiveness of the TFA. The analytical results and simulation studies indicate that three neighboring robots employing conventional sensors can self-organize into E formations successfully regardless of their initial distribution using the same TFAs. PMID:24759118
Acharya, U. Rajendra; Sree, S. Vinitha; Kulshreshtha, Sanjeev; Molinari, Filippo; Koh, Joel En Wei; Saba, Luca; Suri, Jasjit S.
2014-01-01
Ovarian cancer is the fifth highest cause of cancer in women and the leading cause of death from gynecological cancers. Accurate diagnosis of ovarian cancer from acquired images is dependent on the expertise and experience of ultrasonographers or physicians, and is therefore, associated with inter observer variabilities. Computer Aided Diagnostic (CAD) techniques use a number of different data mining techniques to automatically predict the presence or absence of cancer, and therefore, are more reliable and accurate. A review of published literature in the field of CAD based ovarian cancer detection indicates that many studies use ultrasound images as the base for analysis. The key objective of this work is to propose an effective adjunct CAD technique called GyneScan for ovarian tumor detection in ultrasound images. In our proposed data mining framework, we extract several texture features based on first order statistics, Gray Level Co-occurrence Matrix and run length matrix. The significant features selected using t-test are then used to train and test several supervised learning based classifiers such as Probabilistic Neural Networks (PNN), Support Vector Machine (SVM), Decision Tree (DT), k-Nearest Neighbor (KNN), and Naïve Bayes (NB). We evaluated the developed framework using 1300 benign and 1300 malignant images. Using 11 significant features in KNN/PNN classifiers, we were able to achieve 100% classification accuracy, sensitivity, specificity, and positive predictive value in detecting ovarian tumor. Even though more validation using larger databases would better establish the robustness of our technique, the preliminary results are promising. This technique could be used as a reliable adjunct method to existing imaging modalities to provide a more confident second opinion on the presence/absence of ovarian tumor. PMID:24325128
Zhuang, Qianfen; Cao, Wei; Ni, Yongnian; Wang, Yong
2018-08-01
Most of the conventional multidimensional differential sensors currently need at least two-step fabrication, namely synthesis of probe(s) and identification of multiple analytes by mixing of analytes with probe(s), and were conducted using multiple sensing elements or several devices. In the study, we chose five different nucleobases (adenine, cytosine, guanine, thymine, and uracil) as model analytes, and found that under hydrothermal conditions, sodium citrate could react directly with various nucleobases to yield different nitrogen-doped carbon nanodots (CDs). The CDs synthesized from different nucleobases exhibited different fluorescent properties, leading to their respective characteristic fluorescence spectra. Hence, we combined the fluorescence spectra of the CDs with advanced chemometrics like principle component analysis (PCA), hierarchical cluster analysis (HCA), K-nearest neighbor (KNN) and soft independent modeling of class analogy (SIMCA), to present a conceptually novel "synthesis-identification integration" strategy to construct a multidimensional differential sensor for nucleobase discrimination. Single-wavelength excitation fluorescence spectral data, single-wavelength emission fluorescence spectral data, and fluorescence Excitation-Emission Matrices (EEMs) of the CDs were respectively used as input data of the differential sensor. The results showed that the discrimination ability of the multidimensional differential sensor with EEM data set as input data was superior to those with single-wavelength excitation/emission fluorescence data set, suggesting that increasing the number of the data input could improve the discrimination power. Two supervised pattern recognition methods, namely KNN and SIMCA, correctly identified the five nucleobases with a classification accuracy of 100%. The proposed "synthesis-identification integration" strategy together with a multidimensional array of experimental data holds great promise in the construction of differential sensors. Copyright © 2018 Elsevier B.V. All rights reserved.
Classification of interstitial lung disease patterns with topological texture features
NASA Astrophysics Data System (ADS)
Huber, Markus B.; Nagarajan, Mahesh; Leinsinger, Gerda; Ray, Lawrence A.; Wismüller, Axel
2010-03-01
Topological texture features were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative for the presence of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honey-combing, a stack of 70 axial, lung kernel reconstructed images were acquired from HRCT chest exams. A set of 241 regions of interest of both healthy and pathological (89) lung tissue were identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and three Minkowski Functionals (MFs, e.g. MF.euler). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions and the significance thresholds were adjusted for multiple comparisons by the Bonferroni correction. The best classification results were obtained by the MF features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers. The highest accuracy was found for MF.euler (97.5%, 96.6%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced topological texture features can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.
Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander
2006-11-30
Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this data is the largest set on human serum protein binding reported for QSAR modeling. The data was partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques include multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute error ranged from r2=0.90 and MAE=7.6 for ANN to r2=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors which ranged from r2=0.70 and MAE=14.1 for ANN to a low of r2=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their affects on predicted protein binding can assist the chemist in structure modification during the drug design process.
NASA Astrophysics Data System (ADS)
Qur’ania, A.; Sarinah, I.
2018-03-01
People often wrong in knowing the type of jasmine by just looking at the white color of the jasmine, while not all white flowers including jasmine and not all jasmine flowers have white. There is a jasmine that is yellow and there is a jasmine that is white and purple.The aim of this research is to identify Jasmine flower (Jasminum sp.) based on the shape of the flower image-based using Sobel edge detection and k-Nearest Neighbor. Edge detection is used to detect the type of flower from the flower shape. Edge detection aims to improve the appearance of the border of a digital image. While k-Nearest Neighbor method is used to classify the classification of test objects into classes that have neighbouring properties closest to the object of training. The data used in this study are three types of jasmine namely jasmine white (Jasminum sambac), jasmine gambir (Jasminum pubescens), and jasmine japan (Pseuderanthemum reticulatum). Testing of jasmine flower image resized 50 × 50 pixels, 100 × 100 pixels, 150 × 150 pixels yields an accuracy of 84%. Tests on distance values of the k-NN method with spacing 5, 10 and 15 resulted in different accuracy rates for 5 and 10 closest distances yielding the same accuracy rate of 84%, for the 15 shortest distance resulted in a small accuracy of 65.2%.
Navarro, Pedro J.; Fernández, Carlos; Borraz, Raúl; Alonso, Diego
2016-01-01
This article describes an automated sensor-based system to detect pedestrians in an autonomous vehicle application. Although the vehicle is equipped with a broad set of sensors, the article focuses on the processing of the information generated by a Velodyne HDL-64E LIDAR sensor. The cloud of points generated by the sensor (more than 1 million points per revolution) is processed to detect pedestrians, by selecting cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ, and YZ projections of the points contained in the cube. The work relates an exhaustive analysis of the performance of three different machine learning algorithms: k-Nearest Neighbours (kNN), Naïve Bayes classifier (NBC), and Support Vector Machine (SVM). These algorithms have been trained with 1931 samples. The final performance of the method, measured a real traffic scenery, which contained 16 pedestrians and 469 samples of non-pedestrians, shows sensitivity (81.2%), accuracy (96.2%) and specificity (96.8%). PMID:28025565
Navarro, Pedro J; Fernández, Carlos; Borraz, Raúl; Alonso, Diego
2016-12-23
This article describes an automated sensor-based system to detect pedestrians in an autonomous vehicle application. Although the vehicle is equipped with a broad set of sensors, the article focuses on the processing of the information generated by a Velodyne HDL-64E LIDAR sensor. The cloud of points generated by the sensor (more than 1 million points per revolution) is processed to detect pedestrians, by selecting cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ, and YZ projections of the points contained in the cube. The work relates an exhaustive analysis of the performance of three different machine learning algorithms: k-Nearest Neighbours (kNN), Naïve Bayes classifier (NBC), and Support Vector Machine (SVM). These algorithms have been trained with 1931 samples. The final performance of the method, measured a real traffic scenery, which contained 16 pedestrians and 469 samples of non-pedestrians, shows sensitivity (81.2%), accuracy (96.2%) and specificity (96.8%).
Real-time image annotation by manifold-based biased Fisher discriminant analysis
NASA Astrophysics Data System (ADS)
Ji, Rongrong; Yao, Hongxun; Wang, Jicheng; Sun, Xiaoshuai; Liu, Xianming
2008-01-01
Automatic Linguistic Annotation is a promising solution to bridge the semantic gap in content-based image retrieval. However, two crucial issues are not well addressed in state-of-art annotation algorithms: 1. The Small Sample Size (3S) problem in keyword classifier/model learning; 2. Most of annotation algorithms can not extend to real-time online usage due to their low computational efficiencies. This paper presents a novel Manifold-based Biased Fisher Discriminant Analysis (MBFDA) algorithm to address these two issues by transductive semantic learning and keyword filtering. To address the 3S problem, Co-Training based Manifold learning is adopted for keyword model construction. To achieve real-time annotation, a Bias Fisher Discriminant Analysis (BFDA) based semantic feature reduction algorithm is presented for keyword confidence discrimination and semantic feature reduction. Different from all existing annotation methods, MBFDA views image annotation from a novel Eigen semantic feature (which corresponds to keywords) selection aspect. As demonstrated in experiments, our manifold-based biased Fisher discriminant analysis annotation algorithm outperforms classical and state-of-art annotation methods (1.K-NN Expansion; 2.One-to-All SVM; 3.PWC-SVM) in both computational time and annotation accuracy with a large margin.
Computational intelligence techniques for biological data mining: An overview
NASA Astrophysics Data System (ADS)
Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari
2014-10-01
Computational techniques have been successfully utilized for a highly accurate analysis and modeling of multifaceted and raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective to overcome the limitations of the traditional in-vitro experiments on the constantly increasing sequence data. However, most critical problems that caught the attention of the researchers may include, but not limited to these: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, analysis of microarray gene expression data, etc. To solve these problems, various classification and clustering techniques using machine learning have been extensively used in the published literature. These techniques include neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, Rough set classifiers, decision tree and HMM based algorithms. Major difficulties in applying the above algorithms include the limitations found in the previous feature encoding and selection methods while extracting the best features, increasing classification accuracy and decreasing the running time overheads of the learning algorithms. The application of this research would be potentially useful in the drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.
Nguyen, Hieu Cong; Jung, Jaehoon; Lee, Jungbin; Choi, Sung-Uk; Hong, Suk-Young; Heo, Joon
2015-07-31
The reflectance of the Earth's surface is significantly influenced by atmospheric conditions such as water vapor content and aerosols. Particularly, the absorption and scattering effects become stronger when the target features are non-bright objects, such as in aqueous or vegetated areas. For any remote-sensing approach, atmospheric correction is thus required to minimize those effects and to convert digital number (DN) values to surface reflectance. The main aim of this study was to test the three most popular atmospheric correction models, namely (1) Dark Object Subtraction (DOS); (2) Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) and (3) the Second Simulation of Satellite Signal in the Solar Spectrum (6S) and compare them with Top of Atmospheric (TOA) reflectance. By using the k-Nearest Neighbor (kNN) algorithm, a series of experiments were conducted for above-ground forest biomass (AGB) estimations of the Gongju and Sejong region of South Korea, in order to check the effectiveness of atmospheric correction methods for Landsat ETM+. Overall, in the forest biomass estimation, the 6S model showed the bestRMSE's, followed by FLAASH, DOS and TOA. In addition, a significant improvement of RMSE by 6S was found with images when the study site had higher total water vapor and temperature levels. Moreover, we also tested the sensitivity of the atmospheric correction methods to each of the Landsat ETM+ bands. The results confirmed that 6S dominates the other methods, especially in the infrared wavelengths covering the pivotal bands for forest applications. Finally, we suggest that the 6S model, integrating water vapor and aerosol optical depth derived from MODIS products, is better suited for AGB estimation based on optical remote-sensing data, especially when using satellite images acquired in the summer during full canopy development.
Jiang, Jiyang; Liu, Tao; Zhu, Wanlin; Koncz, Rebecca; Liu, Hao; Lee, Teresa; Sachdev, Perminder S; Wen, Wei
2018-07-01
We present 'UBO Detector', a cluster-based, fully automated pipeline for extracting and calculating variables for regions of white matter hyperintensities (WMH) (available for download at https://cheba.unsw.edu.au/group/neuroimaging-pipeline). It takes T1-weighted and fluid attenuated inversion recovery (FLAIR) scans as input, and SPM12 and FSL functions are utilised for pre-processing. The candidate clusters are then generated by FMRIB's Automated Segmentation Tool (FAST). A supervised machine learning algorithm, k-nearest neighbor (k-NN), is applied to determine whether the candidate clusters are WMH or non-WMH. UBO Detector generates both image and text (volumes and the number of WMH clusters) outputs for whole brain, periventricular, deep, and lobar WMH, as well as WMH in arterial territories. The computation time for each brain is approximately 15 min. We validated the performance of UBO Detector by showing a) high segmentation (similarity index (SI) = 0.848) and volumetric (intraclass correlation coefficient (ICC) = 0.985) agreement between the UBO Detector-derived and manually traced WMH; b) highly correlated (r 2 > 0.9) and a steady increase of WMH volumes over time; and c) significant associations of periventricular (t = 22.591, p < 0.001) and deep (t = 14.523, p < 0.001) WMH volumes generated by UBO Detector with Fazekas rating scores. With parallel computing enabled in UBO Detector, the processing can take advantage of multi-core CPU's that are commonly available on workstations. In conclusion, UBO Detector is a reliable, efficient and fully automated WMH segmentation pipeline. Copyright © 2018 Elsevier Inc. All rights reserved.
Evangelista, Lorraine S.; Ghasemzadeh, Hassan; Lee, Jung-Ah; Fallahzadeh, Ramin; Sarrafzadeh, Majid; Moser, Debra K.
2017-01-01
Background It is unclear whether subgroups of patients may benefit from remote monitoring systems (RMS) and what user characteristics and contextual factors determine effective use of RMS in patients with heart failure (HF). Objective The study was conducted to determine whether certain user characteristics (i.e. personal and clinical variables) predict use of RMS using advanced machine learning software algorithms in patients with HF. Methods This pilot study was a single-arm experimental study with a pre- (baseline) and post- (3 months) design; data from the baseline measures were used for the current data analyses. Sixteen patients provided consent; only 7 patients (mean age 65.8±6.1, range 58–83) accessed the RMS and transmitted daily data (e.g. weight, blood pressure) as instructed during the 12 week study duration. Results Baseline demographic and clinical characteristics of users and non-users were comparable for a majority of factors. However, users were more likely to have no HF specialty based care or an automatic internal cardioverter defibrillator. The precision accuracy of decision tree, multilayer perceptron (MLP) and k-Nearest Neighbor (k-NN) classifiers for predicting access to RMS was 87.5%, 90.3%, and 94.5% respectively. Conclusion Our preliminary data show that a small set of baseline attributes is sufficient to predict subgroups of patients who had a higher likelihood of using RMS. While our findings shed light on potential end-users more likely to benefit from RMS-based interventions, additional research in a larger sample is warranted to explicate the impact of user characteristics on actual use of these technologies. PMID:27886024
Śliwińska, Magdalena; Garcia-Hernandez, Celia; Kościński, Mikołaj; Dymerski, Tomasz; Wardencki, Waldemar; Namieśnik, Jacek; Śliwińska-Bartkowiak, Małgorzata; Jurga, Stefan; Garcia-Cabezon, Cristina; Rodriguez-Mendez, Maria Luz
2016-01-01
The capability of a phthalocyanine-based voltammetric electronic tongue to analyze strong alcoholic beverages has been evaluated and compared with the performance of spectroscopic techniques coupled to chemometrics. Nalewka Polish liqueurs prepared from five apple varieties have been used as a model of strong liqueurs. Principal Component Analysis has demonstrated that the best discrimination between liqueurs prepared from different apple varieties is achieved using the e-tongue and UV-Vis spectroscopy. Raman spectra coupled to chemometrics have not been efficient in discriminating liqueurs. The calculated Euclidean distances and the k-Nearest Neighbors algorithm (kNN) confirmed these results. The main advantage of the e-tongue is that, using PLS-1, good correlations have been found simultaneously with the phenolic content measured by the Folin–Ciocalteu method (R2 of 0.97 in calibration and R2 of 0.93 in validation) and also with the density, a marker of the alcoholic content method (R2 of 0.93 in calibration and R2 of 0.88 in validation). UV-Vis coupled with chemometrics has shown good correlations only with the phenolic content (R2 of 0.99 in calibration and R2 of 0.99 in validation) but correlations with the alcoholic content were low. Raman coupled with chemometrics has shown good correlations only with density (R2 of 0.96 in calibration and R2 of 0.85 in validation). In summary, from the three holistic methods evaluated to analyze strong alcoholic liqueurs, the voltammetric electronic tongue using phthalocyanines as sensing elements is superior to Raman or UV-Vis techniques because it shows an excellent discrimination capability and remarkable correlations with both antioxidant capacity and alcoholic content—the most important parameters to be measured in this type of liqueurs. PMID:27735832
Śliwińska, Magdalena; Garcia-Hernandez, Celia; Kościński, Mikołaj; Dymerski, Tomasz; Wardencki, Waldemar; Namieśnik, Jacek; Śliwińska-Bartkowiak, Małgorzata; Jurga, Stefan; Garcia-Cabezon, Cristina; Rodriguez-Mendez, Maria Luz
2016-10-09
The capability of a phthalocyanine-based voltammetric electronic tongue to analyze strong alcoholic beverages has been evaluated and compared with the performance of spectroscopic techniques coupled to chemometrics. Nalewka Polish liqueurs prepared from five apple varieties have been used as a model of strong liqueurs. Principal Component Analysis has demonstrated that the best discrimination between liqueurs prepared from different apple varieties is achieved using the e-tongue and UV-Vis spectroscopy. Raman spectra coupled to chemometrics have not been efficient in discriminating liqueurs. The calculated Euclidean distances and the k-Nearest Neighbors algorithm (kNN) confirmed these results. The main advantage of the e-tongue is that, using PLS-1, good correlations have been found simultaneously with the phenolic content measured by the Folin-Ciocalteu method (R² of 0.97 in calibration and R² of 0.93 in validation) and also with the density, a marker of the alcoholic content method (R² of 0.93 in calibration and R² of 0.88 in validation). UV-Vis coupled with chemometrics has shown good correlations only with the phenolic content (R² of 0.99 in calibration and R² of 0.99 in validation) but correlations with the alcoholic content were low. Raman coupled with chemometrics has shown good correlations only with density (R² of 0.96 in calibration and R² of 0.85 in validation). In summary, from the three holistic methods evaluated to analyze strong alcoholic liqueurs, the voltammetric electronic tongue using phthalocyanines as sensing elements is superior to Raman or UV-Vis techniques because it shows an excellent discrimination capability and remarkable correlations with both antioxidant capacity and alcoholic content-the most important parameters to be measured in this type of liqueurs.
Takahashi, Hiro; Aoyagi, Kazuhiko; Nakanishi, Yukihiro; Sasaki, Hiroki; Yoshida, Teruhiko; Honda, Hiroyuki
2006-07-01
Esophageal cancer is a well-known cancer with poorer prognosis than other cancers. An optimal and individualized treatment protocol based on accurate diagnosis is urgently needed to improve the treatment of cancer patients. For this purpose, it is important to develop a sophisticated algorithm that can manage a large amount of data, such as gene expression data from DNA microarrays, for optimal and individualized diagnosis. Marker gene selection is essential in the analysis of gene expression data. We have already developed a combination method of the use of the projective adaptive resonance theory and that of a boosted fuzzy classifier with the SWEEP operator denoted PART-BFCS. This method is superior to other methods, and has four features, namely fast calculation, accurate prediction, reliable prediction, and rule extraction. In this study, we applied this method to analyze microarray data obtained from esophageal cancer patients. A combination method of PART-BFCS and the U-test was also investigated. It was necessary to use a specific type of BFCS, namely, BFCS-1,2, because the esophageal cancer data were very complexity. PART-BFCS and PART-BFCS with the U-test models showed higher performances than two conventional methods, namely, k-nearest neighbor (kNN) and weighted voting (WV). The genes including CDK6 could be found by our methods and excellent IF-THEN rules could be extracted. The genes selected in this study have a high potential as new diagnosis markers for esophageal cancer. These results indicate that the new methods can be used in marker gene selection for the diagnosis of cancer patients.
NASA Astrophysics Data System (ADS)
Ahn, C. W.; Y Lee, S.; Lee, H. J.; Ullah, A.; Bae, J. S.; Jeong, E. D.; Choi, J. S.; Park, B. H.; Kim, I. W.
2009-11-01
We have fabricated K0.5Na0.5NbO3 (KNN) thin films on Pt substrates by a chemical solution deposition method and investigated the effect of K and Na excess (0-30 mol%) on ferroelectric and piezoelectric properties of KNN thin film. It was found that with increasing K and Na excess in a precursor solution from 0 to 30 mol%, the leakage current and ferroelectric properties were strongly affected. KNN thin film synthesized by using 20 mol% K and Na excess precursor solution exhibited a low leakage current density and well saturated ferroelectric P-E hysteresis loops. Moreover, the optimized KNN thin film had good fatigue resistance and a piezoelectric constant of 40 pm V-1, which is comparable to that of polycrystalline PZT thin films.
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database
Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...
2013-01-01
Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in amore » 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less
Missing value imputation in DNA microarrays based on conjugate gradient method.
Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh
2012-02-01
Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm that is based on conjugate gradient (CG) method is proposed to estimate missing values. k-nearest neighbors of the missed entry are first selected based on absolute values of their Pearson correlation coefficient. Then a subset of genes among the k-nearest neighbors is labeled as the best similar ones. CG algorithm with this subset as its input is then used to estimate the missing values. Our proposed CG based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principle component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average of normalized root mean squares error (NRMSE) and relative NRMSE in different data sets with various missing rates shows CGimpute outperforms other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Du Hongliang; Zhou Wancheng; Luo Fa
The (1-x)(K{sub 0.5}Na{sub 0.5})NbO{sub 3}-x(Ba{sub 0.5}Sr{sub 0.5})TiO{sub 3} (KNN-BST) solid solution has been synthesized by conventional solid-state sintering in order to search for the new lead-free relaxor ferroelectrics for high temperature applications. The phase structure, dielectric properties, and relaxor behavior of the (1-x)KNN-xBST solid solution are systematically investigated. The phase structure of the (1-x)KNN-xBST solid solution gradually changes from pure perovskite phase with an orthorhombic symmetry to the tetragonal symmetry, then to the pseudocubic phase, and to the cubic phase with increasing addition of BST. The 0.90KNN-0.10BST solid solution shows a broad dielectric peak with permittivity maximum near 2500 andmore » low dielectric loss (<4%) in the temperature range of 100-250 deg. C. The result indicates that this material may have great potential for a variety of high temperature applications. The diffuse phase transition and the temperature of the maximum dielectric permittivity shifting toward higher temperature with increasing frequency, which are two typical characteristics for relaxor ferroelectrics, are observed in the (1-x)KNN-xBST solid solution. The dielectric relaxor behavior obeys a modified Curie-Weiss law and a Vogel-Fulcher relationship. The relaxor nature is attributed to the appearance of polar nanoregions owing to the formation of randon fields including local electric fields and elastic fields. These results confirm that the KNN-based relaxor ferroelectrics can be regarded as an alternative direction for the development of high temperature lead-free relaxor ferroelectrics.« less
False-nearest-neighbors algorithm and noise-corrupted time series
NASA Astrophysics Data System (ADS)
Rhodes, Carl; Morari, Manfred
1997-05-01
The false-nearest-neighbors (FNN) algorithm was originally developed to determine the embedding dimension for autonomous time series. For noise-free computer-generated time series, the algorithm does a good job in predicting the embedding dimension. However, the problem of predicting the embedding dimension when the time-series data are corrupted by noise was not fully examined in the original studies of the FNN algorithm. Here it is shown that with large data sets, even small amounts of noise can lead to incorrect prediction of the embedding dimension. Surprisingly, as the length of the time series analyzed by FNN grows larger, the cause of incorrect prediction becomes more pronounced. An analysis of the effect of noise on the FNN algorithm and a solution for dealing with the effects of noise are given here. Some results on the theoretically correct choice of the FNN threshold are also presented.
iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.
Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades
2014-01-01
Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
A scale-based connected coherence tree algorithm for image segmentation.
Ding, Jundi; Ma, Runing; Chen, Songcan
2008-02-01
This paper presents a connected coherence tree algorithm (CCTA) for image segmentation with no prior knowledge. It aims to find regions of semantic coherence based on the proposed epsilon-neighbor coherence segmentation criterion. More specifically, with an adaptive spatial scale and an appropriate intensity-difference scale, CCTA often achieves several sets of coherent neighboring pixels which maximize the probability of being a single image content (including kinds of complex backgrounds). In practice, each set of coherent neighboring pixels corresponds to a coherence class (CC). The fact that each CC just contains a single equivalence class (EC) ensures the separability of an arbitrary image theoretically. In addition, the resultant CCs are represented by tree-based data structures, named connected coherence tree (CCT)s. In this sense, CCTA is a graph-based image analysis algorithm, which expresses three advantages: 1) its fundamental idea, epsilon-neighbor coherence segmentation criterion, is easy to interpret and comprehend; 2) it is efficient due to a linear computational complexity in the number of image pixels; 3) both subjective comparisons and objective evaluation have shown that it is effective for the tasks of semantic object segmentation and figure-ground separation in a wide variety of images. Those images either contain tiny, long and thin objects or are severely degraded by noise, uneven lighting, occlusion, poor illumination, and shadow.
Energy Efficient and Stable Weight Based Clustering for Mobile Ad Hoc Networks
NASA Astrophysics Data System (ADS)
Bouk, Safdar H.; Sasase, Iwao
Recently several weighted clustering algorithms have been proposed, however, to the best of our knowledge; there is none that propagates weights to other nodes without weight message for leader election, normalizes node parameters and considers neighboring node parameters to calculate node weights. In this paper, we propose an Energy Efficient and Stable Weight Based Clustering (EE-SWBC) algorithm that elects cluster heads without sending any additional weight message. It propagates node parameters to its neighbors through neighbor discovery message (HELLO Message) and stores these parameters in neighborhood list. Each node normalizes parameters and efficiently calculates its own weight and the weights of neighboring nodes from that neighborhood table using Grey Decision Method (GDM). GDM finds the ideal solution (best node parameters in neighborhood list) and calculates node weights in comparison to the ideal solution. The node(s) with maximum weight (parameters closer to the ideal solution) are elected as cluster heads. In result, EE-SWBC fairly selects potential nodes with parameters closer to ideal solution with less overhead. Different performance metrics of EE-SWBC and Distributed Weighted Clustering Algorithm (DWCA) are compared through simulations. The simulation results show that EE-SWBC maintains fewer average numbers of stable clusters with minimum overhead, less energy consumption and fewer changes in cluster structure within network compared to DWCA.
Long-term surface EMG monitoring using K-means clustering and compressive sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing theory (CS) in combination with the K-Singular Value Decomposition (K-SVD) method for Clustering of long-term recording of surface Electromyography (sEMG) signals. The long-term monitoring of sEMG signals aims at recording of the electrical activity produced by muscles which are very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals including healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathr), respectively. The proposed algorithm can easily scan large sEMG datasets of long-term sEMG recording. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calclute the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing 17% of Average Classification Error (ACE), 9% of Training Error (TE), and 18% of Root Mean Square Error (RMSE). The proposed algorithm also reduces 14% clustering energy consumption compared to the existing K-Means clustering algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Qi; Zhu, Fang-Yuan; Cheng, Li-Qian
Crystallographic structure of sol-gel-processed lead-free (K,Na)NbO{sub 3} (KNN) epitaxial films on [100]-cut SrTiO{sub 3} single-crystalline substrates was investigated for a deeper understanding of its piezoelectric response. Lattice parameter measurement by high-resolution X-ray diffraction and transmission electron microscopy revealed that the orthorhombic KNN films on SrTiO{sub 3} (100) surfaces are [010] oriented (b-axis-oriented) rather than commonly identified c-axis orientation. Based on the crystallographic orientation and corresponding ferroelectric domain structure investigated by piezoresponse force microscopy, the superior piezoelectric property along b-axis of epitaxial KNN films than other orientations can be explained.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Yajing; Chen, Yan; Chen, Kepi
In this study, the effects of doping with GeO 2 on the synthesis temperature, phase structure and morphology of (K 0.5Na 0.5)NbO 3 (KNN) ceramic powders were studied using XRD and SEM. The results show that KNN powders with good crystallinity and compositional homogeneity can be obtained after calcination at up to 900°C for 2 h. Introducing 0.5 mol.% GeO 2 into the starting mixture improved the synthesis of the KNN powders and allowed the calcination temperature to be decreased to 800°C, which can be ascribed to the formation of the liquid phase during the synthesis.
The Korean Neonatal Network: An Overview
Chang, Yun Sil; Park, Hyun-Young
2015-01-01
Currently, in the Republic of Korea, despite the very-low-birth rate, the birth rate and number of preterm infants are markedly increasing. Neonatal deaths and major complications mostly occur in premature infants, especially very-low-birth-weight infants (VLBWIs). VLBWIs weigh less than 1,500 g at birth and require intensive treatment in a neonatal intensive care unit (NICU). The operation of the Korean Neonatal Network (KNN) officially started on April 15, 2013, by the Korean Society of Neonatology with support from the Korea Centers for Disease Control and Prevention. The KNN is a national multicenter neonatal network based on a prospective web-based registry for VLBWIs. About 2,000 VLBWIs from 60 participating hospital NICUs are registered annually in the KNN. The KNN has built unique systems such as a web-based real-time data display on the web site and a site-visit monitoring system for data quality surveillance. The KNN should be maintained and developed further in order to generate appropriate, population-based, data-driven, health-care policies; facilitate active multicenter neonatal research, including quality improvement of neonatal care; and ultimately lead to improvement in the prognosis of high-risk newborns and subsequent reduction in health-care costs through the development of evidence-based neonatal medicine in Korea. PMID:26566355
The Korean Neonatal Network: An Overview.
Chang, Yun Sil; Park, Hyun-Young; Park, Won Soon
2015-10-01
Currently, in the Republic of Korea, despite the very-low-birth rate, the birth rate and number of preterm infants are markedly increasing. Neonatal deaths and major complications mostly occur in premature infants, especially very-low-birth-weight infants (VLBWIs). VLBWIs weigh less than 1,500 g at birth and require intensive treatment in a neonatal intensive care unit (NICU). The operation of the Korean Neonatal Network (KNN) officially started on April 15, 2013, by the Korean Society of Neonatology with support from the Korea Centers for Disease Control and Prevention. The KNN is a national multicenter neonatal network based on a prospective web-based registry for VLBWIs. About 2,000 VLBWIs from 60 participating hospital NICUs are registered annually in the KNN. The KNN has built unique systems such as a web-based real-time data display on the web site and a site-visit monitoring system for data quality surveillance. The KNN should be maintained and developed further in order to generate appropriate, population-based, data-driven, health-care policies; facilitate active multicenter neonatal research, including quality improvement of neonatal care; and ultimately lead to improvement in the prognosis of high-risk newborns and subsequent reduction in health-care costs through the development of evidence-based neonatal medicine in Korea.
Structure identification methods for atomistic simulations of crystalline materials
Stukowski, Alexander
2012-05-28
Here, we discuss existing and new computational analysis techniques to classify local atomic arrangements in large-scale atomistic computer simulations of crystalline solids. This article includes a performance comparison of typical analysis algorithms such as common neighbor analysis (CNA), centrosymmetry analysis, bond angle analysis, bond order analysis and Voronoi analysis. In addition we propose a simple extension to the CNA method that makes it suitable for multi-phase systems. Finally, we introduce a new structure identification algorithm, the neighbor distance analysis, which is designed to identify atomic structure units in grain boundaries.
Multi-Band Received Signal Strength Fingerprinting Based Indoor Location System
NASA Astrophysics Data System (ADS)
Sertthin, Chinnapat; Fujii, Takeo; Ohtsuki, Tomoaki; Nakagawa, Masao
This paper proposes a new multi-band received signal strength (MRSS) fingerprinting based indoor location system, which employs the frequency diversity on the conventional single-band received signal strength (RSS) fingerprinting based indoor location system. In the proposed system, the impacts of frequency diversity on the enhancements of positioning accuracy are analyzed. Effectiveness of the proposed system is proved by experimental approach, which was conducted in non line-of-sight (NLOS) environment under the area of 103m2 at Yagami Campus, Keio University. WLAN access points, which simultaneously transmit dual-band signal of 2.4 and 5.2GHz, are utilized as transmitters. Likewise, a dual-band WLAN receiver is utilized as a receiver. Signal distances calculated by both Manhattan and Euclidean were classified by K-Nearest Neighbor (KNN) classifier to illustrate the performance of the proposed system. The results confirmed that Frequency diversity attributions of multi-band signal provide accuracy improvement over 50% of the conventional single-band.
NASA Astrophysics Data System (ADS)
Jiang, Li; Xuan, Jianping; Shi, Tielin
2013-12-01
Generally, the vibration signals of faulty machinery are non-stationary and nonlinear under complicated operating conditions. Therefore, it is a big challenge for machinery fault diagnosis to extract optimal features for improving classification accuracy. This paper proposes semi-supervised kernel Marginal Fisher analysis (SSKMFA) for feature extraction, which can discover the intrinsic manifold structure of dataset, and simultaneously consider the intra-class compactness and the inter-class separability. Based on SSKMFA, a novel approach to fault diagnosis is put forward and applied to fault recognition of rolling bearings. SSKMFA directly extracts the low-dimensional characteristics from the raw high-dimensional vibration signals, by exploiting the inherent manifold structure of both labeled and unlabeled samples. Subsequently, the optimal low-dimensional features are fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories and severities of bearings. The experimental results demonstrate that the proposed approach improves the fault recognition performance and outperforms the other four feature extraction methods.
Chemical data as markers of the geographical origins of sugarcane spirits.
Serafim, F A T; Pereira-Filho, Edenir R; Franco, D W
2016-04-01
In an attempt to classify sugarcane spirits according to their geographic region of origin, chemical data for 24 analytes were evaluated in 50 cachaças produced using a similar procedure in selected regions of Brazil: São Paulo - SP (15), Minas Gerais - MG (11), Rio de Janeiro - RJ (11), Paraiba -PB (9), and Ceará - CE (4). Multivariate analysis was applied to the analytical results, and the predictive abilities of different classification methods were evaluated. Principal component analysis identified five groups, and chemical similarities were observed between MG and SP samples and between RJ and PB samples. CE samples presented a distinct chemical profile. Among the samples, partial linear square discriminant analysis (PLS-DA) classified 50.2% of the samples correctly, K-nearest neighbor (KNN) 86%, and soft independent modeling of class analogy (SIMCA) 56.2%. Therefore, in this proof of concept demonstration, the proposed approach based on chemical data satisfactorily predicted the cachaças' geographic origins. Copyright © 2015 Elsevier Ltd. All rights reserved.
Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G
2018-03-01
The feasibility of the application of chemometric techniques associated with multi-element analysis for the classification of grape seeds according to their provenance vineyard soil was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used for the determination of twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least square discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and Random Forest (RF) were applied to construct classification/discrimination rules. The results indicated that nonlinear methods, RF and SVM, perform best with up to 98% and 93% accuracy rate, respectively, and therefore are excellent tools for classification of grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Voxel classification based airway tree segmentation
NASA Astrophysics Data System (ADS)
Lo, Pechin; de Bruijne, Marleen
2008-03-01
This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
Feature reconstruction of LFP signals based on PLSR in the neural information decoding study.
Yonghui Dong; Zhigang Shang; Mengmeng Li; Xinyu Liu; Hong Wan
2017-07-01
To solve the problems of Signal-to-Noise Ratio (SNR) and multicollinearity when the Local Field Potential (LFP) signals is used for the decoding of animal motion intention, a feature reconstruction of LFP signals based on partial least squares regression (PLSR) in the neural information decoding study is proposed in this paper. Firstly, the feature information of LFP coding band is extracted based on wavelet transform. Then the PLSR model is constructed by the extracted LFP coding features. According to the multicollinearity characteristics among the coding features, several latent variables which contribute greatly to the steering behavior are obtained, and the new LFP coding features are reconstructed. Finally, the K-Nearest Neighbor (KNN) method is used to classify the reconstructed coding features to verify the decoding performance. The results show that the proposed method can achieve the highest accuracy compared to the other three methods and the decoding effect of the proposed method is robust.
Breast Cancer Detection with Reduced Feature Set.
Mert, Ahmet; Kılıç, Niyazi; Bilgili, Erdem; Akan, Aydin
2015-01-01
This paper explores feature reduction properties of independent component analysis (ICA) on breast cancer decision support system. Wisconsin diagnostic breast cancer (WDBC) dataset is reduced to one-dimensional feature vector computing an independent component (IC). The original data with 30 features and reduced one feature (IC) are used to evaluate diagnostic accuracy of the classifiers such as k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function neural network (RBFNN), and support vector machine (SVM). The comparison of the proposed classification using the IC with original feature set is also tested on different validation (5/10-fold cross-validations) and partitioning (20%-40%) methods. These classifiers are evaluated how to effectively categorize tumors as benign and malignant in terms of specificity, sensitivity, accuracy, F-score, Youden's index, discriminant power, and the receiver operating characteristic (ROC) curve with its criterion values including area under curve (AUC) and 95% confidential interval (CI). This represents an improvement in diagnostic decision support system, while reducing computational complexity.
Busarakam, Kanungnid; Bull, Alan T; Trujillo, Martha E; Riesco, Raul; Sangal, Vartul; van Wezel, Gilles P; Goodfellow, Michael
2016-06-01
A polyphasic study was designed to determine the taxonomic provenance of three Modestobacter strains isolated from an extreme hyper-arid Atacama Desert soil. The strains, isolates KNN 45-1a, KNN 45-2b(T) and KNN 45-3b, were shown to have chemotaxonomic and morphological properties in line with their classification in the genus Modestobacter. The isolates had identical 16S rRNA gene sequences and formed a branch in the Modestobacter gene tree that was most closely related to the type strain of Modestobacter marinus (99.6% similarity). All three isolates were distinguished readily from Modestobacter type strains by a broad range of phenotypic properties, by qualitative and quantitative differences in fatty acid profiles and by BOX fingerprint patterns. The whole genome sequence of isolate KNN 45-2b(T) showed 89.3% average nucleotide identity, 90.1% (SD: 10.97%) average amino acid identity and a digital DNA-DNA hybridization value of 42.4±3.1 against the genome sequence of M. marinus DSM 45201(T), values consistent with its assignment to a separate species. On the basis of all of these data, it is proposed that the isolates be assigned to the genus Modestobacter as Modestobacter caceresii sp. nov. with isolate KNN 45-2b(T) (CECT 9023(T)=DSM 101691(T)) as the type strain. Analysis of the whole-genome sequence of M. caceresii KNN 45-2b(T), with 4683 open reading frames and a genome size of ∽4.96Mb, revealed the presence of genes and gene-clusters that encode for properties relevant to its adaptability to harsh environmental conditions prevalent in extreme hyper arid Atacama Desert soils. Copyright © 2016. Published by Elsevier GmbH.
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.
Fast quantum nD Fourier and radon transforms
NASA Astrophysics Data System (ADS)
Labunets, Valeri G.; Labunets-Rundblad, Ekaterina V.; Astola, Jaakko T.
2001-07-01
Fast Classical and quantum algorithms are introduced for a wide class of non-separable nD discrete unitary K- transforms(DKT)KNn. They require a number of 1D DKT Kn smaller than in the Cooley-Tukey radix-p FFT-type approach. The method utilizes a decomposition of the nDK- transform into a product of original nD discrete Radon Transform and of a family parallel/independ 1DK-transforms. If the nDK-transform has a separable kernel, that again in this case our approach leads to decrease of multiplicative complexity by factor of n compared to the tow/column separable Cooley-Tukey p-radix approach.
Text Content Pushing Technology Research Based on Location and Topic
NASA Astrophysics Data System (ADS)
Wei, Dongqi; Wei, Jianxin; Wumuti, Naheman; Jiang, Baode
2016-11-01
In the field, geological workers usually want to obtain related geological background information in the working area quickly and accurately. This information exists in the massive geological data, text data is described in natural language accounted for a large proportion. This paper studied location information extracting method in the mass text data; proposed a geographic location—geological content—geological content related algorithm based on Spark and Mapreduce2, finally classified content by using KNN, and built the content pushing system based on location and topic. It is running in the geological survey cloud, and we have gained a good effect in testing by using real geological data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murtazaev, A. K.; Ramazanov, M. K., E-mail: sheikh77@mail.ru; Kassan-Ogly, F. A.
2015-01-15
Phase transitions in the antiferromagnetic Ising model on a body-centered cubic lattice are studied on the basis of the replica algorithm by the Monte Carlo method and histogram analysis taking into account the interaction of next-to-nearest neighbors. The phase diagram of the dependence of the critical temperature on the intensity of interaction of the next-to-nearest neighbors is constructed. It is found that a second-order phase transition is realized in this model in the investigated interval of the intensities of interaction of next-to-nearest neighbors.
Nanoscale characterization and local piezoelectric properties of lead-free KNN-LT-LS thin films
NASA Astrophysics Data System (ADS)
Abazari, M.; Choi, T.; Cheong, S.-W.; Safari, A.
2010-01-01
We report the observation of domain structure and piezoelectric properties of pure and Mn-doped (K0.44,Na0.52,Li0.04)(Nb0.84,Ta0.1,Sb0.06)O3 (KNN-LT-LS) thin films on SrTiO3 substrates. It is revealed that, using piezoresponse force microscopy, ferroelectric domain structure in such 500 nm thin films comprised of primarily 180° domains. This was in accordance with the tetragonal structure of the films, confirmed by relative permittivity measurements and x-ray diffraction patterns. Effective piezoelectric coefficient (d33) of the films were calculated using piezoelectric displacement curves and shown to be ~53 pm V-1 for pure KNN-LT-LS thin films. This value is among the highest values reported for an epitaxial lead-free thin film and shows a great potential for KNN-LT-LS to serve as an alternative to PZT thin films in future applications.
NASA Astrophysics Data System (ADS)
Park, Sang-Gon; Jeong, Dong-Seok
2000-12-01
In this paper, we propose a fast adaptive diamond search algorithm (FADS) for block matching motion estimation. Many fast motion estimation algorithms reduce the computational complexity by the UESA (Unimodal Error Surface Assumption) where the matching error monotonically increases as the search moves away from the global minimum point. Recently, many fast BMAs (Block Matching Algorithms) make use of the fact that global minimum points in real world video sequences are centered at the position of zero motion. But these BMAs, especially in large motion, are easily trapped into the local minima and result in poor matching accuracy. So, we propose a new motion estimation algorithm using the spatial correlation among the neighboring blocks. We move the search origin according to the motion vectors of the spatially neighboring blocks and their MAEs (Mean Absolute Errors). The computer simulation shows that the proposed algorithm has almost the same computational complexity with DS (Diamond Search), but enhances PSNR. Moreover, the proposed algorithm gives almost the same PSNR as that of FS (Full Search), even for the large motion with half the computational load.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schreiner, S.; Paschal, C.B.; Galloway, R.L.
Four methods of producing maximum intensity projection (MIP) images were studied and compared. Three of the projection methods differ in the interpolation kernel used for ray tracing. The interpolation kernels include nearest neighbor interpolation, linear interpolation, and cubic convolution interpolation. The fourth projection method is a voxel projection method that is not explicitly a ray-tracing technique. The four algorithms` performance was evaluated using a computer-generated model of a vessel and using real MR angiography data. The evaluation centered around how well an algorithm transferred an object`s width to the projection plane. The voxel projection algorithm does not suffer from artifactsmore » associated with the nearest neighbor algorithm. Also, a speed-up in the calculation of the projection is seen with the voxel projection method. Linear interpolation dramatically improves the transfer of width information from the 3D MRA data set over both nearest neighbor and voxel projection methods. Even though the cubic convolution interpolation kernel is theoretically superior to the linear kernel, it did not project widths more accurately than linear interpolation. A possible advantage to the nearest neighbor interpolation is that the size of small vessels tends to be exaggerated in the projection plane, thereby increasing their visibility. The results confirm that the way in which an MIP image is constructed has a dramatic effect on information contained in the projection. The construction method must be chosen with the knowledge that the clinical information in the 2D projections in general will be different from that contained in the original 3D data volume. 27 refs., 16 figs., 2 tabs.« less
NASA Astrophysics Data System (ADS)
Abazari, M.; Safari, A.
2009-05-01
We report the effects of Ba, Ti, and Mn dopants on ferroelectric polarization and leakage current of (K0.44Na0.52Li0.04)(Nb0.84Ta0.1Sb0.06)O3 (KNN-LT-LS) thin films deposited by pulsed laser deposition. It is shown that donor dopants such as Ba2+, which increased the resistivity in bulk KNN-LT-LS, had an opposite effect in the thin film. Ti4+ as an acceptor B-site dopant reduces the leakage current by an order of magnitude, while the polarization values showed a slight degradation. Mn4+, however, was found to effectively suppress the leakage current by over two orders of magnitude while enhancing the polarization, with 15 and 23 μC/cm2 remanent and saturated polarization, whose values are ˜70% and 82% of the reported values for bulk composition. This phenomenon has been associated with the dual effect of Mn4+ in KNN-LT-LS thin film, by substituting both A- and B-site cations. A detailed description on how each dopant affects the concentrations of vacancies in the lattice is presented. Mn-doped KNN-LT-LS thin films are shown to be a promising candidate for lead-free thin films and applications.
Real-time Interpolation for True 3-Dimensional Ultrasound Image Volumes
Ji, Songbai; Roberts, David W.; Hartov, Alex; Paulsen, Keith D.
2013-01-01
We compared trilinear interpolation to voxel nearest neighbor and distance-weighted algorithms for fast and accurate processing of true 3-dimensional ultrasound (3DUS) image volumes. In this study, the computational efficiency and interpolation accuracy of the 3 methods were compared on the basis of a simulated 3DUS image volume, 34 clinical 3DUS image volumes from 5 patients, and 2 experimental phantom image volumes. We show that trilinear interpolation improves interpolation accuracy over both the voxel nearest neighbor and distance-weighted algorithms yet achieves real-time computational performance that is comparable to the voxel nearest neighbor algrorithm (1–2 orders of magnitude faster than the distance-weighted algorithm) as well as the fastest pixel-based algorithms for processing tracked 2-dimensional ultrasound images (0.035 seconds per 2-dimesional cross-sectional image [76,800 pixels interpolated, or 0.46 ms/1000 pixels] and 1.05 seconds per full volume with a 1-mm3 voxel size [4.6 million voxels interpolated, or 0.23 ms/1000 voxels]). On the basis of these results, trilinear interpolation is recommended as a fast and accurate interpolation method for rectilinear sampling of 3DUS image acquisitions, which is required to facilitate subsequent processing and display during operating room procedures such as image-guided neurosurgery. PMID:21266563
Real-time interpolation for true 3-dimensional ultrasound image volumes.
Ji, Songbai; Roberts, David W; Hartov, Alex; Paulsen, Keith D
2011-02-01
We compared trilinear interpolation to voxel nearest neighbor and distance-weighted algorithms for fast and accurate processing of true 3-dimensional ultrasound (3DUS) image volumes. In this study, the computational efficiency and interpolation accuracy of the 3 methods were compared on the basis of a simulated 3DUS image volume, 34 clinical 3DUS image volumes from 5 patients, and 2 experimental phantom image volumes. We show that trilinear interpolation improves interpolation accuracy over both the voxel nearest neighbor and distance-weighted algorithms yet achieves real-time computational performance that is comparable to the voxel nearest neighbor algrorithm (1-2 orders of magnitude faster than the distance-weighted algorithm) as well as the fastest pixel-based algorithms for processing tracked 2-dimensional ultrasound images (0.035 seconds per 2-dimesional cross-sectional image [76,800 pixels interpolated, or 0.46 ms/1000 pixels] and 1.05 seconds per full volume with a 1-mm(3) voxel size [4.6 million voxels interpolated, or 0.23 ms/1000 voxels]). On the basis of these results, trilinear interpolation is recommended as a fast and accurate interpolation method for rectilinear sampling of 3DUS image acquisitions, which is required to facilitate subsequent processing and display during operating room procedures such as image-guided neurosurgery.
Random ensemble learning for EEG classification.
Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid
2018-01-01
Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.
A Partitioning Algorithm for Block-Diagonal Matrices With Overlap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guy Antoine Atenekeng Kahou; Laura Grigori; Masha Sosonkina
2008-02-02
We present a graph partitioning algorithm that aims at partitioning a sparse matrix into a block-diagonal form, such that any two consecutive blocks overlap. We denote this form of the matrix as the overlapped block-diagonal matrix. The partitioned matrix is suitable for applying the explicit formulation of Multiplicative Schwarz preconditioner (EFMS) described in [3]. The graph partitioning algorithm partitions the graph of the input matrix into K partitions, such that every partition {Omega}{sub i} has at most two neighbors {Omega}{sub i-1} and {Omega}{sub i+1}. First, an ordering algorithm, such as the reverse Cuthill-McKee algorithm, that reduces the matrix profile ismore » performed. An initial overlapped block-diagonal partition is obtained from the profile of the matrix. An iterative strategy is then used to further refine the partitioning by allowing nodes to be transferred between neighboring partitions. Experiments are performed on matrices arising from real-world applications to show the feasibility and usefulness of this approach.« less
SibRank: Signed bipartite network analysis for neighbor-based collaborative ranking
NASA Astrophysics Data System (ADS)
Shams, Bita; Haratizadeh, Saman
2016-09-01
Collaborative ranking is an emerging field of recommender systems that utilizes users' preference data rather than rating values. Unfortunately, neighbor-based collaborative ranking has gained little attention despite its more flexibility and justifiability. This paper proposes a novel framework, called SibRank that seeks to improve the state of the art neighbor-based collaborative ranking methods. SibRank represents users' preferences as a signed bipartite network, and finds similar users, through a novel personalized ranking algorithm in signed networks.
Efficient Estimation of Mutual Information for Strongly Dependent Variables
2015-05-11
the two possibilities: for a fixed dimension d and near- est neighbor parameter k, we find a constant ↵ k,d , such that if V̄ (i)/V (i) < ↵ k,d , then...also compare the results to several baseline estima- tors: KSG (Kraskov et al., 2004), generalized near- est neighbor graph (GNN) (Pál et al., 2010...Amaury Lendasse, and Francesco Corona. A boundary corrected expansion of the moments of near- est neighbor distributions. Random Struct. Algorithms
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.
ERIC Educational Resources Information Center
Stewart, Mark; Willett, Peter
1987-01-01
Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Prediction of carbonate rock type from NMR responses using data mining techniques
NASA Astrophysics Data System (ADS)
Gonçalves, Eduardo Corrêa; da Silva, Pablo Nascimento; Silveira, Carla Semiramis; Carneiro, Giovanna; Domingues, Ana Beatriz; Moss, Adam; Pritchard, Tim; Plastino, Alexandre; Azeredo, Rodrigo Bagueira de Vasconcellos
2017-05-01
Recent studies have indicated that the accurate identification of carbonate rock types in a reservoir can be employed as a preliminary step to enhance the effectiveness of petrophysical property modeling. Furthermore, rock typing activity has been shown to be of key importance in several steps of formation evaluation, such as the study of sedimentary series, reservoir zonation and well-to-well correlation. In this paper, a methodology based exclusively on the analysis of 1H-NMR (Nuclear Magnetic Resonance) relaxation responses - using data mining algorithms - is evaluated to perform the automatic classification of carbonate samples according to their rock type. We analyze the effectiveness of six different classification algorithms (k-NN, Naïve Bayes, C4.5, Random Forest, SMO and Multilayer Perceptron) and two data preprocessing strategies (discretization and feature selection). The dataset used in this evaluation is formed by 78 1H-NMR T2 distributions of fully brine-saturated rock samples from six different rock type classes. The experiments reveal that the combination of preprocessing strategies with classification algorithms is able to achieve a prediction accuracy of 97.4%.
Quantum realization of the nearest neighbor value interpolation method for INEQR
NASA Astrophysics Data System (ADS)
Zhou, RiGui; Hu, WenWen; Luo, GaoFeng; Liu, XingAo; Fan, Ping
2018-07-01
This paper presents the nearest neighbor value (NNV) interpolation algorithm for the improved novel enhanced quantum representation of digital images (INEQR). It is necessary to use interpolation in image scaling because there is an increase or a decrease in the number of pixels. The difference between the proposed scheme and nearest neighbor interpolation is that the concept applied, to estimate the missing pixel value, is guided by the nearest value rather than the distance. Firstly, a sequence of quantum operations is predefined, such as cyclic shift transformations and the basic arithmetic operations. Then, the feasibility of the nearest neighbor value interpolation method for quantum image of INEQR is proven using the previously designed quantum operations. Furthermore, quantum image scaling algorithm in the form of circuits of the NNV interpolation for INEQR is constructed for the first time. The merit of the proposed INEQR circuit lies in their low complexity, which is achieved by utilizing the unique properties of quantum superposition and entanglement. Finally, simulation-based experimental results involving different classical images and ratios (i.e., conventional or non-quantum) are simulated based on the classical computer's MATLAB 2014b software, which demonstrates that the proposed interpolation method has higher performances in terms of high resolution compared to the nearest neighbor and bilinear interpolation.
Community detection using preference networks
NASA Astrophysics Data System (ADS)
Tasgin, Mursel; Bingol, Haluk O.
2018-04-01
Community detection is the task of identifying clusters or groups of nodes in a network where nodes within the same group are more connected with each other than with nodes in different groups. It has practical uses in identifying similar functions or roles of nodes in many biological, social and computer networks. With the availability of very large networks in recent years, performance and scalability of community detection algorithms become crucial, i.e. if time complexity of an algorithm is high, it cannot run on large networks. In this paper, we propose a new community detection algorithm, which has a local approach and is able to run on large networks. It has a simple and effective method; given a network, algorithm constructs a preference network of nodes where each node has a single outgoing edge showing its preferred node to be in the same community with. In such a preference network, each connected component is a community. Selection of the preferred node is performed using similarity based metrics of nodes. We use two alternatives for this purpose which can be calculated in 1-neighborhood of nodes, i.e. number of common neighbors of selector node and its neighbors and, the spread capability of neighbors around the selector node which is calculated by the gossip algorithm of Lind et.al. Our algorithm is tested on both computer generated LFR networks and real-life networks with ground-truth community structure. It can identify communities accurately in a fast way. It is local, scalable and suitable for distributed execution on large networks.
Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad
2014-01-24
In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented to ten common chromatographic regions using local rank analysis and then, the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones and twenty-four components out of thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of variance accounted for three PCs) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS) which quantifies the distance separating clusters in relation to the scatter within each cluster is calculated for four clusters and it was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.
Improvement in synthesis of (K 0.5Na 0.5)NbO 3 powders by Ge 4+ acceptor doping
Zhao, Yajing; Chen, Yan; Chen, Kepi
2016-11-17
In this study, the effects of doping with GeO 2 on the synthesis temperature, phase structure and morphology of (K 0.5Na 0.5)NbO 3 (KNN) ceramic powders were studied using XRD and SEM. The results show that KNN powders with good crystallinity and compositional homogeneity can be obtained after calcination at up to 900°C for 2 h. Introducing 0.5 mol.% GeO 2 into the starting mixture improved the synthesis of the KNN powders and allowed the calcination temperature to be decreased to 800°C, which can be ascribed to the formation of the liquid phase during the synthesis.
Virtual viewpoint synthesis in multi-view video system
NASA Astrophysics Data System (ADS)
Li, Fang; Yang, Shiqiang
2005-07-01
In this paper, we present a virtual viewpoint video synthesis algorithm to satisfy the following three aims: low computing consuming; real time interpolation and acceptable video quality. In contrast with previous technologies, this method obtain incompletely 3D structure using neighbor video sources instead of getting total 3D information with all video sources, so that the computation is reduced greatly. So we demonstrate our interactive multi-view video synthesis algorithm in a personal computer. Furthermore, adopting the method of choosing feature points to build the correspondence between the frames captured by neighbor cameras, we need not require camera calibration. Finally, our method can be used when the angle between neighbor cameras is 25-30 degrees that it is much larger than common computer vision experiments. In this way, our method can be applied into many applications such as sports live, video conference, etc.
Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma
2012-10-01
The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2 where we replace the K- nearest neighbors algorithm with the fuzzy K-nearest neighbors to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work is retrieved from UCI machine learning repository. The performances of the AIRS2 and MAIRS2 are evaluated regarding classification accuracy, sensitivity and specificity values. The highest classification accuracy obtained when applying the AIRS2 and MAIRS2 using 10-fold cross-validation was, respectively 82.69% and 89.10%.
NASA Astrophysics Data System (ADS)
Li, Chao; Yang, Sheng-Chao; Guo, Qiao-Sheng; Zheng, Kai-Yan; Wang, Ping-Li; Meng, Zhen-Gui
2016-01-01
A combination of Fourier transform infrared spectroscopy with chemometrics tools provided an approach for studying Marsdenia tenacissima according to its geographical origin. A total of 128 M. tenacissima samples from four provinces in China were analyzed with FTIR spectroscopy. Six pattern recognition methods were used to construct the discrimination models: support vector machine-genetic algorithms, support vector machine-particle swarm optimization, K-nearest neighbors, radial basis function neural network, random forest and support vector machine-grid search. Experimental results showed that K-nearest neighbors was superior to other mathematical algorithms after data were preprocessed with wavelet de-noising, with a discrimination rate of 100% in both the training and prediction sets. This study demonstrated that FTIR spectroscopy coupled with K-nearest neighbors could be successfully applied to determine the geographical origins of M. tenacissima samples, thereby providing reliable authentication in a rapid, cheap and noninvasive way.
Compositional inhomogeneityand segregation in (K 0.5Na 0.5)NbO 3 ceramics
Chen, Kepi; Tang, Jing; Chen, Yan
2016-03-11
The effects of the calcination temperature of (K 0.5Na 0.5)NbO 3 (KNN) powder on the sintering and piezoelectric properties of KNN ceramics have been investigated in this report. KNN powders are synthesized via the solid-state approach. Scanning electron microscopy and X-ray diffraction characterizations indicate that the incomplete reaction at 700 °C and 750 °C calcination results in the compositional inhomogeneity of the K-rich and Na-rich phases while the orthorhombic single phase is obtained after calcination at 900 °C. During the sintering, the presence of the liquid K-rich phase due to the lower melting point has a significant impact on themore » densification, the abnormal grain growth and the deteriorated piezoelectric properties. From the standpoint of piezoelectric properties, the optimal calcination temperature obtained for KNN ceramics calcined at this temperature is determined to be 800 °C, with piezoelectric constant d 33=128.3 pC/N, planar electromechanical coupling coefficient k p=32.2%, mechanical quality factor Q m=88, and dielectric loss tan δ=2.1%.« less
Shao, Xiaolong; Li, Hui; Wang, Nan; Zhang, Qiang
2015-10-21
An electronic nose (e-nose) was used to characterize sesame oils processed by three different methods (hot-pressed, cold-pressed, and refined), as well as blends of the sesame oils and soybean oil. Seven classification and prediction methods, namely PCA, LDA, PLS, KNN, SVM, LASSO and RF, were used to analyze the e-nose data. The classification accuracy and MAUC were employed to evaluate the performance of these methods. The results indicated that sesame oils processed with different methods resulted in different sensor responses, with cold-pressed sesame oil producing the strongest sensor signals, followed by the hot-pressed sesame oil. The blends of pressed sesame oils with refined sesame oil were more difficult to be distinguished than the blends of pressed sesame oils and refined soybean oil. LDA, KNN, and SVM outperformed the other classification methods in distinguishing sesame oil blends. KNN, LASSO, PLS, and SVM (with linear kernel), and RF models could adequately predict the adulteration level (% of added soybean oil) in the sesame oil blends. Among the prediction models, KNN with k = 1 and 2 yielded the best prediction results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Kepi; Tang, Jing; Chen, Yan
The effects of the calcination temperature of (K 0.5Na 0.5)NbO 3 (KNN) powder on the sintering and piezoelectric properties of KNN ceramics have been investigated in this report. KNN powders are synthesized via the solid-state approach. Scanning electron microscopy and X-ray diffraction characterizations indicate that the incomplete reaction at 700 °C and 750 °C calcination results in the compositional inhomogeneity of the K-rich and Na-rich phases while the orthorhombic single phase is obtained after calcination at 900 °C. During the sintering, the presence of the liquid K-rich phase due to the lower melting point has a significant impact on themore » densification, the abnormal grain growth and the deteriorated piezoelectric properties. From the standpoint of piezoelectric properties, the optimal calcination temperature obtained for KNN ceramics calcined at this temperature is determined to be 800 °C, with piezoelectric constant d 33=128.3 pC/N, planar electromechanical coupling coefficient k p=32.2%, mechanical quality factor Q m=88, and dielectric loss tan δ=2.1%.« less
A Parallel Algorithm for Contact in a Finite Element Hydrocode
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pierce, Timothy G.
A parallel algorithm is developed for contact/impact of multiple three dimensional bodies undergoing large deformation. As time progresses the relative positions of contact between the multiple bodies changes as collision and sliding occurs. The parallel algorithm is capable of tracking these changes and enforcing an impenetrability constraint and momentum transfer across the surfaces in contact. Portions of the various surfaces of the bodies are assigned to the processors of a distributed-memory parallel machine in an arbitrary fashion, known as the primary decomposition. A secondary, dynamic decomposition is utilized to bring opposing sections of the contacting surfaces together on the samemore » processors, so that opposing forces may be balanced and the resultant deformation of the bodies calculated. The secondary decomposition is accomplished and updated using only local communication with a limited subset of neighbor processors. Each processor represents both a domain of the primary decomposition and a domain of the secondary, or contact, decomposition. Thus each processor has four sets of neighbor processors: (a) those processors which represent regions adjacent to it in the primary decomposition, (b) those processors which represent regions adjacent to it in the contact decomposition, (c) those processors which send it the data from which it constructs its contact domain, and (d) those processors to which it sends its primary domain data, from which they construct their contact domains. The latter three of these neighbor sets change dynamically as the simulation progresses. By constraining all communication to these sets of neighbors, all global communication, with its attendant nonscalable performance, is avoided. A set of tests are provided to measure the degree of scalability achieved by this algorithm on up to 1024 processors. Issues related to the operating system of the test platform which lead to some degradation of the results are analyzed. This algorithm has been implemented as the contact capability of the ALE3D multiphysics code, and is currently in production use.« less
Predictive modelling for startup and investor relationship based on crowdfunding platform data
NASA Astrophysics Data System (ADS)
Alamsyah, Andry; Buono Asto Nugroho, Tri
2018-03-01
Crowdfunding platform is a place where startup shows off publicly their idea for the purpose to get their project funded. Crowdfunding platform such as Kickstarter are becoming popular today, it provides the efficient way for startup to get funded without liabilities, it also provides variety project category that can be participated. There is an available safety procedure to ensure achievable low-risk environment. The startup promoted project must accomplish their funded goal target. If they fail to reach the target, then there is no investment activity take place. It motivates startup to be more active to promote or disseminate their project idea and it also protect investor from losing money. The study objective is to predict the successfulness of proposed project and mapping investor trend using data mining framework. To achieve the objective, we proposed 3 models. First model is to predict whether a project is going to be successful or failed using K-Nearest Neighbour (KNN). Second model is to predict the number of successful project using Artificial Neural Network (ANN). Third model is to map the trend of investor in investing the project using K-Means clustering algorithm. KNN gives 99.04% model accuracy, while ANN best configuration gives 16-14-1 neuron layers and 0.2 learning rate, and K-Means gives 6 best separation clusters. The results of those models can help startup or investor to make decision regarding startup investment.
NASA Astrophysics Data System (ADS)
Oung, Qi Wei; Nisha Basah, Shafriza; Muthusamy, Hariharan; Vijean, Vikneswaran; Lee, Hoileong
2018-03-01
Parkinson’s disease (PD) is one type of progressive neurodegenerative disease known as motor system syndrome, which is due to the death of dopamine-generating cells, a region of the human midbrain. PD normally affects people over 60 years of age, which at present has influenced a huge part of worldwide population. Lately, many researches have shown interest into the connection between PD and speech disorders. Researches have revealed that speech signals may be a suitable biomarker for distinguishing between people with Parkinson’s (PWP) from healthy subjects. Therefore, early diagnosis of PD through the speech signals can be considered for this aim. In this research, the speech data are acquired based on speech behaviour as the biomarker for differentiating PD severity levels (mild and moderate) from healthy subjects. Feature extraction algorithms applied are Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), and Weighted Linear Prediction Cepstral Coefficients (WLPCC). For classification, two types of classifiers are used: k-Nearest Neighbour (KNN) and Probabilistic Neural Network (PNN). The experimental results demonstrated that PNN classifier and KNN classifier achieve the best average classification performance of 92.63% and 88.56% respectively through 10-fold cross-validation measures. Favourably, the suggested techniques have the possibilities of becoming a new choice of promising tools for the PD detection with tremendous performance.
A multiple-point spatially weighted k-NN method for object-based classification
NASA Astrophysics Data System (ADS)
Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.
2016-10-01
Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-11-13
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-01-01
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027
A Comparative Study with RapidMiner and WEKA Tools over some Classification Techniques for SMS Spam
NASA Astrophysics Data System (ADS)
Foozy, Cik Feresa Mohd; Ahmad, Rabiah; Faizal Abdollah, M. A.; Chai Wen, Chuah
2017-08-01
SMS Spamming is a serious attack that can manipulate the use of the SMS by spreading the advertisement in bulk. By sending the unwanted SMS that contain advertisement can make the users feeling disturb and this against the privacy of the mobile users. To overcome these issues, many studies have proposed to detect SMS Spam by using data mining tools. This paper will do a comparative study using five machine learning techniques such as Naïve Bayes, K-NN (K-Nearest Neighbour Algorithm), Decision Tree, Random Forest and Decision Stumps to observe the accuracy result between RapidMiner and WEKA for dataset SMS Spam UCI Machine Learning repository.
An algorithm for spatial heirarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Zhao, Meng; Ding, Baocang
2015-03-01
This paper considers the distributed model predictive control (MPC) of nonlinear large-scale systems with dynamically decoupled subsystems. According to the coupled state in the overall cost function of centralized MPC, the neighbors are confirmed and fixed for each subsystem, and the overall objective function is disassembled into each local optimization. In order to guarantee the closed-loop stability of distributed MPC algorithm, the overall compatibility constraint for centralized MPC algorithm is decomposed into each local controller. The communication between each subsystem and its neighbors is relatively low, only the current states before optimization and the optimized input variables after optimization are being transferred. For each local controller, the quasi-infinite horizon MPC algorithm is adopted, and the global closed-loop system is proven to be exponentially stable. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Multidimensional Optimization of Signal Space Distance Parameters in WLAN Positioning
Brković, Milenko; Simić, Mirjana
2014-01-01
Accurate indoor localization of mobile users is one of the challenging problems of the last decade. Besides delivering high speed Internet, Wireless Local Area Network (WLAN) can be used as an effective indoor positioning system, being competitive both in terms of accuracy and cost. Among the localization algorithms, nearest neighbor fingerprinting algorithms based on Received Signal Strength (RSS) parameter have been extensively studied as an inexpensive solution for delivering indoor Location Based Services (LBS). In this paper, we propose the optimization of the signal space distance parameters in order to improve precision of WLAN indoor positioning, based on nearest neighbor fingerprinting algorithms. Experiments in a real WLAN environment indicate that proposed optimization leads to substantial improvements of the localization accuracy. Our approach is conceptually simple, is easy to implement, and does not require any additional hardware. PMID:24757443
Foliar and woody materials discriminated using terrestrial LiDAR in a mixed natural forest
NASA Astrophysics Data System (ADS)
Zhu, Xi; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Niemann, K. Olaf; Liu, Jing; Shi, Yifang; Wang, Tiejun
2018-02-01
Separation of foliar and woody materials using remotely sensed data is crucial for the accurate estimation of leaf area index (LAI) and woody biomass across forest stands. In this paper, we present a new method to accurately separate foliar and woody materials using terrestrial LiDAR point clouds obtained from ten test sites in a mixed forest in Bavarian Forest National Park, Germany. Firstly, we applied and compared an adaptive radius near-neighbor search algorithm with a fixed radius near-neighbor search method in order to obtain both radiometric and geometric features derived from terrestrial LiDAR point clouds. Secondly, we used a random forest machine learning algorithm to classify foliar and woody materials and examined the impact of understory and slope on the classification accuracy. An average overall accuracy of 84.4% (Kappa = 0.75) was achieved across all experimental plots. The adaptive radius near-neighbor search method outperformed the fixed radius near-neighbor search method. The classification accuracy was significantly higher when the combination of both radiometric and geometric features was utilized. The analysis showed that increasing slope and understory coverage had a significant negative effect on the overall classification accuracy. Our results suggest that the utilization of the adaptive radius near-neighbor search method coupling both radiometric and geometric features has the potential to accurately discriminate foliar and woody materials from terrestrial LiDAR data in a mixed natural forest.
Zhu, Hao; Rusyn, Ivan; Richard, Ann; Tropsha, Alexander
2008-01-01
Background To develop efficient approaches for rapid evaluation of chemical toxicity and human health risk of environmental compounds, the National Toxicology Program (NTP) in collaboration with the National Center for Chemical Genomics has initiated a project on high-throughput screening (HTS) of environmental chemicals. The first HTS results for a set of 1,408 compounds tested for their effects on cell viability in six different cell lines have recently become available via PubChem. Objectives We have explored these data in terms of their utility for predicting adverse health effects of the environmental agents. Methods and results Initially, the classification k nearest neighbor (kNN) quantitative structure–activity relationship (QSAR) modeling method was applied to the HTS data only, for a curated data set of 384 compounds. The resulting models had prediction accuracies for training, test (containing 275 compounds together), and external validation (109 compounds) sets as high as 89%, 71%, and 74%, respectively. We then asked if HTS results could be of value in predicting rodent carcinogenicity. We identified 383 compounds for which data were available from both the Berkeley Carcinogenic Potency Database and NTP–HTS studies. We found that compounds classified by HTS as “actives” in at least one cell line were likely to be rodent carcinogens (sensitivity 77%); however, HTS “inactives” were far less informative (specificity 46%). Using chemical descriptors only, kNN QSAR modeling resulted in 62.3% prediction accuracy for rodent carcinogenicity applied to this data set. Importantly, the prediction accuracy of the model was significantly improved (72.7%) when chemical descriptors were augmented by HTS data, which were regarded as biological descriptors. Conclusions Our studies suggest that combining NTP–HTS profiles with conventional chemical descriptors could considerably improve the predictive power of computational approaches in toxicology. PMID:18414635
A Semi-parametric Multivariate Gap-filling Model for Eddy Covariance Latent Heat Flux
NASA Astrophysics Data System (ADS)
Li, M.; Chen, Y.
2010-12-01
Quantitative descriptions of latent heat fluxes are important to study the water and energy exchanges between terrestrial ecosystems and the atmosphere. The eddy covariance approaches have been recognized as the most reliable technique for measuring surface fluxes over time scales ranging from hours to years. However, unfavorable micrometeorological conditions, instrument failures, and applicable measurement limitations may cause inevitable flux gaps in time series data. Development and application of suitable gap-filling techniques are crucial to estimate long term fluxes. In this study, a semi-parametric multivariate gap-filling model was developed to fill latent heat flux gaps for eddy covariance measurements. Our approach combines the advantages of a multivariate statistical analysis (principal component analysis, PCA) and a nonlinear interpolation technique (K-nearest-neighbors, KNN). The PCA method was first used to resolve the multicollinearity relationships among various hydrometeorological factors, such as radiation, soil moisture deficit, LAI, and wind speed. The KNN method was then applied as a nonlinear interpolation tool to estimate the flux gaps as the weighted sum latent heat fluxes with the K-nearest distances in the PCs’ domain. Two years, 2008 and 2009, of eddy covariance and hydrometeorological data from a subtropical mixed evergreen forest (the Lien-Hua-Chih Site) were collected to calibrate and validate the proposed approach with artificial gaps after standard QC/QA procedures. The optimal K values and weighting factors were determined by the maximum likelihood test. The results of gap-filled latent heat fluxes conclude that developed model successful preserving energy balances of daily, monthly, and yearly time scales. Annual amounts of evapotranspiration from this study forest were 747 mm and 708 mm for 2008 and 2009, respectively. Nocturnal evapotranspiration was estimated with filled gaps and results are comparable with other studies. Seasonal and daily variability of latent heat fluxes were also discussed.
INFO-RNA--a fast approach to inverse RNA folding.
Busch, Anke; Backofen, Rolf
2006-08-01
The structure of RNA molecules is often crucial for their function. Therefore, secondary structure prediction has gained much interest. Here, we consider the inverse RNA folding problem, which means designing RNA sequences that fold into a given structure. We introduce a new algorithm for the inverse folding problem (INFO-RNA) that consists of two parts; a dynamic programming method for good initial sequences and a following improved stochastic local search that uses an effective neighbor selection method. During the initialization, we design a sequence that among all sequences adopts the given structure with the lowest possible energy. For the selection of neighbors during the search, we use a kind of look-ahead of one selection step applying an additional energy-based criterion. Afterwards, the pre-ordered neighbors are tested using the actual optimization criterion of minimizing the structure distance between the target structure and the mfe structure of the considered neighbor. We compared our algorithm to RNAinverse and RNA-SSD for artificial and biological test sets. Using INFO-RNA, we performed better than RNAinverse and in most cases, we gained better results than RNA-SSD, the probably best inverse RNA folding tool on the market. www.bioinf.uni-freiburg.de?Subpages/software.html.
Script-independent text line segmentation in freestyle handwritten documents.
Li, Yi; Zheng, Yefeng; Doermann, David; Jaeger, Stefan; Li, Yi
2008-08-01
Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map, where each element represents the probability that the underlying pixel belongs to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected component based methods ( [1], [2] for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [1]-[3]. Further experiments show the proposed algorithm is robust to scale change, rotation, and noise.
Farran, Bassam; Channanath, Arshad Mohamed; Behbehani, Kazem; Thanaraj, Thangavel Alphonse
2013-05-14
We build classification models and risk assessment tools for diabetes, hypertension and comorbidity using machine-learning algorithms on data from Kuwait. We model the increased proneness in diabetic patients to develop hypertension and vice versa. We ascertain the importance of ethnicity (and natives vs expatriate migrants) and of using regional data in risk assessment. Retrospective cohort study. Four machine-learning techniques were used: logistic regression, k-nearest neighbours (k-NN), multifactor dimensionality reduction and support vector machines. The study uses fivefold cross validation to obtain generalisation accuracies and errors. Kuwait Health Network (KHN) that integrates data from primary health centres and hospitals in Kuwait. 270 172 hospital visitors (of which, 89 858 are diabetic, 58 745 hypertensive and 30 522 comorbid) comprising Kuwaiti natives, Asian and Arab expatriates. Incident type 2 diabetes, hypertension and comorbidity. Classification accuracies of >85% (for diabetes) and >90% (for hypertension) are achieved using only simple non-laboratory-based parameters. Risk assessment tools based on k-NN classification models are able to assign 'high' risk to 75% of diabetic patients and to 94% of hypertensive patients. Only 5% of diabetic patients are seen assigned 'low' risk. Asian-specific models and assessments perform even better. Pathological conditions of diabetes in the general population or in hypertensive population and those of hypertension are modelled. Two-stage aggregate classification models and risk assessment tools, built combining both the component models on diabetes (or on hypertension), perform better than individual models. Data on diabetes, hypertension and comorbidity from the cosmopolitan State of Kuwait are available for the first time. This enabled us to apply four different case-control models to assess risks. These tools aid in the preliminary non-intrusive assessment of the population. Ethnicity is seen significant to the predictive models. Risk assessments need to be developed using regional data as we demonstrate the applicability of the American Diabetes Association online calculator on data from Kuwait.
Kim, Minkyoung; Choi, Seung-Hoon; Kim, Junhyoung; Choi, Kihang; Shin, Jae-Min; Kang, Sang-Kee; Choi, Yun-Jaie; Jung, Dong Hyun
2009-11-01
This study describes the application of a density-based algorithm to clustering small peptide conformations after a molecular dynamics simulation. We propose a clustering method for small peptide conformations that enables adjacent clusters to be separated more clearly on the basis of neighbor density. Neighbor density means the number of neighboring conformations, so if a conformation has too few neighboring conformations, then it is considered as noise or an outlier and is excluded from the list of cluster members. With this approach, we can easily identify clusters in which the members are densely crowded in the conformational space, and we can safely avoid misclustering individual clusters linked by noise or outliers. Consideration of neighbor density significantly improves the efficiency of clustering of small peptide conformations sampled from molecular dynamics simulations and can be used for predicting peptide structures.
Shao, Xiaolong; Li, Hui; Wang, Nan; Zhang, Qiang
2015-01-01
An electronic nose (e-nose) was used to characterize sesame oils processed by three different methods (hot-pressed, cold-pressed, and refined), as well as blends of the sesame oils and soybean oil. Seven classification and prediction methods, namely PCA, LDA, PLS, KNN, SVM, LASSO and RF, were used to analyze the e-nose data. The classification accuracy and MAUC were employed to evaluate the performance of these methods. The results indicated that sesame oils processed with different methods resulted in different sensor responses, with cold-pressed sesame oil producing the strongest sensor signals, followed by the hot-pressed sesame oil. The blends of pressed sesame oils with refined sesame oil were more difficult to be distinguished than the blends of pressed sesame oils and refined soybean oil. LDA, KNN, and SVM outperformed the other classification methods in distinguishing sesame oil blends. KNN, LASSO, PLS, and SVM (with linear kernel), and RF models could adequately predict the adulteration level (% of added soybean oil) in the sesame oil blends. Among the prediction models, KNN with k = 1 and 2 yielded the best prediction results. PMID:26506350
An enhanced fast scanning algorithm for image segmentation
NASA Astrophysics Data System (ADS)
Ismael, Ahmed Naser; Yusof, Yuhanis binti
2015-12-01
Segmentation is an essential and important process that separates an image into regions that have similar characteristics or features. This will transform the image for a better image analysis and evaluation. An important benefit of segmentation is the identification of region of interest in a particular image. Various algorithms have been proposed for image segmentation and this includes the Fast Scanning algorithm which has been employed on food, sport and medical images. It scans all pixels in the image and cluster each pixel according to the upper and left neighbor pixels. The clustering process in Fast Scanning algorithm is performed by merging pixels with similar neighbor based on an identified threshold. Such an approach will lead to a weak reliability and shape matching of the produced segments. This paper proposes an adaptive threshold function to be used in the clustering process of the Fast Scanning algorithm. This function used the gray'value in the image's pixels and variance Also, the level of the image that is more the threshold are converted into intensity values between 0 and 1, and other values are converted into intensity values zero. The proposed enhanced Fast Scanning algorithm is realized on images of the public and private transportation in Iraq. Evaluation is later made by comparing the produced images of proposed algorithm and the standard Fast Scanning algorithm. The results showed that proposed algorithm is faster in terms the time from standard fast scanning.
Celik, Yuksel; Ulker, Erkan
2013-01-01
Marriage in honey bees optimization (MBO) is a metaheuristic optimization algorithm developed by inspiration of the mating and fertilization process of honey bees and is a kind of swarm intelligence optimizations. In this study we propose improved marriage in honey bees optimization (IMBO) by adding Levy flight algorithm for queen mating flight and neighboring for worker drone improving. The IMBO algorithm's performance and its success are tested on the well-known six unconstrained test functions and compared with other metaheuristic optimization algorithms.
2014-08-01
consensus algorithm called randomized gossip is more suitable [7, 8]. In asynchronous randomized gossip algorithms, pairs of neighboring nodes exchange...messages and perform updates in an asynchronous and unattended manner, and they also 1 The class of broadcast gossip algorithms [9, 10, 11, 12] are...dynamics [2] and asynchronous pairwise randomized gossip [7, 8], broadcast gossip algorithms do not require that nodes know the identities of their
Optical and Piezoelectric Study of KNN Solid Solutions Co-Doped with La-Mn and Eu-Fe.
Peña-Jiménez, Jesús-Alejandro; González, Federico; López-Juárez, Rigoberto; Hernández-Alcántara, José-Manuel; Camarillo, Enrique; Murrieta-Sánchez, Héctor; Pardo, Lorena; Villafuerte-Castrejón, María-Elena
2016-09-28
The solid-state method was used to synthesize single phase potassium-sodium niobate (KNN) co-doped with the La 3+ -Mn 4+ and Eu 3+ -Fe 3+ ion pairs. Structural determination of all studied solid solutions was accomplished by XRD and Rietveld refinement method. Electron paramagnetic resonance (EPR) studies were performed to determine the oxidation state of paramagnetic centers. Optical spectroscopy measurements, excitation, emission and decay lifetime were carried out for each solid solution. The present study reveals that doping KNN with La 3+ -Mn 4+ and Eu 3+ -Fe 3+ at concentrations of 0.5 mol % and 1 mol %, respectively, improves the ferroelectric and piezoelectric behavior and induce the generation of optical properties in the material for potential applications.
Optical and Piezoelectric Study of KNN Solid Solutions Co-Doped with La-Mn and Eu-Fe
Peña-Jiménez, Jesús-Alejandro; González, Federico; López-Juárez, Rigoberto; Hernández-Alcántara, José-Manuel; Camarillo, Enrique; Murrieta-Sánchez, Héctor; Pardo, Lorena; Villafuerte-Castrejón, María-Elena
2016-01-01
The solid-state method was used to synthesize single phase potassium-sodium niobate (KNN) co-doped with the La3+–Mn4+ and Eu3+–Fe3+ ion pairs. Structural determination of all studied solid solutions was accomplished by XRD and Rietveld refinement method. Electron paramagnetic resonance (EPR) studies were performed to determine the oxidation state of paramagnetic centers. Optical spectroscopy measurements, excitation, emission and decay lifetime were carried out for each solid solution. The present study reveals that doping KNN with La3+–Mn4+ and Eu3+–Fe3+ at concentrations of 0.5 mol % and 1 mol %, respectively, improves the ferroelectric and piezoelectric behavior and induce the generation of optical properties in the material for potential applications. PMID:28773925
A hybrid personalized data recommendation approach for geoscience data sharing
NASA Astrophysics Data System (ADS)
WANG, M.; Wang, J.
2016-12-01
Recommender systems are effective tools helping Internet users overcome information overloading. The two most widely used recommendation algorithms are collaborating filtering (CF) and content-based filtering (CBF). A number of recommender systems based on those two algorithms were developed for multimedia, online sells, and other domains. Each of the two algorithms has its advantages and shortcomings. Hybrid approaches that combine these two algorithms are better choices in many cases. In geoscience data sharing domain, where the items (datasets) are more informative (in space and time) and domain-specific, no recommender system is specialized for data users. This paper reports a dynamic weighted hybrid recommendation algorithm that combines CF and CBF for geoscience data sharing portal. We first derive users' ratings on items with their historical visiting time by Jenks Natural Break. In the CBF part, we incorporate the space, time, and subject information of geoscience datasets to compute item similarity. Predicted ratings were computed with k-NN method separately using CBF and CF, and then combined with weights. With training dataset we attempted to find the best model describing ideal weights and users' co-rating numbers. A logarithmic function was confirmed to be the best model. The model was then used to tune the weights of CF and CBF on user-item basis with test dataset. Evaluation results show that the dynamic weighted approach outperforms either solo CF or CBF approach in terms of Precision and Recall.
Sun, Minglei; Yang, Shaobao; Jiang, Jinling; Wang, Qiwei
2015-01-01
Pelger-Huet anomaly (PHA) and Pseudo Pelger-Huet anomaly (PPHA) are neutrophil with abnormal morphology. They have the bilobed or unilobed nucleus and excessive clumping chromatin. Currently, detection of this kind of cell mainly depends on the manual microscopic examination by a clinician, thus, the quality of detection is limited by the efficiency and a certain subjective consciousness of the clinician. In this paper, a detection method for PHA and PPHA is proposed based on karyomorphism and chromatin distribution features. Firstly, the skeleton of the nucleus is extracted using an augmented Fast Marching Method (AFMM) and width distribution is obtained through distance transform. Then, caryoplastin in the nucleus is extracted based on Speeded Up Robust Features (SURF) and a K-nearest-neighbor (KNN) classifier is constructed to analyze the features. Experiment shows that the sensitivity and specificity of this method achieved 87.5% and 83.33%, which means that the detection accuracy of PHA is acceptable. Meanwhile, the detection method should be helpful to the automatic morphological classification of blood cells.
Kharbach, Mourad; Kamal, Rabie; Mansouri, Mohammed Alaoui; Marmouzi, Ilias; Viaene, Johan; Cherrah, Yahia; Alaoui, Katim; Vercammen, Joeri; Bouklouze, Abdelaziz; Vander Heyden, Yvan
2018-10-15
This study investigated the effectiveness of SIFT-MS versus chemical profiling, both coupled to multivariate data analysis, to classify 95 Extra Virgin Argan Oils (EVAO), originating from five Moroccan Argan forest locations. The full scan option of SIFT-MS, is suitable to indicate the geographic origin of EVAO based on the fingerprints obtained using the three chemical ionization precursors (H 3 O + , NO + and O 2 + ). The chemical profiling (including acidity, peroxide value, spectrophotometric indices, fatty acids, tocopherols- and sterols composition) was also used for classification. Partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), K-nearest neighbors (KNN), and support vector machines (SVM), were compared. The SIFT-MS data were therefore fed to variable-selection methods to find potential biomarkers for classification. The classification models based either on chemical profiling or SIFT-MS data were able to classify the samples with high accuracy. SIFT-MS was found to be advantageous for rapid geographic classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Prediction of microsleeps using pairwise joint entropy and mutual information between EEG channels.
Baseer, Abdul; Weddell, Stephen J; Jones, Richard D
2017-07-01
Microsleeps are involuntary and brief instances of complete loss of responsiveness, typically of 0.5-15 s duration. They adversely affect performance in extended attention-driven jobs and can be fatal. Our aim was to predict microsleeps from 16 channel EEG signals. Two information theoretic concepts - pairwise joint entropy and mutual information - were independently used to continuously extract features from EEG signals. k-nearest neighbor (kNN) with k = 3 was used to calculate both joint entropy and mutual information. Highly correlated features were discarded and the rest were ranked using Fisher score followed by an average of 3-fold cross-validation area under the curve of the receiver operating characteristic (AUC ROC ). Leave-one-out method (LOOM) was performed to test the performance of microsleep prediction system on independent data. The best prediction for 0.25 s ahead was AUCROC, sensitivity, precision, geometric mean (GM), and φ of 0.93, 0.68, 0.33, 0.75, and 0.38 respectively with joint entropy using single linear discriminant analysis (LDA) classifier.
Bajoub, Aadil; Medina-Rodríguez, Santiago; Gómez-Romero, María; Ajal, El Amine; Bagur-González, María Gracia; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría
2017-01-15
High Performance Liquid Chromatography (HPLC) with diode array (DAD) and fluorescence (FLD) detection was used to acquire the fingerprints of the phenolic fraction of monovarietal extra-virgin olive oils (extra-VOOs) collected over three consecutive crop seasons (2011/2012-2013/2014). The chromatographic fingerprints of 140 extra-VOO samples processed from olive fruits of seven olive varieties, were recorded and statistically treated for varietal authentication purposes. First, DAD and FLD chromatographic-fingerprint datasets were separately processed and, subsequently, were joined using "Low-level" and "Mid-Level" data fusion methods. After the preliminary examination by principal component analysis (PCA), three supervised pattern recognition techniques, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogies (SIMCA) and K-Nearest Neighbors (k-NN) were applied to the four chromatographic-fingerprinting matrices. The classification models built were very sensitive and selective, showing considerably good recognition and prediction abilities. The combination "chromatographic dataset+chemometric technique" allowing the most accurate classification for each monovarietal extra-VOO was highlighted. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Karsi, Redouane; Zaim, Mounia; El Alami, Jamila
2017-07-01
Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” is born to address the problem of automatically determining the polarity (Positive, negative, neutral,…) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions, thus, building a classifier, which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques: Support Vector Machines (SVM), Naive Bayes and K nearest neighbors(KNN) were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4,08.
Longobardi, Francesco; Innamorato, Valentina; Di Gioia, Annalisa; Ventrella, Andrea; Lippolis, Vincenzo; Logrieco, Antonio F; Catucci, Lucia; Agostiano, Angela
2017-12-15
Lentil samples coming from two different countries, i.e. Italy and Canada, were analysed using untargeted 1 H NMR fingerprinting in combination with chemometrics in order to build models able to classify them according to their geographical origin. For such aim, Soft Independent Modelling of Class Analogy (SIMCA), k-Nearest Neighbor (k-NN), Principal Component Analysis followed by Linear Discriminant Analysis (PCA-LDA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were applied to the NMR data and the results were compared. The best combination of average recognition (100%) and cross-validation prediction abilities (96.7%) was obtained for the PCA-LDA. All the statistical models were validated both by using a test set and by carrying out a Monte Carlo Cross Validation: the obtained performances were found to be satisfying for all the models, with prediction abilities higher than 95% demonstrating the suitability of the developed methods. Finally, the metabolites that mostly contributed to the lentil discrimination were indicated. Copyright © 2017 Elsevier Ltd. All rights reserved.
Automated diagnosis of epilepsy using CWT, HOS and texture parameters.
Acharya, U Rajendra; Yanti, Ratna; Zheng, Jia Wei; Krishnan, M Muthu Rama; Tan, Jen Hong; Martis, Roshan Joy; Lim, Choo Min
2013-06-01
Epilepsy is a chronic brain disorder which manifests as recurrent seizures. Electroencephalogram (EEG) signals are generally analyzed to study the characteristics of epileptic seizures. In this work, we propose a method for the automated classification of EEG signals into normal, interictal and ictal classes using Continuous Wavelet Transform (CWT), Higher Order Spectra (HOS) and textures. First the CWT plot was obtained for the EEG signals and then the HOS and texture features were extracted from these plots. Then the statistically significant features were fed to four classifiers namely Decision Tree (DT), K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN) and Support Vector Machine (SVM) to select the best classifier. We observed that the SVM classifier with Radial Basis Function (RBF) kernel function yielded the best results with an average accuracy of 96%, average sensitivity of 96.9% and average specificity of 97% for 23.6 s duration of EEG data. Our proposed technique can be used as an automatic seizure monitoring software. It can also assist the doctors to cross check the efficacy of their prescribed drugs.
Link prediction with node clustering coefficient
NASA Astrophysics Data System (ADS)
Wu, Zhihao; Lin, Youfang; Wang, Jing; Gregory, Steve
2016-06-01
Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed Cannistrai-Alanis-Ravai (CAR) index shows the power of local link/triangle information in improving link-prediction accuracy. Inspired by the idea of employing local link/triangle information, we propose a new similarity index with more local structure information. In our method, local link/triangle structure information can be conveyed by clustering coefficient of common-neighbors directly. The reason why clustering coefficient has good effectiveness in estimating the contribution of a common-neighbor is that it employs links existing between neighbors of a common-neighbor and these links have the same structural position with the candidate link to this common-neighbor. In our experiments, three estimators: precision, AUP and AUC are used to evaluate the accuracy of link prediction algorithms. Experimental results on ten tested networks drawn from various fields show that our new index is more effective in predicting missing links than CAR index, especially for networks with low correlation between number of common-neighbors and number of links between common-neighbors.
Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.
Ritz, Cecilia; Edén, Patrik
2008-01-19
For 2-dye microarray platforms, some missing values may arise from an un-measurably low RNA expression in one channel only. Information of such "one-channel depletion" is so far not included in algorithms for imputation of missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation gives a systematic bias of the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates were reduced up to 51%, depending on dataset. By including more information in the imputation step, we more accurately estimate missing expression values.
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge lor split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing .interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that can not possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
Nguyen, Hieu Cong; Jung, Jaehoon; Lee, Jungbin; Choi, Sung-Uk; Hong, Suk-Young; Heo, Joon
2015-01-01
The reflectance of the Earth’s surface is significantly influenced by atmospheric conditions such as water vapor content and aerosols. Particularly, the absorption and scattering effects become stronger when the target features are non-bright objects, such as in aqueous or vegetated areas. For any remote-sensing approach, atmospheric correction is thus required to minimize those effects and to convert digital number (DN) values to surface reflectance. The main aim of this study was to test the three most popular atmospheric correction models, namely (1) Dark Object Subtraction (DOS); (2) Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) and (3) the Second Simulation of Satellite Signal in the Solar Spectrum (6S) and compare them with Top of Atmospheric (TOA) reflectance. By using the k-Nearest Neighbor (kNN) algorithm, a series of experiments were conducted for above-ground forest biomass (AGB) estimations of the Gongju and Sejong region of South Korea, in order to check the effectiveness of atmospheric correction methods for Landsat ETM+. Overall, in the forest biomass estimation, the 6S model showed the bestRMSE’s, followed by FLAASH, DOS and TOA. In addition, a significant improvement of RMSE by 6S was found with images when the study site had higher total water vapor and temperature levels. Moreover, we also tested the sensitivity of the atmospheric correction methods to each of the Landsat ETM+ bands. The results confirmed that 6S dominates the other methods, especially in the infrared wavelengths covering the pivotal bands for forest applications. Finally, we suggest that the 6S model, integrating water vapor and aerosol optical depth derived from MODIS products, is better suited for AGB estimation based on optical remote-sensing data, especially when using satellite images acquired in the summer during full canopy development. PMID:26263996
Ingle, Brandall L; Veber, Brandon C; Nichols, John W; Tornero-Velez, Rogelio
2016-11-28
The free fraction of a xenobiotic in plasma (F ub ) is an important determinant of chemical adsorption, distribution, metabolism, elimination, and toxicity, yet experimental plasma protein binding data are scarce for environmentally relevant chemicals. The presented work explores the merit of utilizing available pharmaceutical data to predict F ub for environmentally relevant chemicals via machine learning techniques. Quantitative structure-activity relationship (QSAR) models were constructed with k nearest neighbors (kNN), support vector machines (SVM), and random forest (RF) machine learning algorithms from a training set of 1045 pharmaceuticals. The models were then evaluated with independent test sets of pharmaceuticals (200 compounds) and environmentally relevant ToxCast chemicals (406 total, in two groups of 238 and 168 compounds). The selection of a minimal feature set of 10-15 2D molecular descriptors allowed for both informative feature interpretation and practical applicability domain assessment via a bounded box of descriptor ranges and principal component analysis. The diverse pharmaceutical and environmental chemical sets exhibit similarities in terms of chemical space (99-82% overlap), as well as comparable bias and variance in constructed learning curves. All the models exhibit significant predictability with mean absolute errors (MAE) in the range of 0.10-0.18F ub . The models performed best for highly bound chemicals (MAE 0.07-0.12), neutrals (MAE 0.11-0.14), and acids (MAE 0.14-0.17). A consensus model had the highest accuracy across both pharmaceuticals (MAE 0.151-0.155) and environmentally relevant chemicals (MAE 0.110-0.131). The inclusion of the majority of the ToxCast test sets within the AD of the consensus model, coupled with high prediction accuracy for these chemicals, indicates the model provides a QSAR for F ub that is broadly applicable to both pharmaceuticals and environmentally relevant chemicals.
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. Inmore » fact, our ability to analyze interdicted samples and produce an extensive list of precise materials characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods necessary to produce the necessary inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the iterative PLS-DA method compared favorably to that of classification and regression tree (CART) and k nearest neighbor (KNN) algorithms, with the best combination of accuracy and robustness, as tested by classifying samples measured independently in our laboratories against the vendor QC based reference set.« less
NASA Astrophysics Data System (ADS)
Cho, Hyun-chong; Hadjiiski, Lubomir; Sahiner, Berkman; Chan, Heang-Ping; Paramagul, Chintana; Helvie, Mark; Nees, Alexis V.
2012-03-01
We designed a Content-Based Image Retrieval (CBIR) Computer-Aided Diagnosis (CADx) system to assist radiologists in characterizing masses on ultrasound images. The CADx system retrieves masses that are similar to a query mass from a reference library based on computer-extracted features that describe texture, width-to-height ratio, and posterior shadowing of a mass. Retrieval is performed with k nearest neighbor (k-NN) method using Euclidean distance similarity measure and Rocchio relevance feedback algorithm (RRF). In this study, we evaluated the similarity between the query and the retrieved masses with relevance feedback using our interactive CBIR CADx system. The similarity assessment and feedback were provided by experienced radiologists' visual judgment. For training the RRF parameters, similarities of 1891 image pairs obtained from 62 masses were rated by 3 MQSA radiologists using a 9-point scale (9=most similar). A leave-one-out method was used in training. For each query mass, 5 most similar masses were retrieved from the reference library using radiologists' similarity ratings, which were then used by RRF to retrieve another 5 masses for the same query. The best RRF parameters were chosen based on three simulated observer experiments, each of which used one of the radiologists' ratings for retrieval and relevance feedback. For testing, 100 independent query masses on 100 images and 121 reference masses on 230 images were collected. Three radiologists rated the similarity between the query and the computer-retrieved masses. Average similarity ratings without and with RRF were 5.39 and 5.64 on the training set and 5.78 and 6.02 on the test set, respectively. The average Az values without and with RRF were 0.86+/-0.03 and 0.87+/-0.03 on the training set and 0.91+/-0.03 and 0.90+/-0.03 on the test set, respectively. This study demonstrated that RRF improved the similarity of the retrieved masses.
Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.
Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong
2016-02-01
Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, rare work studies the unified approach to constructing multiple informative hash tables using any type of hashing algorithms. Meanwhile, for multiple table search, it also lacks of a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both the naive construction methods and the state-of-the-art hashing algorithms.
NASA Astrophysics Data System (ADS)
Saha, S.; Basak, S.; Safonova, M.; Bora, K.; Agrawal, S.; Sarkar, P.; Murthy, J.
2018-04-01
Seven Earth-sized planets, known as the TRAPPIST-1 system, was discovered with great fanfare in the last week of February 2017. Three of these planets are in the habitable zone of their star, making them potentially habitable planets (PHPs) a mere 40 light years away. The discovery of the closest potentially habitable planet to us just a year before - Proxima b and a realization that Earth-type planets in circumstellar habitable zones are a common occurrence provides the impetus to the existing pursuit for life outside the Solar System. The search for life has two goals essentially: looking for planets with Earth-like conditions (Earth similarity) and looking for the possibility of life in some form (habitability). An index was recently developed, the Cobb-Douglas Habitability Score (CDHS), based on Cobb-Douglas habitability production function (CD-HPF), which computes the habitability score by using measured and estimated planetary parameters. As an initial set, radius, density, escape velocity and surface temperature of a planet were used. The proposed metric, with exponents accounting for metric elasticity, is endowed with analytical properties that ensure global optima and can be scaled to accommodate a finite number of input parameters. We show here that the model is elastic, and the conditions on elasticity to ensure global maxima can scale as the number of predictor parameters increase. K-NN (K-Nearest Neighbor) classification algorithm, embellished with probabilistic herding and thresholding restriction, utilizes CDHS scores and labels exoplanets into appropriate classes via feature-learning methods yielding granular clusters of habitability. The algorithm works on top of a decision-theoretical model using the power of convex optimization and machine learning. The goal is to characterize the recently discovered exoplanets into an "Earth League" and several other classes based on their CDHS values. A second approach, based on a novel feature-learning and tree-building method classifies the same planets without computing the CDHS of the planets and produces a similar outcome. For this, we use XGBoosted trees. The convergence of the outcome of the two different approaches indicates the strength of the proposed solution scheme and the likelihood of the potential habitability of the recently announced discoveries.
Multiwavelet grading of prostate pathological images
NASA Astrophysics Data System (ADS)
Soltanian-Zadeh, Hamid; Jafari-Khouzani, Kourosh
2002-05-01
We have developed image analysis methods to automatically grade pathological images of prostate. The proposed method generates Gleason grades to images, where each image is assigned a grade between 1 and 5. This is done using features extracted from multiwavelet transformations. We extract energy and entropy features from submatrices obtained in the decomposition. Next, we apply a k-NN classifier to grade the image. To find optimal multiwavelet basis, preprocessing, and classifier, we use features extracted by different multiwavelets with either critically sampled preprocessing or repeated row preprocessing and different k-NN classifiers and compare their performances, evaluated by total misclassification rate (TMR). To evaluate sensitivity to noise, we add white Gaussian noise to images and compare the results (TMR's). We applied proposed methods to 100 images. We evaluated the first and second levels of decomposition using Geronimo, Hardin, and Massopust (GHM), Chui and Lian (CL), and Shen (SA4) multiwavelets. We also evaluated k-NN classifier for k=1,2,3,4,5. Experimental results illustrate that first level of decomposition is quite noisy. They also show that critically sampled preprocessing outperforms repeated row preprocessing and has less sensitivity to noise. Finally, comparison studies indicate that SA4 multiwavelet and k-NN classifier (k=1) generates optimal results (with smallest TMR of 3%).
Lee, Jinseok; Chon, Ki H
2010-09-01
We present particle filtering (PF) algorithms for an accurate respiratory rate extraction from pulse oximeter recordings over a broad range: 12-90 breaths/min. These methods are based on an autoregressive (AR) model, where the aim is to find the pole angle with the highest magnitude as it corresponds to the respiratory rate. However, when SNR is low, the pole angle with the highest magnitude may not always lead to accurate estimation of the respiratory rate. To circumvent this limitation, we propose a probabilistic approach, using a sequential Monte Carlo method, named PF, which is combined with the optimal parameter search (OPS) criterion for an accurate AR model-based respiratory rate extraction. The PF technique has been widely adopted in many tracking applications, especially for nonlinear and/or non-Gaussian problems. We examine the performances of five different likelihood functions of the PF algorithm: the strongest neighbor, nearest neighbor (NN), weighted nearest neighbor (WNN), probability data association (PDA), and weighted probability data association (WPDA). The performance of these five combined OPS-PF algorithms was measured against a solely OPS-based AR algorithm for respiratory rate extraction from pulse oximeter recordings. The pulse oximeter data were collected from 33 healthy subjects with breathing rates ranging from 12 to 90 breaths/ min. It was found that significant improvement in accuracy can be achieved by employing particle filters, and that the combined OPS-PF employing either the NN or WNN likelihood function achieved the best results for all respiratory rates considered in this paper. The main advantage of the combined OPS-PF with either the NN or WNN likelihood function is that for the first time, respiratory rates as high as 90 breaths/min can be accurately extracted from pulse oximeter recordings.
NASA Astrophysics Data System (ADS)
Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.
2005-05-01
Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
Maximal Neighbor Similarity Reveals Real Communities in Networks
Žalik, Krista Rizman
2015-01-01
An important problem in the analysis of network data is the detection of groups of densely interconnected nodes also called modules or communities. Community structure reveals functions and organizations of networks. Currently used algorithms for community detection in large-scale real-world networks are computationally expensive or require a priori information such as the number or sizes of communities or are not able to give the same resulting partition in multiple runs. In this paper we investigate a simple and fast algorithm that uses the network structure alone and requires neither optimization of pre-defined objective function nor information about number of communities. We propose a bottom up community detection algorithm in which starting from communities consisting of adjacent pairs of nodes and their maximal similar neighbors we find real communities. We show that the overall advantage of the proposed algorithm compared to the other community detection algorithms is its simple nature, low computational cost and its very high accuracy in detection communities of different sizes also in networks with blurred modularity structure consisting of poorly separated communities. All communities identified by the proposed method for facebook network and E-Coli transcriptional regulatory network have strong structural and functional coherence. PMID:26680448
Celik, Yuksel; Ulker, Erkan
2013-01-01
Marriage in honey bees optimization (MBO) is a metaheuristic optimization algorithm developed by inspiration of the mating and fertilization process of honey bees and is a kind of swarm intelligence optimizations. In this study we propose improved marriage in honey bees optimization (IMBO) by adding Levy flight algorithm for queen mating flight and neighboring for worker drone improving. The IMBO algorithm's performance and its success are tested on the well-known six unconstrained test functions and compared with other metaheuristic optimization algorithms. PMID:23935416
Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
A class of parallel algorithms for computation of the manipulator inertia matrix
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
A distributed algorithm for machine learning
NASA Astrophysics Data System (ADS)
Chen, Shihong
2018-04-01
This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
NASA Astrophysics Data System (ADS)
Jakovetic, Dusan; Xavier, João; Moura, José M. F.
2011-08-01
We study distributed optimization in networked systems, where nodes cooperate to find the optimal quantity of common interest, x=x^\\star. The objective function of the corresponding optimization problem is the sum of private (known only by a node,) convex, nodes' objectives and each node imposes a private convex constraint on the allowed values of x. We solve this problem for generic connected network topologies with asymmetric random link failures with a novel distributed, decentralized algorithm. We refer to this algorithm as AL-G (augmented Lagrangian gossiping,) and to its variants as AL-MG (augmented Lagrangian multi neighbor gossiping) and AL-BG (augmented Lagrangian broadcast gossiping.) The AL-G algorithm is based on the augmented Lagrangian dual function. Dual variables are updated by the standard method of multipliers, at a slow time scale. To update the primal variables, we propose a novel, Gauss-Seidel type, randomized algorithm, at a fast time scale. AL-G uses unidirectional gossip communication, only between immediate neighbors in the network and is resilient to random link failures. For networks with reliable communication (i.e., no failures,) the simplified, AL-BG (augmented Lagrangian broadcast gossiping) algorithm reduces communication, computation and data storage cost. We prove convergence for all proposed algorithms and demonstrate by simulations the effectiveness on two applications: l_1-regularized logistic regression for classification and cooperative spectrum sensing for cognitive radio networks.
Phase transitions and electrical behavior of lead-free (K0.50Na0.50)NbO3 thin film
NASA Astrophysics Data System (ADS)
Wu, Jiagang; Wang, John
2009-09-01
Lead-free (K0.50Na0.50)NbO3 (KNN) thin films with a high degree of (100) preferred orientation were deposited on the SrRuO3-buffered SrTiO3(100) substrate by off-axis radio frequency magnetron sputtering. They possess lower phase transition temperatures (To-t˜120 °C and Tc˜310 °C), as compared to those of KNN bulk ceramic (To-t˜190 °C and Tc˜400 °C). They also demonstrate enhanced ferroelectric behavior (e.g., 2Pr=24.1 μc/cm2) and fatigue endurance, together with a lower dielectric loss (tan δ ˜0.017) and a lower leakage current, as compared to the bulk ceramic counterpart. Oxygen vacancies are shown to be involved in the conduction of the KNN thin film.
A study on (K, Na) NbO3 based multilayer piezoelectric ceramics micro speaker
NASA Astrophysics Data System (ADS)
Gao, Renlong; Chu, Xiangcheng; Huan, Yu; Sun, Yiming; Liu, Jiayi; Wang, Xiaohui; Li, Longtu
2014-10-01
A flat panel micro speaker was fabricated from (K, Na) NbO3 (KNN)-based multilayer piezoelectric ceramics by a tape casting and cofiring process using Ag-Pd alloys as an inner electrode. The interface between ceramic and electrode was investigated by scanning electron microscope (SEM) and transmission electron microscope (TEM). The acoustic response was characterized by a standard audio test system. We found that the micro speaker with dimensions of 23 × 27 × 0.6 mm3, using three layers of 30 μm thickness KNN-based ceramic, has a high average sound pressure level (SPL) of 87 dB, between 100 Hz-20 kHz under five voltage. This result was even better than that of lead zirconate titanate (PZT)-based ceramics under the same conditions. The experimental results show that the KNN-based multilayer ceramics could be used as lead free piezoelectric micro speakers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bayar, M.; Yamagata-Sekihara, J.; Oset, E.
We have performed a calculation of the scattering amplitude for the three-body system KNN assuming K scattering against a NN cluster using the fixed center approximation to the Faddeev equations. The KN amplitudes, which we take from chiral unitary dynamics, govern the reaction and we find a KNN amplitude that peaks around 40 MeV below the KNN threshold, with a width in |T|{sup 2} of the order of 50 MeV for spin 0 and has another peak around 27 MeV with similar width for spin 1. The results are in line with those obtained using different methods but implementing chiralmore » dynamics. The simplicity of the approach allows one to see the important ingredients responsible for the results. In particular, we show the effects from the reduction of the size of the NN cluster due to the interaction with the K and those from the explicit consideration of the {pi}{Sigma}N channel in the three-body equations.« less
Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy
2014-01-01
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered as the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is benefitting from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model was benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive to the existing state-of-the-art classification models. PMID:25419659
Handling Neighbor Discovery and Rendezvous Consistency with Weighted Quorum-Based Approach
Own, Chung-Ming; Meng, Zhaopeng; Liu, Kehan
2015-01-01
Neighbor discovery and the power of sensors play an important role in the formation of Wireless Sensor Networks (WSNs) and mobile networks. Many asynchronous protocols based on wake-up time scheduling have been proposed to enable neighbor discovery among neighboring nodes for the energy saving, especially in the difficulty of clock synchronization. However, existing researches are divided two parts with the neighbor-discovery methods, one is the quorum-based protocols and the other is co-primality based protocols. Their distinction is on the arrangements of time slots, the former uses the quorums in the matrix, the latter adopts the numerical analysis. In our study, we propose the weighted heuristic quorum system (WQS), which is based on the quorum algorithm to eliminate redundant paths of active slots. We demonstrate the specification of our system: fewer active slots are required, the referring rate is balanced, and remaining power is considered particularly when a device maintains rendezvous with discovered neighbors. The evaluation results showed that our proposed method can effectively reschedule the active slots and save the computing time of the network system. PMID:26404297
Tighe, Patrick J.; Harle, Christopher A.; Hurley, Robert W.; Aytug, Haldun; Boezaart, Andre P.; Fillingim, Roger B.
2015-01-01
Background Given their ability to process highly dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain. Methods Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison. Results In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy with an area under the receiver-operating curve (ROC) of 0.704. Next, the gradient-boosted decision tree had an ROC of 0.665 and the k-nearest neighbor algorithm had an ROC of 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an ROC of 0.727. Logistic regression had a lower ROC of 0.5 for predicting pain outcomes on POD 1 and 3. Conclusions Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction. PMID:26031220
δ-Generalized Labeled Multi-Bernoulli Filter Using Amplitude Information of Neighboring Cells
Liu, Chao; Lei, Peng; Qi, Yaolong
2018-01-01
The amplitude information (AI) of echoed signals plays an important role in radar target detection and tracking. A lot of research shows that the introduction of AI enables the tracking algorithm to distinguish targets from clutter better and then improves the performance of data association. The current AI-aided tracking algorithms only consider the signal amplitude in the range-azimuth cell where measurement exists. However, since radar echoes always contain backscattered signals from multiple cells, the useful information of neighboring cells would be lost if directly applying those existing methods. In order to solve this issue, a new δ-generalized labeled multi-Bernoulli (δ-GLMB) filter is proposed. It exploits the AI of radar echoes from neighboring cells to construct a united amplitude likelihood ratio, and then plugs it into the update process and the measurement-track assignment cost matrix of the δ-GLMB filter. Simulation results show that the proposed approach has better performance in target’s state and number estimation than that of the δ-GLMB only using single-cell AI in low signal-to-clutter-ratio (SCR) environment. PMID:29642595
Hyperopt: a Python library for model selection and hyperparameter optimization
NASA Astrophysics Data System (ADS)
Bergstra, James; Komer, Brent; Eliasmith, Chris; Yamins, Dan; Cox, David D.
2015-01-01
Sequential model-based optimization (also known as Bayesian optimization) is one of the most efficient methods (per function evaluation) of function minimization. This efficiency makes it appropriate for optimizing the hyperparameters of machine learning algorithms that are slow to train. The Hyperopt library provides algorithms and parallelization infrastructure for performing hyperparameter optimization (model selection) in Python. This paper presents an introductory tutorial on the usage of the Hyperopt library, including the description of search spaces, minimization (in serial and parallel), and the analysis of the results collected in the course of minimization. This paper also gives an overview of Hyperopt-Sklearn, a software project that provides automatic algorithm configuration of the Scikit-learn machine learning library. Following Auto-Weka, we take the view that the choice of classifier and even the choice of preprocessing module can be taken together to represent a single large hyperparameter optimization problem. We use Hyperopt to define a search space that encompasses many standard components (e.g. SVM, RF, KNN, PCA, TFIDF) and common patterns of composing them together. We demonstrate, using search algorithms in Hyperopt and standard benchmarking data sets (MNIST, 20-newsgroups, convex shapes), that searching this space is practical and effective. In particular, we improve on best-known scores for the model space for both MNIST and convex shapes. The paper closes with some discussion of ongoing and future work.
Quantitative diagnosis of bladder cancer by morphometric analysis of HE images
NASA Astrophysics Data System (ADS)
Wu, Binlin; Nebylitsa, Samantha V.; Mukherjee, Sushmita; Jain, Manu
2015-02-01
In clinical practice, histopathological analysis of biopsied tissue is the main method for bladder cancer diagnosis and prognosis. The diagnosis is performed by a pathologist based on the morphological features in the image of a hematoxylin and eosin (HE) stained tissue sample. This manuscript proposes algorithms to perform morphometric analysis on the HE images, quantify the features in the images, and discriminate bladder cancers with different grades, i.e. high grade and low grade. The nuclei are separated from the background and other types of cells such as red blood cells (RBCs) and immune cells using manual outlining, color deconvolution and image segmentation. A mask of nuclei is generated for each image for quantitative morphometric analysis. The features of the nuclei in the mask image including size, shape, orientation, and their spatial distributions are measured. To quantify local clustering and alignment of nuclei, we propose a 1-nearest-neighbor (1-NN) algorithm which measures nearest neighbor distance and nearest neighbor parallelism. The global distributions of the features are measured using statistics of the proposed parameters. A linear support vector machine (SVM) algorithm is used to classify the high grade and low grade bladder cancers. The results show using a particular group of nuclei such as large ones, and combining multiple parameters can achieve better discrimination. This study shows the proposed approach can potentially help expedite pathological diagnosis by triaging potentially suspicious biopsies.
Frequency Diverse Array Radar: Signal Characterization and Measurement Accuracy
2010-03-25
W knN (C.14) and f [n] = N−1∑ k=0 F [k]W− knN (C.15) where f [n] = f(t)|t=nTs F [k] = F (ω)|ω=k∆ω WN = exp(−j2π/N) Ts = f −1 s ∆ω = 2π NTs , fs is the...Properties of the MIMO radar ambiguity function”. Proceedings 2008 International Conference on Acoustics, Speech and Signal Processing, 2309–2312. April 2008
NASA Astrophysics Data System (ADS)
Wani, Omar; Beckers, Joost V. L.; Weerts, Albrecht H.; Solomatine, Dimitri P.
2017-08-01
A non-parametric method is applied to quantify residual uncertainty in hydrologic streamflow forecasting. This method acts as a post-processor on deterministic model forecasts and generates a residual uncertainty distribution. Based on instance-based learning, it uses a k nearest-neighbour search for similar historical hydrometeorological conditions to determine uncertainty intervals from a set of historical errors, i.e. discrepancies between past forecast and observation. The performance of this method is assessed using test cases of hydrologic forecasting in two UK rivers: the Severn and Brue. Forecasts in retrospect were made and their uncertainties were estimated using kNN resampling and two alternative uncertainty estimators: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Results show that kNN uncertainty estimation produces accurate and narrow uncertainty intervals with good probability coverage. Analysis also shows that the performance of this technique depends on the choice of search space. Nevertheless, the accuracy and reliability of uncertainty intervals generated using kNN resampling are at least comparable to those produced by QR and UNEEC. It is concluded that kNN uncertainty estimation is an interesting alternative to other post-processors, like QR and UNEEC, for estimating forecast uncertainty. Apart from its concept being simple and well understood, an advantage of this method is that it is relatively easy to implement.
Linear model for fast background subtraction in oligonucleotide microarrays.
Kroll, K Myriam; Barkema, Gerard T; Carlon, Enrico
2009-11-16
One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values. We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
A Novel Color Image Encryption Algorithm Based on Quantum Chaos Sequence
NASA Astrophysics Data System (ADS)
Liu, Hui; Jin, Cong
2017-03-01
In this paper, a novel algorithm of image encryption based on quantum chaotic is proposed. The keystreams are generated by the two-dimensional logistic map as initial conditions and parameters. And then general Arnold scrambling algorithm with keys is exploited to permute the pixels of color components. In diffusion process, a novel encryption algorithm, folding algorithm, is proposed to modify the value of diffused pixels. In order to get the high randomness and complexity, the two-dimensional logistic map and quantum chaotic map are coupled with nearest-neighboring coupled-map lattices. Theoretical analyses and computer simulations confirm that the proposed algorithm has high level of security.
Computationally-Efficient Minimum-Time Aircraft Routes in the Presence of Winds
NASA Technical Reports Server (NTRS)
Jardin, Matthew R.
2004-01-01
A computationally efficient algorithm for minimizing the flight time of an aircraft in a variable wind field has been invented. The algorithm, referred to as Neighboring Optimal Wind Routing (NOWR), is based upon neighboring-optimal-control (NOC) concepts and achieves minimum-time paths by adjusting aircraft heading according to wind conditions at an arbitrary number of wind measurement points along the flight route. The NOWR algorithm may either be used in a fast-time mode to compute minimum- time routes prior to flight, or may be used in a feedback mode to adjust aircraft heading in real-time. By traveling minimum-time routes instead of direct great-circle (direct) routes, flights across the United States can save an average of about 7 minutes, and as much as one hour of flight time during periods of strong jet-stream winds. The neighboring optimal routes computed via the NOWR technique have been shown to be within 1.5 percent of the absolute minimum-time routes for flights across the continental United States. On a typical 450-MHz Sun Ultra workstation, the NOWR algorithm produces complete minimum-time routes in less than 40 milliseconds. This corresponds to a rate of 25 optimal routes per second. The closest comparable optimization technique runs approximately 10 times slower. Airlines currently use various trial-and-error search techniques to determine which of a set of commonly traveled routes will minimize flight time. These algorithms are too computationally expensive for use in real-time systems, or in systems where many optimal routes need to be computed in a short amount of time. Instead of operating in real-time, airlines will typically plan a trajectory several hours in advance using wind forecasts. If winds change significantly from forecasts, the resulting flights will no longer be minimum-time. The need for a computationally efficient wind-optimal routing algorithm is even greater in the case of new air-traffic-control automation concepts. For air-traffic-control automation, thousands of wind-optimal routes may need to be computed and checked for conflicts in just a few minutes. These factors motivated the need for a more efficient wind-optimal routing algorithm.
Wang, Shunfang; Liu, Shuhui
2015-12-19
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
Recognition of hand movements in a trans-radial amputated subject by sEMG.
Atzori, Manfredo; Muller, Henning; Baechler, Micheal
2013-06-01
Trans-radially amputated persons who own a myoelectric prosthesis have currently some control via surface electromyography (sEMG). However, the control systems are still limited (as they include very few movements) and not always natural (as the subject has to learn to associate movements of the muscles with the movements of the prosthesis). The Ninapro project tries helping the scientific community to overcome these limits through the creation of electromyography data sources to test machine learning algorithms. In this paper the results gained from first tests made on an amputated subject with the Ninapro acquisition protocol are detailed. In agreement with neurological studies on cortical plasticity and on the anatomy of the forearm, the amputee produced stable signals for each movement in the test. Using a k-NN classification algorithm, we obtain an average classification rate of 61.5% on all 53 movements. Successively, we simplify the task reducing the number of movements to 13, resulting in no misclassified movements. This shows that for fewer movements a very high classification accuracy is possible without the subject having to learn the movements specifically.
Wang, Shunfang; Liu, Shuhui
2015-01-01
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one. PMID:26703574
NASA Astrophysics Data System (ADS)
Zhu, W. L.; Zhu, J. L.; Meng, Y.; Wang, M. S.; Zhu, B.; Zhu, X. H.; Zhu, J. G.; Xiao, D. Q.; Pezzotti, G.
2011-12-01
This paper presents a Raman spectroscopic study of compositional-change-induced structure variation and of the related mechanism of Mg doping in LiSbO3 (LS)-modified (K0.5Na0.5)NbO3 (KNN) ceramics. With increasing LS content from 0 to 0.06, a discontinuous shift towards higher wavenumbers was found for the band position of the A1g(v1) stretching mode of KNN, accompanied by a clearly nonlinear broadening of this band and a decrease in its intensity. Such morphological changes in the Raman spectrum result from two factors: (i) changes in polarizability/binding strength of the O-Nb-O vibration upon incorporation of Li ions in the KNN perovskitic structure and (ii) a polymorphic phase transition (PPT) from orthorhombic to tetragonal (O → T) phase at x > 0.04. Upon increasing the amount, w, of Mg dopant incorporated into the (1-x)KNN-xLS ceramic structure, the intensity of the Raman bands are enhanced, while the peak position and the full width at half maximum of the A1g(v1) mode was found to experience a clear dependence on both w and x. Raman characterization revealed that the mechanism of Mg doping is strongly correlated with the concentration of Li in the perovskite structure: Mg2+ ions will preferentially replace Li+ ions for low Mg doping while replace K/Na ions for higher doping of Mg. The PPT O → T was also found to be altered by the introduction of Mg and the critical value of LS concentration, xO-T, for incipient O → T transition in the KNN-xLS-wMT system was strongly dependent on Mg content, with xO → T being roughly equal to 0.04 + 2w, for the case of dilute Mg alloying.
A new image enhancement algorithm with applications to forestry stand mapping
NASA Technical Reports Server (NTRS)
Kan, E. P. F. (Principal Investigator); Lo, J. K.
1975-01-01
The author has identified the following significant results. Results show that the new algorithm produced cleaner classification maps in which holes of small predesignated sizes were eliminated and significant boundary information was preserved. These cleaner post-processed maps better resemble true life timber stand maps and are thus more usable products than the pre-post-processing ones: Compared to an accepted neighbor-checking post-processing technique, the new algorithm is more appropriate for timber stand mapping.
A Discrete Fruit Fly Optimization Algorithm for the Traveling Salesman Problem.
Jiang, Zi-Bin; Yang, Qiong
2016-01-01
The fruit fly optimization algorithm (FOA) is a newly developed bio-inspired algorithm. The continuous variant version of FOA has been proven to be a powerful evolutionary approach to determining the optima of a numerical function on a continuous definition domain. In this study, a discrete FOA (DFOA) is developed and applied to the traveling salesman problem (TSP), a common combinatorial problem. In the DFOA, the TSP tour is represented by an ordering of city indices, and the bio-inspired meta-heuristic search processes are executed with two elaborately designed main procedures: the smelling and tasting processes. In the smelling process, an effective crossover operator is used by the fruit fly group to search for the neighbors of the best-known swarm location. During the tasting process, an edge intersection elimination (EXE) operator is designed to improve the neighbors of the non-optimum food location in order to enhance the exploration performance of the DFOA. In addition, benchmark instances from the TSPLIB are classified in order to test the searching ability of the proposed algorithm. Furthermore, the effectiveness of the proposed DFOA is compared to that of other meta-heuristic algorithms. The results indicate that the proposed DFOA can be effectively used to solve TSPs, especially large-scale problems.