A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Because they are computationally efficient, partitional clustering algorithms are better suited than hierarchical clustering algorithms to applications with large datasets. K-means is among the most popular partitional clustering algorithms, but has a major…
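The abstract names K-means as the baseline partitional method. Below is a minimal sketch of Lloyd's K-means iteration. It assumes Euclidean distance and initial centers sampled from the data points, and it does not show the paper's genetic center-exchange step:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means: alternate nearest-center assignment and mean updates."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k data points sampled without replacement
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to the center with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each center moves to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels
```

Fuzzy c-means replaces the hard assignment step with graded memberships. Both methods depend on the initial centers, which is the sensitivity that search-based initializations such as genetic algorithms try to reduce.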
A Survey on Sentiment Classification in Face Recognition
NASA Astrophysics Data System (ADS)
Qian, Jingyu
2018-01-01
Face recognition has long been an important topic for both industry and academia. K-means clustering, autoencoders, and convolutional neural networks each represent a different design idea for face recognition, and they are three popular algorithms for face recognition problems. It is worthwhile to summarize and compare these three algorithms. This paper focuses on one specific face recognition problem: sentiment classification from images. Three algorithms for sentiment classification are summarized: k-means clustering, autoencoder, and convolutional neural network. An experiment applying these algorithms to a specific dataset of human faces is conducted to illustrate how they are applied and how accurate they are. Finally, the three algorithms are compared based on their accuracy.
The Mucciardi-Gose Clustering Algorithm and Its Applications in Automatic Pattern Recognition.
A procedure known as the Mucciardi-Gose clustering algorithm, CLUSTR, for determining the geometrical or statistical relationships among groups of N...discussion of clustering algorithms is given; the particular advantages of the Mucciardi-Gose procedure are described. The mathematical basis for, and the
Automated thematic mapping and change detection of ERTS-A images
NASA Technical Reports Server (NTRS)
Gramenopoulos, N. (Principal Investigator)
1975-01-01
The author has identified the following significant results. In the first part of the investigation, spatial and spectral features were developed and used to automatically recognize terrain features through a clustering algorithm. In this part of the investigation, the cell size, that is, the number of digital picture elements used to compute the spatial and spectral features, was varied. It was determined that the accuracy of terrain recognition decreases slowly as the cell size is reduced, and that this decrease coincides with increased cluster diffuseness. It was also proven that a cell size of 17 x 17 pixels, when used with the clustering algorithm, results in high recognition rates for major terrain classes. ERTS-1 data from five diverse geographic regions of the United States were processed through the clustering algorithm with 17 x 17 pixel cells. Simple land use maps were produced, and the average terrain recognition accuracy was 82 percent.
Spatial pattern recognition of seismic events in South West Colombia
NASA Astrophysics Data System (ADS)
Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber
2013-09-01
Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well-founded means of obtaining relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data from the South West Colombian seismic database. In this research, clustering tendency analysis validates whether the seismic database possesses a clustering structure. An unsupervised fuzzy clustering algorithm creates groups of seismic events. Given the sensitivity of fuzzy clustering algorithms to initial centroid positions, we propose a centroid initialization methodology that generates partitions that are stable with respect to centroid initialization. As a result of this work, a public software tool provides users with the routines developed for the clustering methodology. The analysis of the seismogenic zones obtained reveals meaningful spatial patterns in South West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretation of seismic activity in South West Colombia.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-05-21
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things)-based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been proposed, their performance was limited because they focused on only one of the two steps. This paper seeks the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space using an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is better suited to a dynamic environment such as an IoT-based smart home.
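The second step relies on Allen's temporal relations between activity intervals. Below is a small sketch that maps two time intervals to one of Allen's 13 relations. It assumes that each activity is a hypothetical (start, end) pair with start < end. How the paper encodes these relations as ANN inputs is not described here:

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b; each is (start, end) with start < end."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: proper partial overlap with no shared endpoints.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For instance, a hypothetical "cooking" interval (0, 4) and "eating" interval (2, 6) stand in the "overlaps" relation. A classifier could take such relation labels between consecutive activities as categorical inputs.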
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image containing one or more typical lesions was cut manually from each acquired digital disease image. The sub-images were then segmented using twelve lesion segmentation methods that integrate clustering algorithms (K-means clustering, fuzzy C-means clustering and K-median clustering) with supervised classification algorithms (logistic regression analysis, the Naive Bayes algorithm, classification and regression trees, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K-median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After lesion segmentation with this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods: random forest, support vector machine (SVM) and K-nearest neighbor. The recognition results of the models were then compared. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the 45 most important features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
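ReliefF is used above for feature selection. As an illustration, below is the original two-class Relief weighting that ReliefF extends. ReliefF averages over k nearest hits and misses and handles more than two classes. This sketch uses a single nearest hit and miss, visits every sample once, and assumes numeric features scaled by their range:

```python
def relief_weights(X, y):
    """Two-class Relief: one nearest hit and one nearest miss per sample, every sample visited once."""
    n, d = len(X), len(X[0])
    # Per-feature value range, so that differences are scaled to [0, 1].
    spans = [(max(row[j] for row in X) - min(row[j] for row in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda r: dist(X[i], r))
        miss = min(misses, key=lambda r: dist(X[i], r))
        for j in range(d):
            # Reward separation from the other class, penalize spread within the same class.
            w[j] += (diff(j, X[i], miss) - diff(j, X[i], hit)) / n
    return w
```

Features are then ranked by weight, and the top ones are kept. The cut-off of 45 features in the abstract is the authors' choice and is not derived by this sketch.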
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things)-based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been proposed, their performance was limited because they focused on only one of the two steps. This paper seeks the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space using an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is better suited to a dynamic environment such as an IoT-based smart home. PMID:26007738
High-speed cell recognition algorithm for ultrafast flow cytometer imaging system.
Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang
2018-04-01
An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented speed and resolution. The system produces a large amount of raw image data, so a high-speed cell recognition algorithm is needed to analyze it efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first detection stage extracts cell regions. The second stage integrates the distance transform and the watershed algorithm to separate clustered cells. Finally, the detected cells are classified by the GMM. We compared the performance of our algorithm with that of a support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing recognition accuracy. This algorithm provides a promising solution for high-throughput, automated cell imaging and classification on the ultrafast flow cytometer imaging platform. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
Clustered Multi-Task Learning for Automatic Radar Target Recognition
Li, Cong; Bao, Weimin; Xu, Luping; Zhang, Hua
2017-01-01
Model training is a key technique for radar target recognition. Traditional model training algorithms in the single-task learning framework ignore the relationships among multiple tasks, which degrades recognition performance. In this paper, we propose a clustered multi-task learning method that can reveal and share multi-task relationships for radar target recognition. To make full use of these relationships, the latent multi-task relationships in the projection space are also taken into consideration. Specifically, a constraint term in the projection space is proposed, the main idea of which is that multiple tasks within a close cluster should be close to each other in the projection space. In the proposed method, the cluster structures and multi-task relationships can be learned autonomously and used in both the original and projected spaces. In view of the nonlinear characteristics of radar targets, the proposed method is extended to a nonlinear kernel version, and the corresponding nonlinear multi-task solving method is proposed. Comprehensive experimental studies on a simulated high-resolution range profile dataset and the MSTAR SAR public database verify the superiority of the proposed method over some related algorithms. PMID:28953267
Park, Rachel; O'Brien, Thomas F; Huang, Susan S; Baker, Meghan A; Yokoe, Deborah S; Kulldorff, Martin; Barrett, Craig; Swift, Jamie; Stelling, John
2016-11-01
While antimicrobial resistance threatens the prevention, treatment, and control of infectious diseases, systematic analysis of routine microbiology laboratory test results worldwide can flag new threats and promote a timely response. This study explores statistical algorithms for recognizing geographic clustering of multi-resistant microbes within a healthcare network and for monitoring the dissemination of new strains over time. Escherichia coli antimicrobial susceptibility data from a three-year period stored in WHONET were analyzed across ten facilities in a healthcare network using SaTScan's spatial multinomial model with two models for defining geographic proximity. We explored geographic clustering of multi-resistance phenotypes within the network and changes in clustering over time. The geographic clustering identified by the latitude/longitude model and by the non-parametric facility-grouping model was similar, while the latter offers greater flexibility and generalizability. Iterative application of the clustering algorithms suggested the possible recognition of the initial appearance of invasive E. coli ST131 in the clinical database of a single hospital and its subsequent dissemination to others. Systematic analysis of routine antimicrobial susceptibility test results supports the recognition of geographic clustering of microbial phenotypic subpopulations with WHONET and SaTScan. Iterative application of these algorithms can detect the initial appearance of a strain in a region and its dissemination across that region, prompting early investigation, response, and containment measures.
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1993-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.
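The contrast drawn above between FCM's sum-to-one memberships and possibilistic memberships can be seen numerically. This sketch assumes the standard FCM membership formula and the possibilistic form u = 1 / (1 + (d^2 / eta)^(1/(m-1))). Here eta is a per-cluster scale supplied by the caller, and the distances are illustrative:

```python
def fcm_memberships(dists, m=2.0):
    """FCM memberships of one point, given its distances to each prototype; they always sum to 1."""
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:  # the point coincides with one or more prototypes: share membership among them
        return [1.0 / len(zeros) if d == 0.0 else 0.0 for d in dists]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]

def pcm_memberships(dists, etas, m=2.0):
    """Possibilistic memberships: each is a typicality in (0, 1], with no sum-to-one constraint."""
    return [1.0 / (1.0 + (d * d / eta) ** (1.0 / (m - 1.0))) for d, eta in zip(dists, etas)]
```

A noise point far from both prototypes (distances 10 and 10) still receives FCM memberships of 0.5 each. With eta = 1, its possibilistic memberships fall to about 0.01, which matches the noise behavior described in the abstract.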
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1992-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.
Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei
2013-05-01
Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
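Under a Dirichlet process prior, the ability to open new trajectory clusters without retraining is usually expressed through the Chinese restaurant process. This sketch shows the resulting assignment probabilities. The per-cluster likelihood values are hypothetical inputs, not the paper's tDPMM likelihoods, and the concentration parameter alpha is assumed:

```python
def crp_prior(counts, alpha):
    """CRP prior: join cluster k with prob n_k/(n+alpha), open a new cluster with prob alpha/(n+alpha)."""
    n = sum(counts)
    return [c / (n + alpha) for c in counts] + [alpha / (n + alpha)]

def assignment_posterior(counts, likelihoods, new_cluster_likelihood, alpha):
    """Posterior over existing clusters plus one new cluster, proportional to prior times likelihood."""
    prior = crp_prior(counts, alpha)
    scores = [p * lik for p, lik in zip(prior, likelihoods + [new_cluster_likelihood])]
    total = sum(scores)
    return [s / total for s in scores]
```

When a trajectory fits no existing cluster well, the last entry (the new cluster) dominates, and the model grows online.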
A Human Activity Recognition System Using Skeleton Data from RGBD Sensors.
Cippitelli, Enea; Gasparrini, Samuele; Gambi, Ennio; Spinsante, Susanna
2016-01-01
The aim of Active and Assisted Living (AAL) is to develop tools that promote ageing in place for elderly people, and human activity recognition algorithms can help monitor older people in home environments. Different types of sensors can be used for this task. RGBD sensors, especially those used for gaming, are cost-effective and provide rich information about the environment. This work proposes an activity recognition algorithm that exploits skeleton data extracted by RGBD sensors. The system extracts key poses to compose a feature vector and uses a multiclass Support Vector Machine to perform classification. Key poses are computed and associated using a clustering algorithm, without the need for a learning algorithm. The proposed approach is evaluated on five publicly available activity recognition datasets and shows promising results, especially for the recognition of AAL-related actions. Finally, the current applicability of this solution in AAL scenarios and the future improvements needed are discussed.
Park, Rachel; O'Brien, Thomas F.; Huang, Susan S.; Baker, Meghan A.; Yokoe, Deborah S.; Kulldorff, Martin; Barrett, Craig; Swift, Jamie; Stelling, John
2016-01-01
Objectives: While antimicrobial resistance threatens the prevention, treatment, and control of infectious diseases, systematic analysis of routine microbiology laboratory test results worldwide can flag new threats and promote a timely response. This study explores statistical algorithms for recognizing geographic clustering of multi-resistant microbes within a healthcare network and for monitoring the dissemination of new strains over time. Methods: Escherichia coli antimicrobial susceptibility data from a three-year period stored in WHONET were analyzed across ten facilities in a healthcare network using SaTScan's spatial multinomial model with two models for defining geographic proximity. We explored geographic clustering of multi-resistance phenotypes within the network and changes in clustering over time. Results: The geographic clustering identified by the latitude/longitude model and by the non-parametric facility-grouping model was similar, while the latter offers greater flexibility and generalizability. Iterative application of the clustering algorithms suggested the possible recognition of the initial appearance of invasive E. coli ST131 in the clinical database of a single hospital and its subsequent dissemination to others. Conclusion: Systematic analysis of routine antimicrobial susceptibility test results supports the recognition of geographic clustering of microbial phenotypic subpopulations with WHONET and SaTScan. Iterative application of these algorithms can detect the initial appearance of a strain in a region and its dissemination across that region, prompting early investigation, response, and containment measures. PMID:27530311
NASA Astrophysics Data System (ADS)
Liu, Jianjun; Kan, Jianquan
2018-04-01
In this paper, a new method for identifying genetically modified material from its terahertz spectrum is proposed. The method uses a support vector machine (SVM) combined with affinity propagation clustering. The affinity propagation clustering algorithm clusters and labels the unlabeled training samples, and the SVM training data are continuously updated during the iterative process. Because the training samples do not need to be labeled manually when the identification model is established, errors caused by human labeling are reduced, and the identification accuracy of the model is greatly improved.
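Affinity propagation chooses exemplars by exchanging "responsibility" and "availability" messages between data points. Below is a compact sketch of the standard Frey-Dueck updates with damping. It assumes negative squared distance on toy 1-D values as the similarity and the median similarity as the preference. The paper's terahertz features and its SVM retraining loop are not shown:

```python
import statistics

def similarity_matrix(xs, preference=None):
    """S[i][k] = -(x_i - x_k)^2; the diagonal holds the preference (median off-diagonal value by default)."""
    n = len(xs)
    S = [[-(xs[i] - xs[k]) ** 2 for k in range(n)] for i in range(n)]
    if preference is None:
        preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference
    return S

def affinity_propagation(S, iters=200, damping=0.5):
    """Return (exemplar indices, exemplar index assigned to each point)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            runner_up = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                competitor = runner_up if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) = sum of positive r(i',k) over i' != k;
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) over i' not in {i, k})
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], []
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels
```

Unlike K-means, the number of clusters is not fixed in advance. The preference value controls how many exemplars emerge, which suits the label-free workflow the abstract describes.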
NASA Astrophysics Data System (ADS)
Chen, Dan; Guo, Lin-yuan; Wang, Chen-hao; Ke, Xi-zheng
2017-07-01
Equalization can compensate for channel distortion caused by multipath effects and effectively improve the convergence of the modulation constellation diagram in an optical wireless system. In this paper, a subspace blind equalization algorithm is used to preprocess M-ary phase shift keying (MPSK) subcarrier modulation signals at the receiver. Mountain clustering is adopted to obtain the cluster centers of the MPSK modulation constellation diagram, and the modulation order is identified automatically with a k-nearest neighbor (KNN) classifier. Experiments were performed under four different weather conditions. The results show that the subspace blind equalization algorithm effectively improves the convergence of the constellation diagram, which increases the accuracy of modulation recognition. The correct recognition rate for 16PSK reaches 85% under each of the weather conditions studied. The correct recognition rate is highest in cloudy conditions and lowest in heavy rain.
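Mountain clustering scores candidate centers by the density of nearby data, picks the highest peak, and then subtracts a "mountain" around it before searching again. Below is a sketch of the subtractive-clustering variant, which evaluates the mountain function at the data points instead of on a grid. The radius ra, the stopping ratio, and the toy 4-point constellation in the test are illustrative assumptions. Treating the number of centers as the modulation order is a hypothetical shortcut, since the paper uses a KNN classifier for that step:

```python
import math

def subtractive_clustering(points, ra=0.5, stop_ratio=0.15):
    """Mountain-method variant that scores the data points themselves as candidate centers."""
    alpha = 4.0 / ra ** 2            # neighborhood radius ra for building the mountains
    beta = 4.0 / (1.5 * ra) ** 2     # slightly wider radius for cutting a mountain down

    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Mountain height at each point: a Gaussian-weighted count of nearby points.
    height = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    first_peak = max(height)
    centers = []
    while True:
        i = max(range(len(points)), key=height.__getitem__)
        peak = height[i]
        if peak < stop_ratio * first_peak:
            break
        center = points[i]
        centers.append(center)
        # Subtract a mountain around the new center so its neighbors are not picked again.
        height = [h - peak * math.exp(-beta * d2(p, center)) for h, p in zip(height, points)]
    return centers
```

After equalization, the received symbols should form tighter groups around the ideal constellation points, which makes peaks like these easier to separate.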
NASA Astrophysics Data System (ADS)
Cannata, A.; Montalto, P.; Aliotta, M.; Cassisi, C.; Pulvirenti, A.; Privitera, E.; Patanè, D.
2011-04-01
Active volcanoes generate sonic and infrasonic signals, whose investigation provides useful information both for monitoring and for studying the dynamics of explosive phenomena. At Mt. Etna volcano (Italy), a pattern recognition system based on infrasonic waveform features has been developed. First, a parametric power spectrum method was used to extract the features describing and characterizing the infrasound events: peak frequency and quality factor. Together with the peak-to-peak amplitude, these features constituted a 3-D ‘feature space’, within which three clusters were recognized by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. After the clustering process, using a common location method (the semblance method) and additional volcanological information on the intensity of the explosive activity, we were able to associate each cluster with a particular source vent and/or kind of volcanic activity. Finally, for automatic event location, the clusters were used to train a Support Vector Machine model, computing optimal hyperplanes that maximize the margins of separation among the clusters. After the training phase, the system automatically recognizes the active vent using only a single station, without any location algorithm.
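DBSCAN grows clusters from core points (points with at least min_pts neighbors within eps, counting themselves) and marks points that no cluster can reach as noise. Below is a minimal sketch that works for feature vectors of any dimension, such as the 3-D (peak frequency, quality factor, amplitude) space above. The eps and min_pts values are illustrative only:

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    def region(i):
        # Indices of all points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(neighbours)
    return labels
```

The number of clusters is not specified in advance. This is consistent with the abstract, where three clusters emerged from the data.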
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.
Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong
2015-12-01
Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which increases the clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on this cost function, a class of novel graph-regularized NMF algorithms is developed, yielding a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that, during learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of the cost functions and decomposition data are presented to show the convergence properties of the proposed update rules. Basis images, reconstructed images, and clustering results are used to demonstrate the efficiency of the new algorithms. Finally, the clustering accuracies of different algorithms are compared, showing that the proposed algorithms achieve state-of-the-art performance in image clustering applications.
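The abstract does not state its improved cost function. For orientation, the standard graph-regularized NMF (GNMF) objective of Cai et al. and its multiplicative updates are shown below. Treating this as the baseline that the paper extends is an assumption. X is the nonnegative m x n data matrix (one column per sample) approximated by U V^T. W is a sample affinity graph, D is its diagonal degree matrix, L = D - W, and lambda >= 0 weights the manifold term:

```latex
\min_{U \ge 0,\; V \ge 0} \; \lVert X - U V^{\top} \rVert_F^2 + \lambda \operatorname{Tr}\!\left(V^{\top} L V\right),
\qquad L = D - W

u_{ik} \leftarrow u_{ik} \, \frac{(X V)_{ik}}{(U V^{\top} V)_{ik}},
\qquad
v_{jk} \leftarrow v_{jk} \, \frac{(X^{\top} U + \lambda W V)_{jk}}{(V U^{\top} U + \lambda D V)_{jk}}
```

Because every factor in these ratios is nonnegative, the updates keep U and V nonnegative without an explicit projection step.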
Infrared vehicle recognition using unsupervised feature learning based on K-feature
NASA Astrophysics Data System (ADS)
Lin, Jin; Tan, Yihua; Xia, Haijiao; Tian, Jinwen
2018-02-01
Because of the complex battlefield environment, it is difficult to establish a complete knowledge base when applying vehicle recognition algorithms in practice. Infrared vehicle recognition, which plays an important role in remote sensing, remains difficult and challenging. In this paper, we propose a new unsupervised feature learning method based on K-feature to recognize vehicles in infrared images. First, a saliency-based target detection algorithm is applied to the initial image. Then, K-features, generated by a k-means clustering algorithm that learns a visual dictionary from a large number of unlabeled samples, are computed to suppress false alarms and improve accuracy. Finally, post-processing produces the vehicle target recognition result. Extensive experiments demonstrate that the proposed method achieves satisfactory recognition effectiveness and robustness for vehicles in infrared images with complex backgrounds, and also improves recognition reliability.
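One common way to turn a k-means visual dictionary into features is the "triangle" soft encoding popularized by Coates, Lee, and Ng (2011): a patch is described by how much closer it is to each centroid than its average centroid distance. The abstract does not say whether K-feature uses this exact encoding, so treat it as an assumption; the dictionary below is hypothetical and would in practice come from k-means on unlabeled patches.

```python
import math

def triangle_encode(patch, dictionary):
    """f_k = max(0, mean(z) - z_k), where z_k = ||patch - c_k||.
    Centroids farther than average contribute 0, giving a sparse code."""
    z = [math.dist(patch, c) for c in dictionary]
    mu = sum(z) / len(z)
    return [max(0.0, mu - zk) for zk in z]

# Hypothetical 3-atom dictionary over 2-D patch descriptors
dictionary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
features = triangle_encode((0.0, 0.0), dictionary)  # only atom 0 is active
```

Compared with hard assignment to the nearest centroid, this encoding keeps graded similarity information while staying sparse, which is useful when the features feed a detector that must reject false alarms.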
Constrained Metric Learning by Permutation Inducing Isometries.
Bosveld, Joel; Mahmood, Arif; Huynh, Du Q; Noakes, Lyle
2016-01-01
The choice of metric critically affects the performance of classification and clustering algorithms. Metric learning algorithms attempt to improve performance by learning a more appropriate metric. Unfortunately, most current algorithms learn a distance function that is not invariant to rigid transformations of images. Therefore, the distance between two images may differ from the distance between their rigidly transformed counterparts, leading to inconsistent classification or clustering results. We propose to constrain the learned metric to be invariant to the geometry-preserving transformations of images that induce permutations in the feature space. The constraint that these transformations are isometries of the metric ensures consistent results and improves accuracy. Our second contribution is a dimension reduction technique that is consistent with the isometry constraints. Our third contribution is the isometry-constrained logistic discriminant metric learning (IC-LDML) algorithm, formulated by incorporating the isometry constraints into the objective function of the LDML algorithm. The proposed algorithm is compared with existing techniques on the publicly available Labeled Faces in the Wild, Viewpoint Invariant Pedestrian Recognition, and Toy Cars data sets. The IC-LDML algorithm outperformed existing techniques on face recognition, person identification, and object classification by a significant margin.
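The isometry constraint can be made concrete for a Mahalanobis-type metric d_M(x, y)^2 = (x - y)^T M (x - y). If a transformation acts on features as a permutation matrix P, invariance requires P^T M P = M for every P in the group. A simple way to enforce this, shown as a sketch rather than the IC-LDML optimization itself, is to project M onto the invariant set by averaging it over a finite permutation group. The 4-feature matrix and the cyclic-shift group are assumptions.

```python
def cyclic_group(n):
    """All cyclic shifts of n feature indices (a finite permutation group)."""
    return [[(i + s) % n for i in range(n)] for s in range(n)]

def symmetrize(M, group):
    """Group average: result[a][b] = mean over g of M[g[a]][g[b]].
    Averaging PSD matrices keeps the result PSD, so it is still a metric."""
    n = len(M)
    return [[sum(M[g[a]][g[b]] for g in group) / len(group) for b in range(n)]
            for a in range(n)]

def mahalanobis_sq(M, x, y):
    d = [xi - yi for xi, yi in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))

def permute(x, g):
    """Apply the feature permutation g: (P x)[i] = x[g[i]]."""
    return [x[g[i]] for i in range(len(x))]

M = [[2.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
G = cyclic_group(4)
M_inv = symmetrize(M, G)
x, y = [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0]
```

With `M` the distance between x and y changes when both are cyclically shifted (20 versus 21 for a shift by one), whereas with `M_inv` it is the same for every shift in the group.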
Iris recognition using image moments and k-means algorithm.
Khan, Yaser Daanial; Khan, Sher Afzal; Ahmad, Farooq; Islam, Saeed
2014-01-01
This paper presents a biometric technique for identifying a person from an iris image. The iris is first segmented from the acquired image of an eye using an edge detection algorithm. The disk-shaped area of the iris is transformed into a rectangular form. The described moments are extracted from the grayscale image, yielding a feature vector of scale-, rotation-, and translation-invariant moments. The images are clustered using the k-means algorithm, and a centroid is computed for each cluster. An arbitrary image is assumed to belong to the cluster whose centroid is nearest to its feature vector in terms of Euclidean distance. The described model exhibits an accuracy of 98.5%. PMID:24977221
NASA Astrophysics Data System (ADS)
Wei, B. G.; Huo, K. X.; Yao, Z. F.; Lou, J.; Li, X. Y.
2018-03-01
Recognizing partial discharge (PD) patterns is one of the difficult problems in research on condition-based maintenance technology for transformers. Based on the main physical characteristics of PD, three models of oil-paper insulation defects were set up in the laboratory to study the PD of transformers, and phase-resolved partial discharge (PRPD) patterns were constructed. Using the least-squares method, grey-scale PRPD images were constructed, and the features of each grey-scale image were 28 box dimensions and 28 information dimensions. An affinity propagation algorithm based on manifold distance (AP-MD) was established for transformer PD pattern recognition, and the box-dimension and information-dimension data were clustered with AP-MD. The study shows that AP-MD yields better clustering results than affinity propagation (AP), k-means, and the fuzzy c-means algorithm (FCM). By choosing different values of k for the k-nearest neighbors, we find that the clustering accuracy of AP-MD falls when k is too large or too small, and that the optimal k depends on the sample size.
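The abstract does not define AP-MD's manifold distance. One common density-sensitive definition, assumed here, gives each edge of a k-nearest-neighbor graph the length rho^d - 1 (d = Euclidean distance) and takes shortest-path distances, so paths through dense regions are cheaper than direct jumps across gaps. The negated distances could then serve as similarities for affinity propagation. The values of `rho` and `k` and the data are illustrative.

```python
import math

def manifold_distance(points, k=3, rho=2.0):
    """All-pairs shortest paths over a symmetrized k-NN graph whose edges
    have density-adjusted length rho**d - 1."""
    n = len(points)
    INF = float("inf")
    D = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i in range(n):
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in order[:k]:  # connect i to its k nearest neighbors
            w = rho ** d - 1.0
            D[i][j] = D[j][i] = min(D[i][j], w)
    for m in range(n):  # Floyd-Warshall relaxation through intermediate m
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    return D

chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
D = manifold_distance(chain, k=1)
# Along the chain: 3 unit hops cost 3.0, versus 2**3 - 1 = 7.0 for a direct edge
```

Because rho^d - 1 grows faster than linearly, several short hops are preferred over one long one; this is also why the choice of k matters, as the abstract reports: too small a k can disconnect the graph, too large a k adds shortcuts across sparse regions.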
Fully convolutional network with cluster for semantic segmentation
NASA Astrophysics Data System (ADS)
Ma, Xiao; Chen, Zhongbi; Zhang, Jianlin
2018-04-01
Image semantic segmentation is currently an active research topic in computer vision and artificial intelligence. In particular, extensive research on deep neural networks for image recognition has greatly advanced semantic segmentation. This paper proposes a method that combines a fully convolutional network with the k-means clustering algorithm. The clustering algorithm uses the image's low-level features and initializes the cluster centers from a superpixel segmentation; within each cluster region, points with high reliability are used to correct points with low reliability, which have a high probability of being misclassified. This method refines the segmentation of target contours and improves the accuracy of image segmentation.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that of parametric algorithms, such as K-means and deterministic annealing, and of a supervised multi-layer perceptron. While supervised neural networks need a training phase performed with data labeled by the genetic test, and the parametric methods require the number of classes to be chosen in advance, chaotic map clustering gives natural evidence of the pathological class without any training or supervision, thus providing a new, efficient methodology for recognizing patterns affected by Huntington's disease.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions according to the fuzzy C-means equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and to laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, whether discrete or continuous.
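The fuzzy c-means equations that AFLC uses to relocate centroids can be written as a short batch sketch (AFLC itself updates incrementally and adds a competitive stage, neither of which is shown). With fuzzifier m > 1, memberships are u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) and centroids are v_i = sum_k u_ik^m x_k / sum_k u_ik^m. The data, m = 2, and the initial centers are assumptions.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """u[i][k]: membership of point k in cluster i; each column sums to 1."""
    c = len(centers)
    u = [[0.0] * len(points) for _ in range(c)]
    for k, x in enumerate(points):
        d = [math.dist(x, v) for v in centers]
        if 0.0 in d:  # point sits exactly on a center: crisp membership
            u[d.index(0.0)][k] = 1.0
            continue
        for i in range(c):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
    return u

def fcm_centers(points, u, m=2.0):
    """Centroids as membership^m-weighted means of the points."""
    centers = []
    for row in u:
        w = [uik ** m for uik in row]
        total = sum(w)
        centers.append([sum(wk * x[dim] for wk, x in zip(w, points)) / total
                        for dim in range(len(points[0]))])
    return centers

def fcm(points, centers, m=2.0, iters=100, tol=1e-9):
    """Alternate membership and centroid updates until centers stop moving."""
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        new = fcm_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, fcm_memberships(points, centers, m)

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, u = fcm(pts, [(1.0, 1.0), (8.0, 8.0)])
```

Unlike k-means, every point keeps a graded membership in every cluster, so points between clusters pull on both centroids in proportion to u^m.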
Designing a robust activity recognition framework for health and exergaming using wearable sensors.
Alshurafa, Nabil; Xu, Wenyao; Liu, Jason J; Huang, Ming-Chun; Mortazavi, Bobak; Roberts, Christian K; Sarrafzadeh, Majid
2014-09-01
Detecting human activity independent of intensity is essential in many applications, primarily in calculating metabolic equivalent rates and extracting human context awareness. Many classifiers that train on an activity at a subset of intensity levels fail to recognize the same activity at other intensity levels. This demonstrates a weakness in the underlying classification method. Training a classifier for an activity at every intensity level is also not practical. In this paper, we tackle a novel intensity-independent activity recognition problem in which the class labels exhibit large variability, the data are high-dimensional, and clustering algorithms are necessary. We propose a new robust stochastic approximation framework for enhanced classification of such data. Experiments are reported using two clustering techniques, K-Means and Gaussian Mixture Models. The stochastic approximation algorithm consistently outperforms other well-known classification schemes, which validates the use of our proposed clustered data representation. We verify the motivation of our framework in two applications that benefit from intensity-independent activity recognition. The first application shows how our framework can be used to enhance energy expenditure calculations. The second is a novel exergaming environment that uses games to reward physical activity performed throughout the day, to encourage a healthy lifestyle.
Saeed, Faisal; Salim, Naomie; Abdo, Ammar
2013-07-01
Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for clustering chemical compounds. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Automated recognition of microcalcification clusters in mammograms
NASA Astrophysics Data System (ADS)
Bankman, Isaac N.; Christens-Barry, William A.; Kim, Dong W.; Weinberg, Irving N.; Gatewood, Olga B.; Brody, William R.
1993-07-01
The widespread and increasing use of mammographic screening for early breast cancer detection is placing a significant strain on clinical radiologists. Large numbers of radiographic films have to be visually interpreted in fine detail to determine the subtle hallmarks of cancer that may be present. We developed an algorithm for detecting microcalcification clusters, the most common and useful signs of early, potentially curable breast cancer. We describe this algorithm, which utilizes contour map representations of digitized mammographic films, and discuss its benefits in overcoming difficulties often encountered in algorithmic approaches to radiographic image processing. We present experimental analyses of mammographic films employing this contour-based algorithm and discuss practical issues relevant to its use in an automated film interpretation instrument.
Clustering of Farsi sub-word images for whole-book recognition
NASA Astrophysics Data System (ADS)
Soheili, Mohammad Reza; Kabir, Ehsanollah; Stricker, Didier
2015-01-01
Redundancy of word and sub-word occurrences in large documents can be effectively utilized in an OCR system to improve recognition results. Most OCR systems employ language modeling techniques as a post-processing step; however, these techniques do not use the important pictorial information that exists in the text image. In the case of large-scale recognition of degraded documents, this information is even more valuable. In our previous work, we proposed a sub-word image clustering method for applications dealing with large printed documents. In our clustering method, the ideal case is when all equivalent sub-word images lie in one cluster. To overcome the issues of low print quality, the clustering method uses an image matching algorithm to measure the distance between two sub-word images. The measured distance, together with a set of simple shape features, was used to cluster all sub-word images. In this paper, we analyze the effects of adding more shape features on processing time, clustering purity, and the final recognition rate. Previously published experiments have shown the efficiency of our method on one book. Here we present extended experimental results and evaluate our method on another book with a completely different typeface. We also show that the number of newly created clusters on a page can be used as a criterion for assessing print quality and evaluating preprocessing phases.
Dynamics of fragment formation in neutron-rich matter
NASA Astrophysics Data System (ADS)
Alcain, P. N.; Dorso, C. O.
2018-01-01
Background: Neutron stars are astronomical systems with nucleons subjected to extreme conditions. Because of the longer-range Coulomb repulsion between protons, the system has structural inhomogeneities. Several interactions tailored to reproduce nuclear matter, plus a screened Coulomb term, reproduce these inhomogeneities, known as nuclear pasta. These structural inhomogeneities, located in the crusts of neutron stars, can also arise in expanding systems depending on the thermodynamic conditions (temperature, proton fraction, etc.) and the expansion velocity. Purpose: We aim to determine the dynamics of fragment formation in expanding systems simulated according to the little big bang model. This expansion resembles the evolution of merging neutron stars. Method: We study the dynamics of the nucleons with semiclassical molecular dynamics models. Starting with an equilibrium configuration, we expand the system homogeneously until we arrive at an asymptotic configuration (i.e., very low final densities). Using four different cluster recognition algorithms, we study the fragment distribution throughout this expansion and the dynamics of cluster formation. Results: Studying the topology of the equilibrium states before the expansion, we reproduced the known pasta phases plus a novel phase we called pregnocchi, consisting of proton aggregates embedded in a neutron sea. We have identified different fragmentation regimes, depending on the initial temperature and fragment velocity. In particular, for the aforementioned pregnocchi, a neutron cloud surrounds the clusters during the early stages of the expansion, resulting in systems that give rise to configurations compatible with the emergence of the r process. Conclusions: We showed that a proper identification of the cluster distribution is highly dependent on the cluster recognition algorithm chosen, and found that the early cluster recognition algorithm (ECRA) was the most stable one. This approach allowed us to identify the dynamics of fragment formation. These calculations pave the way for a comparison between Earth experiments and neutron star studies.
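The simplest cluster recognition rule in this field, often called the minimum-spanning-tree (MST) criterion, places two nucleons in the same fragment if they are linked by a chain of pairs closer than a cutoff distance. A union-find sketch follows; the cutoff and positions are illustrative assumptions, and it is not claimed to be one of the four algorithms the authors compared. ECRA, which additionally searches for the energetically most bound partition, is not reproduced.

```python
import math

def mst_clusters(positions, r_cut):
    """Group particles linked by chains of pairwise distances below r_cut."""
    parent = list(range(len(positions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < r_cut:
                parent[find(i)] = find(j)  # merge the two fragments

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical nucleon positions (arbitrary length units)
positions = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (10.5, 0, 0), (20, 0, 0)]
print(mst_clusters(positions, r_cut=1.5))  # -> [[0, 1, 2], [3, 4], [5]]
```

Nucleons 0 and 2 are farther apart than the cutoff but still share a fragment through nucleon 1; this chaining is exactly why purely geometric criteria can merge fragments that are not actually bound, which motivates energy-based methods such as ECRA.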
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper is part of broader research on discovering facial sub-clusters in face databases of different ethnicities. These new sub-clusters, along with other metadata (such as race, sex, etc.), yield a vector for each face in the database, where each vector component represents the likelihood that the face belongs to the corresponding cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method that evaluates and compares the results of five different clustering algorithms (average, complete, and single hierarchical algorithms, k-means, and DIGNET) and selects the best strategy for each data collection. In this paper we present the comparative clustering performance of DIGNET and four other clustering algorithms (average, complete, and single hierarchical, and k-means) on fabricated 2D and 3D samples and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy of each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metric values are above a specific acceptance threshold. However, when the metric values are below the acceptance threshold but not too low (too low values correspond to ambiguous or false results), the clustering results must be verified by the other algorithms.
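Of the four metrics, the mean silhouette coefficient is easy to state precisely: for point i, a is its mean distance to the other members of its own cluster, b is its smallest mean distance to any other cluster, and s(i) = (b - a) / max(a, b). A sketch on hypothetical 2-D points; setting singletons to 0 follows a common convention and is an assumption here.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b).
    Requires at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        dists = {}  # label -> distances from p to that cluster's points
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i], [])
        if not own:  # singleton cluster: silhouette taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return scores

def mean_silhouette(points, labels):
    s = silhouette_scores(points, labels)
    return sum(s) / len(s)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(pts, [0, 0, 1, 1])  # about 0.90: compact, separated
bad = mean_silhouette(pts, [0, 1, 0, 1])   # negative: points sit in wrong clusters
```

Values near 1 indicate compact, well-separated clusters and negative values indicate misassigned points, which is what makes an acceptance threshold on this metric meaningful.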
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, Daniela I.; Brumby, Steven P.; Rowland, Joel C.; ...
2014-10-01
Neuromimetic machine vision and pattern recognition algorithms are of great interest for landscape characterization and change detection in satellite imagery in support of global climate change science and modeling. We present results from an ongoing effort to extend machine vision methods to the environmental sciences, using adaptive sparse signal processing combined with machine learning. A Hebbian learning rule is used to build multispectral, multiresolution dictionaries from regional satellite normalized band difference index data. Land cover labels are automatically generated via our CoSA algorithm (Clustering of Sparse Approximations), using a clustering distance metric that combines spectral and spatial textural characteristics to help separate geologic, vegetative, and hydrologic features. We demonstrate our method on example WorldView-2 satellite images of an Arctic region, and use CoSA labels to detect seasonal surface changes. In conclusion, our results suggest that neuroscience-based models are a promising approach to practical pattern recognition and change detection problems in remote sensing.
Mixed Pattern Matching-Based Traffic Abnormal Behavior Recognition
Cui, Zhiming; Zhao, Pengpeng
2014-01-01
A motion trajectory is an intuitive time-space representation of the micromotion behavior of a moving target. Trajectory analysis is an important approach to recognizing abnormal behaviors of moving targets. To cope with the complexity of vehicle trajectories, this paper first proposes a trajectory pattern learning method based on dynamic time warping (DTW) and spectral clustering. It uses the DTW distance to measure the distances between vehicle trajectories and automatically determines the number of clusters with a spectral clustering algorithm based on the distance matrix. It then groups the sample data points into clusters. After spatial patterns and direction patterns are learned from the clusters, a recognition method for detecting abnormal vehicle behaviors based on mixed pattern matching is proposed. The experimental results show that the proposed scheme can effectively recognize the main types of abnormal traffic behavior and is robust. A real-world application verified its feasibility and validity. PMID:24605045
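The DTW distance matrix that feeds the spectral clustering step can be sketched with the classic dynamic program, followed by a Gaussian kernel that turns distances into affinities. The Gaussian kernel and its width `sigma` are a common choice assumed here, not taken from the paper; the eigenvector step of spectral clustering and the automatic choice of the number of clusters are not shown.

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between two sequences of points."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_matrix(trajectories):
    n = len(trajectories)
    return [[dtw(trajectories[i], trajectories[j]) for j in range(n)] for i in range(n)]

def gaussian_affinity(dist_matrix, sigma):
    """Affinity exp(-d^2 / (2 sigma^2)), a typical spectral clustering input."""
    return [[math.exp(-d * d / (2 * sigma * sigma)) for d in row] for row in dist_matrix]

# Hypothetical trajectories: t2 follows t1's path at a different sampling rate
t1 = [(0, 0), (1, 0), (2, 0)]
t2 = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0)]
t3 = [(0, 5), (1, 5), (2, 5)]
A = gaussian_affinity(dtw_matrix([t1, t2, t3]), sigma=2.0)
```

DTW lets trajectories of different lengths or speeds be compared point by point: t2 is judged close to t1 even though a naive index-by-index comparison is impossible with unequal lengths.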
Sub-word image clustering in Farsi printed books
NASA Astrophysics Data System (ADS)
Soheili, Mohammad Reza; Kabir, Ehsanollah; Stricker, Didier
2015-02-01
Most OCR systems are designed to recognize a single page. With unfamiliar typefaces, low-quality paper, and degraded prints, the performance of these products drops sharply. However, an OCR system can use the redundancy of word occurrences in large documents to improve recognition results. In this paper, we propose a sub-word image clustering method for applications dealing with large printed documents. We assume that the whole document is printed in a single unknown font with low print quality. Our proposed method finds clusters of equivalent sub-word images with an incremental algorithm. Because of the low print quality, we propose an image matching algorithm for measuring the distance between two sub-word images, based on the Hamming distance and the ratio of area to perimeter of the connected components. We built a ground-truth dataset of more than 111,000 sub-word images to evaluate our method. All of these images were extracted from an old Farsi book. We cluster all of these sub-words, including isolated letters and even punctuation marks. The centers of all created clusters are then labeled manually. We show that all sub-words of the book can be recognized with more than 99.7% accuracy by assigning the label of each cluster center to all of its members.
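The incremental idea can be sketched as leader clustering with a normalized Hamming distance between binary bitmaps: each image joins the first cluster whose representative is close enough, or founds a new cluster. The paper's distance also uses the area-to-perimeter ratio of connected components and handles size differences; here images of different sizes are simply treated as non-matching, and the threshold and bitmaps are assumptions.

```python
def hamming(a, b):
    """Fraction of differing pixels between two equal-size binary bitmaps."""
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / (len(a) * len(a[0]))

def incremental_cluster(images, threshold):
    """Single pass: join the first cluster whose representative is within
    threshold, otherwise start a new cluster represented by this image."""
    reps, labels = [], []
    for img in images:
        for idx, rep in enumerate(reps):
            same_size = len(rep) == len(img) and len(rep[0]) == len(img[0])
            if same_size and hamming(rep, img) <= threshold:
                labels.append(idx)
                break
        else:  # no existing cluster matched
            reps.append(img)
            labels.append(len(reps) - 1)
    return labels, reps

a = [[1, 0], [0, 1]]
a_noisy = [[1, 0], [0, 0]]  # one degraded pixel
b = [[0, 1], [1, 0]]
labels, reps = incremental_cluster([a, a_noisy, b], threshold=0.3)
```

After clustering, labeling only each representative and propagating that label to all members is what reduces manual work from every sub-word to one per cluster.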
Nie, Haitao; Long, Kehui; Ma, Jun; Yue, Dan; Liu, Jinguo
2015-01-01
Partial occlusions, large pose variations, and extreme ambient illumination conditions generally degrade the performance of object recognition systems. Therefore, this paper presents a novel approach for fast and robust object recognition in cluttered scenes based on an improved scale invariant feature transform (SIFT) algorithm and a fuzzy closed-loop control method. First, a fast SIFT algorithm is proposed that classifies SIFT features into several clusters based on attributes computed from the sub-orientation histogram (SOH); in the feature matching phase, only features that share nearly the same attributes are compared. Second, feature matching is performed in a prioritized order based on the scale factor calculated between the object image and the target object image, guaranteeing robust feature matching. Finally, a fuzzy closed-loop control strategy is applied to increase the accuracy of object recognition, which is essential for autonomous object manipulation. Compared with the original SIFT algorithm for object recognition, the proposed method extracts significantly more SIFT features from an object and increases the computing speed of the object recognition process by more than 40%. The experimental results confirmed that the proposed method performs effectively and accurately in cluttered scenes. PMID:25714094
Real Time Intelligent Target Detection and Analysis with Machine Vision
NASA Technical Reports Server (NTRS)
Howard, Ayanna; Padgett, Curtis; Brown, Kenneth
2000-01-01
We present an algorithm for detecting a specified set of targets for an Automatic Target Recognition (ATR) application. ATR involves processing images to detect, classify, and track targets embedded in a background scene. We address the problem of discriminating between targets and nontarget objects in a scene by evaluating 40x40 image blocks belonging to an image. Each image block is first projected onto a set of templates specifically designed to separate images of targets embedded in a typical background scene from background images without targets. These filters are found using directed principal component analysis, which maximally separates the two groups. The projected images are then clustered into one of n classes based on the minimum distance to a set of n cluster prototypes. These cluster prototypes were previously identified from prior sensed data using a modified clustering algorithm. Each projected image pattern is then fed into the associated cluster's trained neural network for classification. This paper gives a detailed description of our algorithm: we outline our methodology for designing the templates, describe our modified clustering algorithm, and provide details on the neural network classifiers. Evaluation of the overall algorithm demonstrates that our detection rates approach 96% with a false positive rate of less than 0.03%.
Kamali, Tahereh; Stashuk, Daniel
2016-10-01
Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters to be known in advance, which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles that can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information, without any prior information or assumptions about data distributions. Here, a new density-based clustering algorithm called neighborhood distance entropy consistency (NDEC) is proposed, which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state-of-the-art clustering algorithms, including chameleon, spectral clustering, DBSCAN, and k-means, using publicly available diffusion tensor imaging data from Johns Hopkins University. The performance of NDEC and the other clustering algorithms was evaluated using the Dice ratio as an external evaluation criterion and the density-based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average Dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and can therefore be used for WM fiber bundle segmentation, where there is no distinct boundary between bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is necessary.
Copyright © 2016 Elsevier B.V. All rights reserved.
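The Dice ratio used as the external criterion compares a predicted cluster A with a ground-truth bundle B as 2|A ∩ B| / (|A| + |B|). The sketch treats clusters as sets of fiber or voxel indices; matching each ground-truth bundle to its best-scoring cluster and averaging is an assumed protocol, since the abstract does not say how clusters were paired with bundles.

```python
def dice(a, b):
    """Dice ratio 2|A & B| / (|A| + |B|); two empty sets are taken to agree."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def mean_best_dice(truth, clusters):
    """Average over ground-truth bundles of the best Dice with any cluster."""
    return sum(max(dice(t, c) for c in clusters) for t in truth) / len(truth)

truth = [{1, 2, 3}, {4, 5, 6}]      # hypothetical gold-standard bundles
clusters = [{1, 2, 3}, {4, 5}]      # hypothetical clustering output
score = mean_best_dice(truth, clusters)  # (1.0 + 0.8) / 2
```

Dice rewards overlap symmetrically: a cluster that misses members of a bundle and one that absorbs extra fibers are penalized in the same way.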
Is it worth changing pattern recognition methods for structural health monitoring?
NASA Astrophysics Data System (ADS)
Bull, L. A.; Worden, K.; Cross, E. J.; Dervilis, N.
2017-05-01
The key element of this work is to demonstrate alternative strategies for using pattern recognition algorithms in structural health monitoring (SHM). This paper aims to determine whether it makes any difference which of a range of established classification techniques is chosen, from decision trees and support vector machines to Gaussian processes. The classification algorithms are tested on adjustable synthetic data to establish performance metrics, and then all techniques are applied to real SHM data. To aid the selection of training data, an informative chain of artificial intelligence tools is used to explore an active learning interaction between meaningful clusters of data.
NASA Astrophysics Data System (ADS)
Tian, Fuyang; Cao, Dong; Dong, Xiaoning; Zhao, Xinqiang; Li, Fade; Wang, Zhonghua
2017-06-01
Behavioral feature recognition is important for detecting oestrus and sickness in dairy herds, and there is a need for heat detection aids. In this paper, the detection method is based on measuring each dairy cow's behavioural activity, standing time, and temperature using a vibration sensor and a temperature sensor. The behavioural activity index, standing time, lying time, and walking time data were sent to a computer by a low-power wireless communication system. A fast approximate K-means algorithm (FAKM) was proposed to process the sensor data for behavioral feature recognition. As a result of technical progress in monitoring cows using computers, automatic oestrus detection has become possible.
Multi-exemplar affinity propagation.
Wang, Chang-Dong; Lai, Jian-Huang; Suen, Ching Y; Zhu, Jun-Yong
2013-09-01
The affinity propagation (AP) clustering algorithm has received much attention in the past few years. AP is appealing because it is efficient, insensitive to initialization, and produces clusters at a lower error rate than other exemplar-based methods. However, its single-exemplar model becomes inadequate for modeling multiple subclasses in some situations, such as scene analysis and character recognition. To remedy this deficiency, we have extended the single-exemplar model to a multi-exemplar one, creating a new multi-exemplar affinity propagation (MEAP) algorithm. This new model automatically determines the number of exemplars in each cluster, associated with a superexemplar, to approximate the subclasses in the category. Solving the model is NP-hard, and we tackle it with max-sum belief propagation to produce neighborhood maximum clusters, with no need to specify beforehand the number of clusters, multi-exemplars, and superexemplars. Also, by exploiting the sparsity in the data, we are able to substantially reduce the computational time and storage. Experimental studies have shown MEAP's significant improvements over other algorithms on unsupervised image categorization and the clustering of handwritten digits.
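For reference, the standard single-exemplar AP message passing of Frey and Dueck (2007), which MEAP generalizes, can be sketched as follows; MEAP's superexemplar layer is not shown. Responsibilities r(i, k) measure how well k suits i as an exemplar, and availabilities a(i, k) collect evidence that k should be one. The damping factor, the preference of -10 on the diagonal, and the 1-D toy data are assumptions.

```python
def affinity_propagation(S, damping=0.5, iters=200):
    """Frey-Dueck affinity propagation on an n x n similarity matrix S.
    The diagonal S[k][k] (the preference) controls how many exemplars emerge."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for kk, v in enumerate(vals) if kk != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    # Each non-exemplar joins its most similar exemplar
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels

xs = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]
S = [[-(a - b) ** 2 for b in xs] for a in xs]  # negative squared distance
for k in range(len(xs)):
    S[k][k] = -10.0  # preference (assumed value)
exemplars, labels = affinity_propagation(S)
```

The number of clusters is never specified; it emerges from the preference, which is the property MEAP keeps while letting each cluster hold several exemplars.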
Pattern recognition for Space Applications Center director's discretionary fund
NASA Technical Reports Server (NTRS)
Singley, M. E.
1984-01-01
Results and conclusions are presented on the application of recent developments in pattern recognition to spacecraft star mapping systems. Sensor data for two representative starfields are processed by an adaptive shape-seeking version of the Fc-V algorithm with good results. Cluster validity measures are evaluated, but are not found especially useful for this application. Recommendations are given for two system configurations worthy of additional study.
Invariant-feature-based adaptive automatic target recognition in obscured 3D point clouds
NASA Astrophysics Data System (ADS)
Khuon, Timothy; Kershner, Charles; Mattei, Enrico; Alverio, Arnel; Rand, Robert
2014-06-01
Target recognition and classification in a 3D point cloud is a non-trivial process due to the nature of the data collected from a sensor system. The signal can be corrupted by noise from the environment, the electronic system, the A/D converter, etc. Therefore, an adaptive system with a desired tolerance is required to perform classification and recognition optimally. The feature-based pattern recognition algorithm architecture described below is specifically devised to solve single-sensor classification non-parametrically. A feature set is extracted from an input point cloud, normalized, and classified by a neural network classifier. For instance, automatic target recognition in an urban area would require different feature sets from those for a dense foliage area. The figure above (see manuscript) illustrates the architecture of the feature-based adaptive signature extraction of 3D point clouds, including LIDAR, RADAR, and electro-optical data. This network takes a 3D cluster and classifies it into a specific class. The algorithm is a supervised and adaptive classifier with two modes: the training mode and the performing mode. For the training mode, a number of novel patterns are selected from actual or artificial data. A particular 3D cluster is input to the network as shown above to produce the decision-class output. The network consists of three sequential functional modules. The first module performs feature extraction, converting the input cluster into a set of singular-value features, or feature vector. The feature vector is then passed to the feature normalization module, which normalizes and balances it before it is fed to the neural net classifier for classification. The neural net can be trained with actual or artificial novel data until each trained output reaches the declared output within the defined tolerance.
If new novel data are added after the neural net has been trained, training is resumed until the neural net has incrementally learned the new data. The associative memory capability of the neural net enables this incremental learning. The back-propagation algorithm or a support vector machine can be used for classification and recognition.
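The feature-extraction and normalization modules can be illustrated with a small NumPy sketch. It assumes that the "singular value features" are the singular values of the mean-centred (N, 3) point coordinates and that normalization means scaling to unit length; both are assumptions, and the neural classifier stage is omitted. With these choices the features are unchanged by rotating, translating or uniformly scaling the cluster.

```python
import numpy as np

def singular_value_features(points):
    """Singular values of a mean-centred (N, 3) point cluster.

    They depend only on the cluster's spread along its principal axes, so they
    are unchanged by rotating or translating the cluster.
    """
    X = np.asarray(points, dtype=float)
    centred = X - X.mean(axis=0)
    return np.linalg.svd(centred, compute_uv=False)  # sorted, largest first

def normalize_features(f):
    """Scale a feature vector to unit Euclidean length (one possible normalization)."""
    f = np.asarray(f, dtype=float)
    norm = np.linalg.norm(f)
    return f / norm if norm > 0 else f

rng = np.random.default_rng(0)
cluster = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])  # elongated blob

# Rotate the cluster by 30 degrees about the z axis and shift it.
t = np.deg2rad(30.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])
moved = cluster @ Rz.T + np.array([100.0, -40.0, 7.0])

f1 = normalize_features(singular_value_features(cluster))
f2 = normalize_features(singular_value_features(moved))
```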
Fuzzy Set Methods for Object Recognition in Space Applications
NASA Technical Reports Server (NTRS)
Keller, James M. (Editor)
1992-01-01
Progress on the following four tasks is described: (1) fuzzy set based decision methodologies; (2) membership calculation; (3) clustering methods (including derivation of pose estimation parameters), and (4) acquisition of images and testing of algorithms.
A possibilistic approach to clustering
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Keller, James M.
1993-01-01
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each iteration. Recently, fuzzy clustering methods have shown a spectacular ability to detect not only hypervolume clusters but also clusters that are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
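The difference between the probabilistic constraint and the possibilistic view shows up clearly on a noise point. The following minimal NumPy sketch assumes fixed prototypes, fuzzifier m = 2 and a hand-chosen scale η_k = 1 for each cluster (in the possibilistic approach η_k is a per-cluster parameter that would normally be estimated from the data). A distant point equidistant from both prototypes receives FCM memberships of 0.5 each because they must sum to one, whereas its possibilistic memberships are close to zero.

```python
import numpy as np

def squared_distances(X, centers):
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)

def fcm_memberships(X, centers, m=2.0):
    """FCM memberships: u_ik = 1 / sum_j (d_ik^2 / d_ij^2)^(1/(m-1)); each row sums to one."""
    d2 = np.maximum(squared_distances(X, centers), 1e-12)  # avoid division by zero
    ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def possibilistic_memberships(X, centers, eta, m=2.0):
    """Possibilistic memberships: t_ik = 1 / (1 + (d_ik^2 / eta_k)^(1/(m-1))); no sum constraint."""
    d2 = squared_distances(X, centers)
    return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))

centers = np.array([[0.0, 0.0], [10.0, 0.0]])
X = np.array([[0.0, 0.0],      # on the first prototype
              [10.0, 0.0],     # on the second prototype
              [5.0, 100.0]])   # a distant noise point, equidistant from both
U = fcm_memberships(X, centers)
T = possibilistic_memberships(X, centers, eta=np.array([1.0, 1.0]))
```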
Adaptive fuzzy system for 3-D vision
NASA Technical Reports Server (NTRS)
Mitra, Sunanda
1993-01-01
An adaptive fuzzy system using the concept of the Adaptive Resonance Theory (ART) type neural network architecture and incorporating fuzzy c-means (FCM) system equations for reclassification of cluster centers was developed. The Adaptive Fuzzy Leader Clustering (AFLC) architecture is a hybrid neural-fuzzy system which learns on-line in a stable and efficient manner. The system uses a control structure similar to that found in the Adaptive Resonance Theory (ART-1) network to identify the cluster centers initially. The initial classification of an input takes place in a two-stage process: a simple competitive stage and a distance-metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions according to the FCM system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The performance of the AFLC algorithm is presented through application of the algorithm to the Anderson Iris data and to laser-luminescent fingerprint image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets. The hybrid neuro-fuzzy AFLC algorithm will enhance analysis of a number of difficult recognition and control problems involving Tethered Satellite Systems and on-orbit space shuttle attitude control.
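The two stages can be caricatured in a few lines. This is a heavily simplified sketch, not the published AFLC algorithm: the ART-style vigilance test is replaced by a plain distance threshold `tau` (a hypothetical parameter), and after each input all prototypes are relocated with the standard FCM centroid equation c_k = Σ_i u_ik^m x_i / Σ_i u_ik^m over the inputs seen so far.

```python
import numpy as np

def fcm_memberships(X, C, m):
    d2 = np.maximum(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1), 1e-12)
    return 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)

def fcm_centroids(X, C, m):
    """One FCM centroid update: c_k = sum_i u_ik^m x_i / sum_i u_ik^m."""
    W = fcm_memberships(X, C, m) ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

def leader_fcm_clustering(X, tau, m=2.0):
    """Online leader clustering with FCM-based relocation (simplified AFLC-like sketch)."""
    X = np.asarray(X, dtype=float)
    C = X[:1].copy()                        # first input becomes the first leader
    for i in range(1, len(X)):
        nearest = np.linalg.norm(C - X[i], axis=1).min()  # competitive stage
        if nearest > tau:                   # simplified "vigilance" test failed
            C = np.vstack([C, X[i]])        # start a new cluster at this input
        C = fcm_centroids(X[: i + 1], C, m)  # relocate all prototypes with FCM equations
    return C

a = np.array([[0.0, 0.0], [0.4, 0.1], [0.2, 0.5], [0.5, 0.4]])
b = a + np.array([8.0, 8.0])
X = np.empty((8, 2))
X[0::2], X[1::2] = a, b                     # interleave inputs from the two groups
C = leader_fcm_clustering(X, tau=2.0)
```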
Human recognition based on head-shoulder contour extraction and BP neural network
NASA Astrophysics Data System (ADS)
Kong, Xiao-fang; Wang, Xiu-qin; Gu, Guohua; Chen, Qian; Qian, Wei-xian
2014-11-01
In practical application scenarios such as video surveillance and human-computer interaction, human body movements are unpredictable because the human body is a non-rigid object. Because the head-shoulder part of the body is less affected by movement and is seldom occluded by other objects, a head-shoulder model with stable characteristics can be used as a detection feature to describe the human body in human detection and recognition. To extract the head-shoulder contour accurately, this paper proposes a method for establishing a head-shoulder model that combines edge detection with the mean-shift algorithm for image clustering. First, an adaptive Gaussian-mixture background update method is used to extract targets from the video sequence. Second, edge detection is used to extract the contours of moving objects, and the mean-shift algorithm is applied to cluster parts of each target's contour. Third, the head-shoulder model is established from the width-to-height ratio of the human head-shoulder region combined with the projection histogram of the binary image, and the eigenvectors of the head-shoulder contour are obtained. Finally, the relationship between the head-shoulder contour eigenvectors and the moving objects is learned by training a back-propagation (BP) neural network classifier, and the human head-shoulder model can be clustered for human detection and recognition. Experiments have shown that the proposed method, which combines edge detection and the mean-shift algorithm, can extract the complete head-shoulder contour with low computational complexity and high efficiency.
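The mean-shift clustering step can be illustrated with a minimal Gaussian-kernel implementation in NumPy. This is a generic sketch, not the authors' code. The bandwidth, the mode-merging radius (half the bandwidth) and the toy 2-D points standing in for contour pixels are all assumptions.

```python
import numpy as np

def mean_shift(X, bandwidth, max_iter=200, tol=1e-6):
    """Gaussian-kernel mean shift: move every point uphill to a density mode, then merge modes."""
    X = np.asarray(X, dtype=float)
    modes = X.copy()
    for _ in range(max_iter):
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))           # kernel weights
        shifted = (w @ X) / w.sum(axis=1, keepdims=True)   # weighted mean = mean-shift update
        converged = np.abs(shifted - modes).max() < tol
        modes = shifted
        if converged:
            break
    centers, labels = [], np.empty(len(X), dtype=int)
    for i, mode in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(mode - c) < bandwidth / 2:   # same mode within tolerance
                labels[i] = j
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

rng = np.random.default_rng(1)
part_a = rng.normal([0.0, 0.0], 0.3, size=(20, 2))   # hypothetical contour fragment A
part_b = rng.normal([6.0, 0.0], 0.3, size=(20, 2))   # hypothetical contour fragment B
centers, labels = mean_shift(np.vstack([part_a, part_b]), bandwidth=1.0)
```

Unlike k-means, mean shift does not need the number of clusters in advance. The bandwidth controls how many modes survive.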
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and without familiarity with the behavior of the Tethered Satellite System.
Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all of them are employed to recognize emotions, some features may be redundant; furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. Emotion features are selected from the 95 extracted features using the NN contribution analysis algorithm. Cluster analysis is applied to assess the effectiveness of the selected features, and the feature-extraction time is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the feature-extraction time.
Fingerprint recognition of wavelet-based compressed images by neuro-fuzzy clustering
NASA Astrophysics Data System (ADS)
Liu, Ti C.; Mitra, Sunanda
1996-06-01
Image compression plays a crucial role in many important and diverse applications requiring efficient storage and transmission. This work mainly focuses on wavelet transform (WT) based compression of fingerprint images and the subsequent classification of the reconstructed images. The algorithm developed involves multiresolution wavelet decomposition, uniform scalar quantization, an entropy and run-length encoder/decoder, and K-means clustering of the invariant moments as fingerprint features. The performance of the WT-based compression algorithm has been compared with the current JPEG image compression standard. Simulation results show that WT outperforms JPEG in the high-compression-ratio region and that the reconstructed fingerprint images yield proper classification.
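The abstract does not specify which invariant moments were used. Hu's moment invariants are a common choice for this purpose. Assuming that choice, the sketch below computes the first two Hu invariants of a grayscale image from normalized central moments. Vectors like these could then be clustered with K-means. The function name and the toy image are illustrative only.

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants (phi1, phi2) of a 2-D intensity image."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def eta(p, q):
        # normalized central moment: mu_pq / mu_00^(1 + (p+q)/2)
        mu = (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return np.array([phi1, phi2])

img = np.zeros((32, 32))
img[5:12, 8:25] = 1.0     # a bar
img[12:20, 8:12] = 0.5    # plus a dimmer leg
rotated = np.rot90(img)   # 90-degree rotation
shifted = np.roll(img, shift=(6, -3), axis=(0, 1))  # translation (object stays inside the frame)
```

Because the central moments are taken about the centroid and normalized by the image mass, these two values do not change when the pattern is shifted or rotated by 90 degrees. Invariance to such changes is what makes them usable as features after lossy reconstruction.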
Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters
NASA Astrophysics Data System (ADS)
Mahmoud, E.; Shoukry, A.; Takey, A.
2018-04-01
Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space, where cluster similarities can be visualized, instead of object-feature (multi-dimensional) space. In this paper, sparse ε-similarity graphs are constructed and decomposed into strong components using appropriate methods such as the Dulmage-Mendelsohn permutation (DMperm) and/or Reverse Cuthill-McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. The parameter ε_i is estimated locally at each data point i from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of cluster similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two- and three-dimensional, low- and high-sized synthetic datasets as well as on a real astronomical dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted to confirm the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8].
We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values compared to published spectroscopic redshifts with a 0.029 standard deviation of their differences.
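The idea of turning a sparse ε-similarity graph into groups can be sketched with plain NumPy. This is a simplified stand-in, not the authors' method. Instead of Dulmage-Mendelsohn or Reverse Cuthill-McKee reordering, it finds connected components with a breadth-first search. For a symmetric graph these coincide with the strongly connected components. It also sets each ε_i to the distance from point i to its k-th nearest neighbour, where k is a hypothetical choice. Sorting rows and columns by component label then exposes the block-diagonal structure of the adjacency matrix.

```python
import numpy as np
from collections import deque

def eps_graph_clusters(X, k=3):
    """Cluster points via connected components of a locally scaled epsilon-graph."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    eps = np.sort(D, axis=1)[:, k]          # eps_i = distance to k-th nearest neighbour
    adj = (D <= eps[:, None]) | (D <= eps[None, :])  # symmetric epsilon-graph
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:                          # breadth-first search over one component
            i = queue.popleft()
            for j in np.flatnonzero(adj[i] & (labels == -1)):
                labels[j] = current
                queue.append(j)
        current += 1
    order = np.argsort(labels, kind="stable")  # reordering -> block-diagonal adjacency
    return labels, adj[np.ix_(order, order)]

grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
X = np.vstack([grid, grid + 50.0])
labels, A_sorted = eps_graph_clusters(X, k=3)
```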
NASA Astrophysics Data System (ADS)
Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.
2018-03-01
The biologically motivated self-learning equivalence-convolutional recurrent multilayer neural structures (BLM_SL_EC_RMNS) for clustering and recognition of image fragments will be discussed. We shall consider these neural structures and their spatially invariant equivalent models (SIEMs), which are based on the proposed equivalent two-dimensional functions of image similarity and the corresponding matrix-matrix (or tensor) procedures, with continuous-logic operations and nonlinear processing as the basic operations. These SIEMs can simply describe the signal processing during all training and recognition stages, and they are suitable for unipolar-coded multilevel signals. The clustering efficiency of such models and their implementation depends on the discriminant properties of the neural elements of the hidden layers. Therefore, the main model and architecture parameters and characteristics depend on the applied types of nonlinear processing and on the function used for image comparison or for adaptive-equivalent weighting of input patterns. We show that these SL_EC_RMNSs have several advantages, such as self-learning and self-identification of features and signs of fragment similarity, and the ability to cluster and recognize image fragments with the best efficiency and strong mutual correlation. The proposed clustering method, which combines learning and recognition of fragments with regard to their structural features, is suitable not only for binary but also for color images, and it combines self-learning with the formation of weighted clustered matrix patterns. Its model is constructed on the basis of recursive continuous-logic and nonlinear processing algorithms and on the k-means (k-average) method or the winner-takes-all (WTA) method. The experimental results confirmed that fragments with a large number of elements can be clustered. For the first time, the possibility of generalizing these models to the space-invariant case is shown.
An experiment with images of different dimensions (a reference array) and fragments of different dimensions for clustering was carried out. The experiments, performed in the Mathcad software environment, showed that the proposed method is universal, converges well in a small number of iterations, maps easily onto a matrix structure, and is promising. Understanding the mechanisms of self-learning equivalence-convolutional clustering, the accompanying competitive processes in neurons, and the principles of neural auto-encoding-decoding and recognition using self-learned cluster patterns is therefore very important; these rely on the algorithm and principles of nonlinear processing of two-dimensional spatial functions for image comparison. The experimental results show that such models can be successfully used for auto- and hetero-associative recognition. They can also be used to explain some mechanisms known as "the reinforcement-inhibition concept". We also demonstrate real model experiments which confirm that nonlinear processing by the equivalence function makes it possible to determine the winner neurons and tune the weight matrix. At the end of the report, we show how to use the obtained results and propose a new, more efficient hardware architecture of SL_EC_RMNS based on matrix-tensor multipliers. We also estimate the parameters and performance of such architectures.
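The equivalence-based winner-takes-all idea can be made concrete with a minimal sketch for grayscale fragments with values in [0, 1]. Several pieces are assumptions made for illustration: the similarity (a normalized equivalence 1 − mean|a − b| from continuous logic), the farthest-first initialization and the learning rate. The sketch does not reproduce the authors' recurrent multilayer structures or hardware model.

```python
import numpy as np

def equivalence(a, b):
    """Normalized equivalence of two [0, 1]-valued images: 1 - mean |a - b| (assumed form)."""
    return 1.0 - np.mean(np.abs(a - b))

def wta_cluster(fragments, n_neurons, lr=0.3, epochs=5):
    """Self-learning WTA clustering: the most equivalent neuron wins and moves toward the input."""
    frags = [np.asarray(f, dtype=float) for f in fragments]
    W = [frags[0].copy()]
    while len(W) < n_neurons:  # farthest-first: pick the fragment least equivalent to all neurons
        scores = [max(equivalence(f, w) for w in W) for f in frags]
        W.append(frags[int(np.argmin(scores))].copy())
    for _ in range(epochs):
        for f in frags:
            win = int(np.argmax([equivalence(f, w) for w in W]))
            W[win] += lr * (f - W[win])      # adapt only the winner's weight matrix
    labels = [int(np.argmax([equivalence(f, w) for w in W])) for f in frags]
    return labels, W

cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
frame = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
rng = np.random.default_rng(2)
noisy = [np.clip(p + rng.normal(0, 0.1, p.shape), 0, 1) for p in (cross, frame) for _ in range(4)]
labels, W = wta_cluster(noisy, n_neurons=2)
```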
A Human Activity Recognition System Based on Dynamic Clustering of Skeleton Data.
Manzi, Alessandro; Dario, Paolo; Cavallo, Filippo
2017-05-11
Human activity recognition is an important area in computer vision, with a wide range of applications including ambient assisted living. In this paper, an activity recognition system based on skeleton data extracted from a depth camera is presented. The system uses machine learning techniques to classify actions that are described by a set of a few basic postures. The training phase creates several models, related to the number of clustered postures, by means of a multiclass Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO). The classification phase adopts the X-means algorithm to find the optimal number of clusters dynamically. The contribution of the paper is twofold. The first aim is to perform activity recognition employing features based on a small number of informative postures, extracted independently from each activity instance; the second is to assess the minimum number of frames needed for an adequate classification. The system is evaluated on two publicly available datasets, the Cornell Activity Dataset (CAD-60) and the Telecommunication Systems Team (TST) Fall detection dataset. The number of clusters needed to model each instance ranges from two to four. The proposed approach achieves excellent performance using only about 4 s of input data (~100 frames) and outperforms the state of the art when it uses approximately 500 frames on the CAD-60 dataset. The results are promising for testing in real-world settings.
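X-means chooses the number of clusters with a Bayesian Information Criterion (BIC) score. The sketch below keeps that idea but simplifies the search. Rather than X-means' recursive splitting of centroids, it runs k-means for every k up to a limit and keeps the k with the highest BIC. The BIC is computed under a spherical-Gaussian model with one shared variance. Synthetic 2-D points stand in for the posture features, and all names are illustrative.

```python
import numpy as np

def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns the best (centers, labels) of n_init runs."""
    best = None
    for _ in range(n_init):
        C = [X[rng.integers(len(X))]]
        for _ in range(1, k):  # k-means++: sample far-away points with higher probability
            d2 = ((X[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
            C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        C = np.array(C, dtype=float)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        sse = ((X - C[labels]) ** 2).sum()
        if best is None or sse < best[2]:
            best = (C, labels, sse)
    return best[0], best[1]

def bic_score(X, C, labels):
    """BIC of a hard-assignment spherical Gaussian mixture with one shared variance."""
    n, d = X.shape
    k = len(C)
    var = max(((X - C[labels]) ** 2).sum() / (n * d), 1e-12)  # ML variance estimate
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]
    log_lik = (counts * np.log(counts / n)).sum() - 0.5 * n * d * (np.log(2 * np.pi * var) + 1)
    n_params = k * d + (k - 1) + 1  # centroids, mixing weights, shared variance
    return log_lik - 0.5 * n_params * np.log(n)

def choose_k(X, k_max=6, seed=0):
    rng = np.random.default_rng(seed)
    scores = {k: bic_score(X, *kmeans(X, k, rng)) for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in ([0, 0], [10, 0], [0, 10])])
best_k, scores = choose_k(X)
```

The BIC penalty grows with the number of parameters, so extra clusters are accepted only when they explain the data substantially better.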
Tang, Jialin; Soua, Slim; Mares, Cristinel; Gan, Tat-Hean
2017-01-01
The identification of particular types of damage in wind turbine blades using acoustic emission (AE) techniques is a significant emerging field. In this work, a 45.7-m turbine blade was subjected to flap-wise fatigue loading for 21 days, during which AE was measured by internally mounted piezoelectric sensors. This paper focuses on using unsupervised pattern recognition methods to characterize different AE activities corresponding to different fracture mechanisms. A sequential feature selection method based on a k-means clustering algorithm is used to achieve good classification accuracy. The visualization of clusters in peak frequency−frequency centroid features is used to correlate the clustering results with failure modes. The positions of these clusters in time domain features, average frequency−MARSE, and average frequency−peak amplitude are also presented in this paper (where MARSE represents the Measured Area under Rectified Signal Envelope). The results show that these parameters are representative for the classification of the failure modes. PMID:29104245
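Peak frequency and frequency centroid, the two features used to visualize the clusters, are simple functions of a hit's magnitude spectrum. Below is a minimal NumPy sketch, assuming one AE hit is available as a uniformly sampled waveform. The sampling rate and the synthetic test tone are illustrative, and real AE systems may apply windowing or thresholds that are not shown here.

```python
import numpy as np

def peak_frequency_and_centroid(waveform, fs):
    """Return (peak frequency, frequency centroid) in Hz from the one-sided magnitude spectrum."""
    x = np.asarray(waveform, dtype=float)
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    peak = freqs[np.argmax(mag)]                 # frequency of the strongest component
    centroid = (freqs * mag).sum() / mag.sum()   # magnitude-weighted mean frequency
    return peak, centroid

fs = 1_000_000.0                          # 1 MHz sampling rate (assumed)
t = np.arange(1000) / fs
hit = np.sin(2 * np.pi * 100_000.0 * t)   # a 100 kHz tone, exactly 100 cycles in the window
peak, centroid = peak_frequency_and_centroid(hit, fs)
```

For a pure tone both features coincide. For broadband hits the centroid moves away from the peak, which is what makes the pair useful for separating damage mechanisms.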
Tang, Jialin; Soua, Slim; Mares, Cristinel; Gan, Tat-Hean
2017-11-01
The identification of particular types of damage in wind turbine blades using acoustic emission (AE) techniques is a significant emerging field. In this work, a 45.7-m turbine blade was subjected to flap-wise fatigue loading for 21 days, during which AE was measured by internally mounted piezoelectric sensors. This paper focuses on using unsupervised pattern recognition methods to characterize different AE activities corresponding to different fracture mechanisms. A sequential feature selection method based on a k-means clustering algorithm is used to achieve good classification accuracy. The visualization of clusters in peak frequency-frequency centroid features is used to correlate the clustering results with failure modes. The positions of these clusters in time domain features, average frequency-MARSE, and average frequency-peak amplitude are also presented in this paper (where MARSE represents the Measured Area under Rectified Signal Envelope). The results show that these parameters are representative for the classification of the failure modes.
NASA Technical Reports Server (NTRS)
Gramenopoulos, N. (Principal Investigator)
1974-01-01
The author has identified the following significant results. A diffraction pattern analysis of MSS images led to the development of spatial signatures for farm land, urban areas and mountains. Four spatial features are employed to describe the spatial characteristics of image cells in the digital data. Three spectral features are combined with the spatial features to form a seven dimensional vector describing each cell. Then, the classification of the feature vectors is accomplished by using the maximum likelihood criterion. It was determined that the recognition accuracy with the maximum likelihood criterion depends on the statistics of the feature vectors. It was also determined that for a given geographic area the statistics of the classes remain invariable for a period of a month, but vary substantially between seasons. Three ERTS-1 images from the Phoenix, Arizona area were processed, and recognition rates between 85% and 100% were obtained for the terrain classes of desert, farms, mountains, and urban areas. To eliminate the need for training data, a new clustering algorithm has been developed. Seven ERTS-1 images from four test sites have been processed through the clustering algorithm, and high recognition rates have been achieved for all terrain classes.
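The maximum likelihood step can be sketched as a Gaussian classifier. Estimate a mean vector and a covariance matrix per terrain class from training cells, then assign each seven-dimensional feature vector to the class with the highest log-likelihood. The following are assumptions: the Gaussian class model, equal priors, the small ridge added to each covariance, and the synthetic data. The report's own spatial and spectral features are not reproduced.

```python
import numpy as np

class GaussianMLClassifier:
    """Assign each feature vector to the class whose fitted Gaussian gives the highest likelihood."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet))
        return self

    def log_likelihoods(self, X):
        cols = []
        for mean, inv_cov, logdet in self.params_:
            diff = X - mean
            mahal = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared Mahalanobis distance
            cols.append(-0.5 * (mahal + logdet))  # constant term dropped; equal priors assumed
        return np.column_stack(cols)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]

rng = np.random.default_rng(3)
dim = 7  # e.g. four spatial + three spectral features per image cell
means = {"desert": np.zeros(dim), "farms": np.full(dim, 4.0)}

def sample(n):
    X = np.vstack([rng.normal(m, 1.0, size=(n, dim)) for m in means.values()])
    y = np.repeat(list(means.keys()), n)
    return X, y

X_train, y_train = sample(60)
X_test, y_test = sample(20)
clf = GaussianMLClassifier().fit(X_train, y_train)
pred = clf.predict(X_test)
```

As the report notes, accuracy depends on the class statistics. If those statistics drift between seasons, the fitted means and covariances no longer match the data.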
Detection of maize kernels breakage rate based on K-means clustering
NASA Astrophysics Data System (ADS)
Yang, Liang; Wang, Zhuo; Gao, Lei; Bai, Xiaoping
2017-04-01
To improve the recognition accuracy and efficiency of maize kernel breakage detection, this paper uses computer vision technology to detect maize kernel breakage based on the K-means clustering algorithm. First, the collected RGB images are converted into Lab images; then the clarity of the original images is evaluated with the energy function of the Sobel 8 gradient. Finally, maize kernel breakage is detected using different pixel acquisition equipment and different shooting angles. In this paper, broken maize kernels are identified by the color difference between intact kernels and broken kernels. The clarity evaluation of the original images and the different shooting angles are used to verify that image clarity and shooting angle have a direct influence on feature extraction. The results show that the K-means clustering algorithm can distinguish broken maize kernels effectively.
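The color-difference idea can be sketched in two steps. First convert RGB pixels to CIE L*a*b*. Then run K-means with K = 2 on the chromatic (a*, b*) components, so that differently colored regions fall into different clusters. The sketch assumes sRGB input, a D65 white point, farthest-point initialization and a toy two-color image with hypothetical colors. It is not the paper's pipeline, and the clarity-evaluation step is omitted.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (last axis = R, G, B) to CIE L*a*b* (D65 white)."""
    rgb = np.asarray(rgb, dtype=float)
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ M.T / np.array([0.95047, 1.0, 1.08883])  # normalize by reference white
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def kmeans_two(X, n_iter=50):
    """K-means with K = 2, initialized with the first point and the point farthest from it."""
    far = np.argmax(((X - X[0]) ** 2).sum(axis=1))
    C = np.array([X[0], X[far]], dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    return labels

image = np.zeros((4, 6, 3))
image[:, :3] = [0.90, 0.75, 0.20]   # hypothetical "intact" surface color
image[:, 3:] = [0.95, 0.93, 0.85]   # hypothetical "broken" surface color
ab = srgb_to_lab(image)[..., 1:].reshape(-1, 2)   # chromatic components only
labels = kmeans_two(ab).reshape(4, 6)
```

Dropping L* makes the clustering less sensitive to brightness differences caused by lighting or shooting angle, though real images would still need thresholds tuned to the equipment.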
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation via Named Entity Recognition services, offering interactive, user-friendly visualization. The terms labeling each document cluster are shown in a tag cloud and semantically annotated according to the biological entity, and a list of document titles enables users to simultaneously compare the terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms and exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy. Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr. Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Hortos, William S.
2008-04-01
Proposed distributed wavelet-based algorithms are a means to compress sensor data received at the nodes forming a wireless sensor network (WSN) by exchanging information between neighboring sensor nodes. Local collaboration among nodes compacts the measurements, yielding a reduced fused set with equivalent information at far fewer nodes. Nodes may be equipped with multiple sensor types, each capable of sensing distinct phenomena: thermal, humidity, chemical, voltage, or image signals with low or no frequency content as well as audio, seismic or video signals within defined frequency ranges. Compression of the multi-source data through wavelet-based methods, distributed at active nodes, reduces downstream processing and storage requirements along the paths to sink nodes; it also enables noise suppression and more energy-efficient query routing within the WSN. Targets are first detected by the multiple sensors; then wavelet compression and data fusion are applied to the target returns, followed by feature extraction from the reduced data; feature data are input to target recognition/classification routines; targets are tracked during their sojourns through the area monitored by the WSN. Algorithms to perform these tasks are implemented in a distributed manner, based on a partition of the WSN into clusters of nodes. In this work, a scheme of collaborative processing is applied for hierarchical data aggregation and decorrelation, based on the sensor data itself and any redundant information, enabled by a distributed, in-cluster wavelet transform with lifting that allows multiple levels of resolution. The wavelet-based compression algorithm significantly decreases RF bandwidth and other resource use in target processing tasks. Following wavelet compression, features are extracted. 
The objective of feature extraction is to maximize the probability of correct target classification based on multi-source sensor measurements while minimizing the resource expenditure at participating nodes. A feature-extraction method based on the Haar DWT is therefore presented, which employs a maximum-entropy measure to determine significant wavelet coefficients. Features are formed by calculating the energy of coefficients grouped around the competing clusters. A DWT-based feature extraction algorithm used for vehicle classification in WSNs can be enhanced by an added rule for selecting the optimal number of resolution levels, which improves the correct classification rate and reduces the energy consumed by local algorithm computations. Published field trial data for vehicular ground targets, measured with multiple sensor types, are used to evaluate the wavelet-assisted algorithms. Extracted features are used in established target recognition routines, e.g., the Bayesian minimum-error-rate classifier, to compare the effects of wavelet compression on classification performance. Simulations of feature sets and recognition routines at different resolution levels in target scenarios indicate the impact on classification rates, while formulas are provided to estimate the reduction in resource use due to distributed compression.
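The wavelet transform with lifting and the energy-based features can be illustrated on a single node's signal. Below is a minimal sketch of a multi-level Haar transform in lifting form (predict, then update), with its exact inverse and per-level detail energies. Several things are not modeled here and are left as assumptions: the distributed in-network execution, the maximum-entropy coefficient selection, and the normalization convention. This unnormalized variant keeps pairwise means as approximations, so it does not preserve signal energy.

```python
import numpy as np

def haar_lifting_forward(x, levels):
    """Multi-level Haar transform via lifting: predict (detail) then update (approximation)."""
    s = np.asarray(x, dtype=float)
    if len(s) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = s[0::2], s[1::2]
        d = odd - even        # predict step: detail = prediction error of odd samples
        s = even + d / 2.0    # update step: approximation = pairwise mean
        details.append(d)
    return s, details

def haar_lifting_inverse(s, details):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    s = np.asarray(s, dtype=float)
    for d in reversed(details):
        even = s - d / 2.0
        odd = d + even
        out = np.empty(2 * len(s))
        out[0::2], out[1::2] = even, odd
        s = out
    return s

def detail_energies(details):
    """Energy of the detail coefficients at each resolution level (a simple feature vector)."""
    return [float(np.sum(d ** 2)) for d in details]

x = np.arange(8, dtype=float)  # a ramp
approx, details = haar_lifting_forward(x, levels=2)
```

Lifting computes the transform in place with only additions and halvings, which suits resource-constrained nodes.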
A method of depth image based human action recognition
NASA Astrophysics Data System (ADS)
Li, Pei; Cheng, Wanli
2017-05-01
In this paper, we propose an action recognition algorithm framework based on human skeleton joint information. To extract human motion features, we use body posture, movement speed and acceleration to construct spatial motion features that describe and reflect the joints. In addition, we use the classical temporal pyramid matching algorithm to construct temporal features that describe the variation of the motion sequence at different time scales. Then, we use a bag of words to represent these actions: the extracted features are clustered, and each action is represented as a histogram over the clusters. Finally, we employ a Hidden Markov Model to train and test on the extracted motion features. In the experimental part, the correctness and effectiveness of the proposed model are comprehensively verified on two well-known datasets.
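The bag-of-words step can be sketched as follows. Given a codebook of cluster centers (for example, from k-means over training features), each frame-level feature is assigned to its nearest codeword. An action then becomes a normalized histogram of codeword counts. The codebook and feature values here are made up for illustration, and the temporal pyramid and HMM stages are not shown.

```python
import numpy as np

def bow_histogram(features, codebook):
    """Normalized histogram of nearest-codeword assignments for one action sequence."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)                                  # nearest codeword per frame
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
frames = np.array([[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [0.1, -0.1]])
hist = bow_histogram(frames, codebook)
```

Normalizing by the frame count makes histograms from sequences of different lengths comparable.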
Software tool for data mining and its applications
NASA Astrophysics Data System (ADS)
Yang, Jie; Ye, Chenzhou; Chen, Nianyi
2002-03-01
A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine functional modules: pattern recognition, decision trees, association rules, fuzzy rules, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principles and knowledge representation of some of the data mining modules are described. The tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The tool has been applied satisfactorily to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems and to the diagnosis of brain glioma.
Competitive Deep-Belief Networks for Underwater Acoustic Target Recognition
Shen, Sheng; Yao, Xiaohui; Sheng, Meiping; Wang, Chen
2018-01-01
Underwater acoustic target recognition based on ship-radiated noise is a small-sample-size recognition problem. A competitive deep-belief network is proposed to learn features with more discriminative information from labeled and unlabeled samples. The proposed model consists of four stages: (1) a standard restricted Boltzmann machine is pretrained using a large amount of unlabeled data to initialize its parameters; (2) the hidden units are grouped according to categories, which provides an initial clustering model for competitive learning; (3) competitive training and back-propagation algorithms are used to update the parameters to accomplish the task of clustering; (4) by applying layer-wise training and supervised fine-tuning, a deep neural network is built to obtain features. Experimental results show that the proposed method can achieve a classification accuracy of 90.89%, which is 8.95% higher than the accuracy obtained by the compared methods. In addition, the highest accuracy of our method is obtained with fewer features than other methods. PMID:29570642
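Stage (1), unsupervised pretraining of a restricted Boltzmann machine, is commonly done with one-step contrastive divergence (CD-1). A minimal NumPy sketch of a Bernoulli RBM trained with CD-1 follows. The layer sizes, learning rate, epoch count and toy binary patterns are assumptions. The competitive grouping of hidden units and the fine-tuning stages are not included.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BernoulliRBM:
    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) step on a mini-batch of binary rows v0."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
        pv1 = self.visible_probs(h0)                              # one-step reconstruction
        ph1 = self.hidden_probs(pv1)
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n   # positive minus negative statistics
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

    def reconstruction_error(self, v):
        return float(((v - self.visible_probs(self.hidden_probs(v))) ** 2).mean())

patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(patterns, 10, axis=0)
rbm = BernoulliRBM(n_visible=6, n_hidden=4)
before = rbm.reconstruction_error(data)
for _ in range(2000):
    rbm.cd1_update(data, lr=0.1)
after = rbm.reconstruction_error(data)
```

Reconstruction error is only a rough proxy for RBM training quality. It is used here because it is cheap to compute.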
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
NASA Astrophysics Data System (ADS)
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing unit (GPU) technology towards parallel architectures [1], comprising thousands of cores and many parallel threads, provide the hardware foundation for the rapid processing of parallel applications in seismic big-data analysis. Seismic data are normally stored as collections of vectors in massive matrices, which grow rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet many seismic data analysis processes are performed on each seismic event independently, or on distinct tiles [3] of specific grouped seismic events within a much larger data set. Such mutually independent processes can be performed in parallel, reducing processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using CUDA C [4] for the investigation of potentially distinct seismic regions [5,6] in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] for assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm incorporating expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, CUDA C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering.
References
[1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publishers, 2013
[2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008
[3] Papadakis, S. and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011
[4] NVIDIA: 'NVIDIA CUDA C Programming Guide', version 5.0, NVIDIA (reference book)
[5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013
[6] Konstantaras, A., Varley, M.R., Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, pp. 303-308, 2002
[7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012
[8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013
[9] Konstantaras, A. J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com)
[10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
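Density-based clustering of epicentres can be illustrated with plain DBSCAN. The following is a minimal serial sketch, assuming Euclidean distance on projected (x, y) epicentre coordinates; `eps`, `min_pts` and the sample points are hypothetical, and this is textbook DBSCAN, not the "Seismic-Mass" variant of [8] or its CUDA C implementation. The per-point neighbourhood query is the independent step that a GPU version could spread across threads.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        # Independent per-point query: the natural unit of GPU parallelism.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
        cluster += 1
    return labels
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and leaves isolated events unassigned, which is one reason density-based methods are attractive for seismicity catalogues.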
State Recognition and Visualization of Hoisting Motor of Quayside Container Crane Based on SOFM
NASA Astrophysics Data System (ADS)
Yang, Z. Q.; He, P.; Tang, G.; Hu, X.
2017-07-01
The neural network structure and algorithm of the self-organizing feature map (SOFM) are researched and analysed, and the method is applied to state recognition and visualization of the quayside container crane hoisting motor. Using SOFM, the attribute-reduced data are clustered and visualized, three kinds of motor states are obtained from the root mean square (RMS), impulse index and margin index, and the simulation visualization interface is implemented in MATLAB. Processing of the sample data shows that the motor state can be identified accurately, which provides better monitoring of the quayside container crane hoisting motor and a new approach to mechanical state recognition.
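The three condition indicators can be computed per vibration record before being fed to the SOFM as input vectors. The abstract does not define them, so this sketch assumes the common vibration-analysis definitions: impulse index = peak / mean absolute value, and margin (clearance) index = peak / (mean of square-rooted absolute values) squared.

```python
import math

def vibration_features(signal):
    """Return (RMS, impulse index, margin index) for one sampled signal.

    Assumed definitions (common in vibration analysis, not taken from the paper):
      impulse index = max|x| / mean|x|
      margin index  = max|x| / (mean(sqrt|x|))**2
    """
    n = len(signal)
    abs_vals = [abs(x) for x in signal]
    peak = max(abs_vals)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    impulse = peak / (sum(abs_vals) / n)
    margin = peak / (sum(math.sqrt(a) for a in abs_vals) / n) ** 2
    return rms, impulse, margin
```

Each motor record then becomes a 3-dimensional feature vector; impulse and margin indices grow when a signal contains sharp spikes, which is why they are sensitive to impact-type faults.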
Data Mining Technologies Inspired from Visual Principle
NASA Astrophysics Data System (ADS)
Xu, Zongben
In this talk we review the recent work done by our group on data mining (DM) technologies derived from simulating visual principles. By viewing a DM problem as a cognition problem and treating a data set as an image with a light point located at each datum position, we developed a series of highly efficient algorithms for clustering, classification and regression that mimic visual principles. In pattern recognition, human eyes seem to possess a singular aptitude for grouping objects and finding important structure efficiently. Thus, a DM algorithm that simulates the visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach to data clustering by modeling the blurring effect of lateral retinal interconnections based on scale space theory. In this approach, as the data image blurs, smaller light blobs merge into larger ones until the whole image becomes one light blob at a sufficiently low resolution. By identifying each blob with a cluster, the blurring process generates a family of clusterings along the hierarchy. The proposed approach provides unique solutions to many long-standing problems in clustering, such as cluster validity and sensitivity to initialization. We extended this approach to classification and regression problems by employing Weber's law from physiology in combination with facts about cell response classification. The resulting classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to very large data sets in DM technologies. Finally, we applied a similar idea to the difficult parameter setting problem in support vector machines (SVMs).
Viewing the parameter setting problem as the recognition problem of choosing a visual scale at which the global and local structures of a data set are preserved and the difference between the two structures is maximized in the feature space, we derived a direct parameter setting formula for the Gaussian SVM. Simulations and applications show that the suggested formula significantly outperforms known model selection methods in terms of efficiency and precision.
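The blurring idea can be approximated with Gaussian mean-shift: each datum climbs the Gaussian-blurred density of the "data image" until it reaches a blob centre (a local maximum), and data reaching the same centre form one cluster. Raising the blur scale `sigma` merges blobs, producing the hierarchy described above. This is a minimal sketch that assumes mean-shift as the hill-climbing rule; it is not the authors' published algorithm, and `sigma`, the tolerances and the data are illustrative.

```python
import math

def blur_step(x, data, sigma):
    """Move x to the Gaussian-weighted mean of the data (one mean-shift step)."""
    weights = [math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2)) for p in data]
    total = sum(weights)
    return tuple(sum(w * p[d] for w, p in zip(weights, data)) / total
                 for d in range(len(x)))

def blob_clusters(data, sigma, tol=1e-9, max_iter=1000):
    """Label each datum by the blob (density maximum) it climbs to at scale sigma."""
    merge_tol = 0.01 * sigma  # maxima closer than this count as one blob
    centres, labels = [], []
    for p in data:
        x = tuple(p)
        for _ in range(max_iter):
            nxt = blur_step(x, data, sigma)
            converged = math.dist(x, nxt) < tol
            x = nxt
            if converged:
                break
        for k, c in enumerate(centres):
            if math.dist(x, c) < merge_tol:
                labels.append(k)
                break
        else:  # no existing blob nearby: this maximum starts a new blob
            centres.append(x)
            labels.append(len(centres) - 1)
    return labels, centres
```

For example, with `data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]`, scale 0.5 yields two blobs while scale 10 blurs them into one; scanning `sigma` from small to large traces the family of clusterings.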
NASA Astrophysics Data System (ADS)
Sarparandeh, Mohammadali; Hezarkhani, Ardeshir
2017-12-01
The use of efficient methods for data processing has always been of interest to researchers in the earth sciences. Pattern recognition techniques are appropriate for high-dimensional data such as geochemical data, and evaluating the geochemical distribution of rare earth elements (REEs) requires such methods. In particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is the application of unsupervised pattern recognition approaches to evaluating the geochemical distribution of REEs in the Kiruna-type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit, and 14 rare earth elements were measured with inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all 14 features simultaneously. Besides providing easy solutions, these methods have the advantage of revealing hidden information and relations among data samples. Therefore, four clustering methods (unsupervised pattern recognition), namely a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and the self-organizing map (SOM), were applied and the results were evaluated using the silhouette criterion. The samples were clustered into four types. Finally, the results were validated against geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The k-means clustering and SOM results match reality best, agreeing with both the experimental studies of the samples and the field surveys. Since only the rare earth elements were used in this division, the good agreement of the results with lithology is noteworthy.
It is concluded that combining the proposed methods with geological studies reveals hidden information, and that this combined approach gives better results than using either one alone.
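k-means and the silhouette criterion named above can be sketched together: each sample is a vector of its 14 REE concentrations, and the mean silhouette width s = (b - a) / max(a, b) scores a partition (a = mean distance to the sample's own cluster, b = smallest mean distance to another cluster). This is a minimal sketch assuming Euclidean distance and a deterministic farthest-first initialisation; the study's exact settings are not given in the abstract.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centres = [tuple(points[0])]
    while len(centres) < k:
        centres.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centres))))
    for _ in range(iters):
        # Assignment step: nearest centre for every sample.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels, centres

def silhouette(points, labels):
    """Mean silhouette width; requires at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: silhouette taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

A typical use is to run `kmeans` for several values of k and compare `silhouette` scores, preferring partitions whose mean width is close to 1.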
A super resolution framework for low resolution document image OCR
NASA Astrophysics Data System (ADS)
Ma, Di; Agam, Gady
2013-01-01
Optical character recognition (OCR) is widely used for converting document images into digital media. Existing OCR algorithms and tools produce good results from high-resolution, good-quality document images. In this paper, we propose a machine learning based super resolution framework for low-resolution document image OCR. Two main techniques are used in the proposed approach: a document page segmentation algorithm and a modified K-means clustering algorithm. By exploiting coherence within the document, the approach reconstructs a higher-resolution image from a low-resolution document image and improves OCR results. Experimental results show substantial gains for low-resolution documents, such as those captured from video.
NASA Astrophysics Data System (ADS)
Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia
2002-06-01
We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents by comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples collected after 24-hour exposure to phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that had a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis of the 238 genes grouped gene expression profiles that separated the samples according to the class of compound to which they were exposed.
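Average-linkage agglomerative clustering with correlation distance (1 - Pearson r) is a common choice for expression profiles; the abstract does not state the linkage or distance used, so both are assumptions here. Each sample is a vector of expression values over the selected genes (238 in the study); the four-gene profiles in the test are made up.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i in range(len(profiles)) for j in range(i + 1, len(profiles))}

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = (sum(d(i, j) for i in clusters[a] for j in clusters[b])
                        / (len(clusters[a]) * len(clusters[b])))
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected
    labels = [0] * len(profiles)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels
```

Correlation distance groups samples by the shape of their expression profile rather than its magnitude, which suits the goal of separating compound classes.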
Partitioning of the degradation space for OCR training
NASA Astrophysics Data System (ADS)
Barney Smith, Elisa H.; Andersen, Tim
2006-01-01
Generally speaking, optical character recognition (OCR) algorithms tend to perform better when presented with homogeneous data. This paper studies a method designed to increase the homogeneity of training data, based on an understanding of the types of degradation that occur during the printing and scanning process and how these degradations affect the homogeneity of the data. While it has been shown that dividing the degradation space by edge spread improves recognition accuracy over dividing it by threshold or point spread function width alone, the challenge lies in deciding how many partitions to make and at which edge spread values to place the divisions. Clustering of different types of character features, fonts, sizes, resolutions and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.
Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.
Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip
2016-07-15
This work presents a complete methodology for distinguishing between different brands of cider and degrees of ageing based on voltammetric signals, using dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on a glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. Applying clustering algorithms and principal component analysis yielded visibly homogeneous clusters. An advanced signal processing strategy, which included automatic baseline correction, interval scaling and a continuous wavelet transform with a dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.
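Two of the steps named above, interval scaling and principal component analysis, can be sketched on toy voltammograms (one row per signal, one column per potential). This sketch assumes "interval scaling" means per-signal min-max scaling to [0, 1] and finds only the first principal component by power iteration; baseline correction and the wavelet transform are omitted, and the signals are invented.

```python
import math

def interval_scale(signal):
    """Rescale one voltammogram to the [0, 1] interval (min-max scaling)."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in signal]

def first_pc_scores(signals, iters=200):
    """Scores of each signal on the first principal component (power iteration)."""
    n, p = len(signals), len(signals[0])
    means = [sum(s[k] for s in signals) / n for k in range(p)]
    centred = [[s[k] - means[k] for k in range(p)] for s in signals]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [float(k + 1) for k in range(p)]  # deterministic, non-symmetric start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[k] * v[k] for k in range(p)) for r in centred]
```

Scaling first removes differences in overall current magnitude, so the leading component reflects where the peaks lie; signals from the same brand should then fall on the same side of the PC1 axis.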
NASA Astrophysics Data System (ADS)
Moody, Daniela I.; Wilson, Cathy J.; Rowland, Joel C.; Altmann, Garrett L.
2015-06-01
Advanced pattern recognition and computer vision algorithms are of great interest for landscape characterization, change detection, and change monitoring in satellite imagery, in support of global climate change science and modeling. We present results from an ongoing effort to extend neuroscience-inspired models for feature extraction to the environmental sciences, and we demonstrate our work using WorldView-2 multispectral satellite imagery. We use a Hebbian learning rule to derive multispectral, multiresolution dictionaries directly from regional satellite normalized band difference index data. These feature dictionaries are used to build sparse scene representations, from which we automatically generate land cover labels via our CoSA algorithm: Clustering of Sparse Approximations. These data-adaptive feature dictionaries use joint spectral and spatial textural characteristics to help separate geologic, vegetative, and hydrologic features. Land cover labels are estimated in example WorldView-2 satellite images of Barrow, Alaska, taken at two different times, and are used to detect and discuss seasonal surface changes. Our results suggest that an approach that learns from both spectral and spatial features is promising for practical pattern recognition problems in high-resolution satellite imagery.
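A normalized difference index for bands i and j is commonly written (b_i - b_j) / (b_i + b_j); NDVI is the familiar case using near-infrared and red. The sketch below computes every pairwise index for one pixel, assuming that definition and ordering; WorldView-2 provides eight multispectral bands, giving 28 pairs. It covers only the input features, not the Hebbian dictionary learning or the CoSA clustering step.

```python
def normalized_band_differences(pixel):
    """All pairwise normalized differences (b_i - b_j) / (b_i + b_j) for i < j.

    pixel -- sequence of band reflectances (8 values for WorldView-2 multispectral).
    Returns a dict keyed by the band-index pair (i, j).
    """
    out = {}
    for i in range(len(pixel)):
        for j in range(i + 1, len(pixel)):
            total = pixel[i] + pixel[j]
            # Guard against division by zero for dark (all-zero) band pairs.
            out[(i, j)] = (pixel[i] - pixel[j]) / total if total != 0 else 0.0
    return out
```

Because each index is a ratio, it is largely insensitive to uniform illumination changes, one reason such indices are a convenient starting point for learning spectral dictionaries.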
Egocentric daily activity recognition via multitask clustering.
Yan, Yan; Ricci, Elisa; Liu, Gaowen; Sebe, Nicu
2015-10-01
Recognizing human activities from videos is a fundamental research problem in computer vision. Recently, there has been growing interest in analyzing human behavior from data collected with wearable cameras. First-person cameras continuously record several hours of their wearers' lives. To cope with this vast amount of unlabeled and heterogeneous data, novel algorithmic solutions are required. In this paper, we propose a multitask clustering framework for the analysis of activities of daily living from visual data gathered by wearable cameras. Our intuition is that, even if the data are not annotated, it is possible to exploit the fact that the tasks of recognizing the everyday activities of multiple individuals are related, since people typically perform the same actions in similar environments (e.g., people working in an office often read and write documents). In our framework, rather than clustering data from different users separately, we look for clustering partitions that are coherent across related tasks. In particular, two novel multitask clustering algorithms, derived from a common optimization problem, are introduced. Our experimental evaluation, conducted both on synthetic data and on publicly available first-person vision data sets, shows that the proposed approach outperforms several single-task and multitask learning methods.
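The idea of partitions that are coherent across related tasks can be illustrated with a hypothetical multitask k-means: each user (task) keeps its own centroids, but a penalty `lam` pulls them toward shared centroids, so cluster j means roughly the same activity for every user. This is an illustrative sketch of the general idea, not either of the paper's two algorithms; the objective, `lam` and the data are assumptions.

```python
import math

def multitask_kmeans(tasks, k, lam=1.0, iters=50):
    """Per-task centroids c[t][j], pulled toward shared centroids m[j].

    Hypothetical objective (lam > 0):
      sum_t sum_x ||x - c[t][z(x)]||^2 + lam * sum_t sum_j ||c[t][j] - m[j]||^2
    Each step below minimises it exactly for the variables it updates.
    """
    pooled = [tuple(p) for task in tasks for p in task]
    dim = len(pooled[0])
    # Shared centroids: farthest-first initialisation on the pooled data.
    shared = [pooled[0]]
    while len(shared) < k:
        shared.append(max(pooled, key=lambda p: min(math.dist(p, s) for s in shared)))
    centres = [list(shared) for _ in tasks]
    for _ in range(iters):
        # 1) Assign each point to the nearest centroid of its own task.
        labels = [[min(range(k), key=lambda j: math.dist(p, centres[t][j])) for p in task]
                  for t, task in enumerate(tasks)]
        # 2) Task centroids: closed-form compromise between task mean and shared centroid.
        for t, task in enumerate(tasks):
            for j in range(k):
                members = [p for p, lab in zip(task, labels[t]) if lab == j]
                centres[t][j] = tuple((sum(p[d] for p in members) + lam * shared[j][d])
                                      / (len(members) + lam) for d in range(dim))
        # 3) Shared centroids: mean of the task centroids.
        shared = [tuple(sum(centres[t][j][d] for t in range(len(tasks))) / len(tasks)
                        for d in range(dim)) for j in range(k)]
    return labels, shared
```

Setting `lam` near 0 recovers independent per-user k-means, while a large `lam` forces all users onto the same centroids; intermediate values trade personal variation against cross-user consistency.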
Tone perception in Mandarin-speaking school age children with otitis media with effusion
McPherson, Bradley; Li, Caiwei; Yang, Feng
2017-01-01
Objectives: The present study explored tone perception ability in school age Mandarin-speaking children with otitis media with effusion (OME) in noisy listening environments. The study investigated the interaction effects of noise, tone type, age, and hearing status on monaural tone perception, and assessed the application of a hierarchical clustering algorithm for profiling hearing impairment in children with OME. Methods: Forty-one children with normal hearing and normal middle ear status and 84 children with OME with or without hearing loss participated in this study. The children with OME were further divided into two subgroups based on their severity and pattern of hearing loss using a hierarchical clustering algorithm. Monaural tone recognition was measured using a picture-identification test format incorporating six sets of monosyllabic words conveying four lexical tones under speech spectrum noise, with the signal-to-noise ratio (SNR) conditions ranging from -9 to -21 dB. Results: Linear correlation indicated tone recognition thresholds of children with OME were significantly correlated with age and pure tone hearing thresholds at every frequency tested. Children with hearing thresholds less affected by OME performed similarly to their peers with normal hearing. Tone recognition thresholds of children with auditory status more affected by OME were significantly inferior to those of children with normal hearing or with minor hearing loss. Younger children demonstrated poorer tone recognition performance than older children with OME. A mixed design repeated-measure ANCOVA showed significant main effects of listening condition, hearing status, and tone type on tone recognition. Contrast comparisons revealed that tone recognition scores were significantly better under -12 dB SNR than under -15 dB SNR conditions and tone recognition scores were significantly worse under -18 dB SNR than those obtained under -15 dB SNR conditions.
Tone 1 was the easiest tone to identify and Tone 3 was the most difficult tone to identify for all participants, when considering -12, -15, and -18 dB SNR as within-subject variables. The interaction effect between hearing status and tone type indicated that children with greater levels of OME-related hearing loss had more impaired tone perception of Tone 1 and Tone 2 compared to their peers with lesser levels of OME-related hearing loss. However, tone perception of Tone 3 and Tone 4 remained similar among all three groups. Tone 2 and Tone 3 were the most perceptually difficult tones for children with or without OME-related hearing loss in all listening conditions. Conclusions: The hierarchical clustering algorithm demonstrated usefulness in risk stratification for tone perception deficiency in children with OME-related hearing loss. There was marked impairment in tone perception in noise for children with greater levels of OME-related hearing loss. Monaural lexical tone perception in younger children was more vulnerable to noise and OME-related hearing loss than that in older children. PMID:28829840
Tone perception in Mandarin-speaking school age children with otitis media with effusion.
Cai, Ting; McPherson, Bradley; Li, Caiwei; Yang, Feng
2017-01-01
The present study explored tone perception ability in school age Mandarin-speaking children with otitis media with effusion (OME) in noisy listening environments. The study investigated the interaction effects of noise, tone type, age, and hearing status on monaural tone perception, and assessed the application of a hierarchical clustering algorithm for profiling hearing impairment in children with OME. Forty-one children with normal hearing and normal middle ear status and 84 children with OME with or without hearing loss participated in this study. The children with OME were further divided into two subgroups based on their severity and pattern of hearing loss using a hierarchical clustering algorithm. Monaural tone recognition was measured using a picture-identification test format incorporating six sets of monosyllabic words conveying four lexical tones under speech spectrum noise, with the signal-to-noise ratio (SNR) conditions ranging from -9 to -21 dB. Linear correlation indicated tone recognition thresholds of children with OME were significantly correlated with age and pure tone hearing thresholds at every frequency tested. Children with hearing thresholds less affected by OME performed similarly to their peers with normal hearing. Tone recognition thresholds of children with auditory status more affected by OME were significantly inferior to those of children with normal hearing or with minor hearing loss. Younger children demonstrated poorer tone recognition performance than older children with OME. A mixed design repeated-measure ANCOVA showed significant main effects of listening condition, hearing status, and tone type on tone recognition. Contrast comparisons revealed that tone recognition scores were significantly better under -12 dB SNR than under -15 dB SNR conditions and tone recognition scores were significantly worse under -18 dB SNR than those obtained under -15 dB SNR conditions. 
Tone 1 was the easiest tone to identify and Tone 3 was the most difficult tone to identify for all participants, when considering -12, -15, and -18 dB SNR as within-subject variables. The interaction effect between hearing status and tone type indicated that children with greater levels of OME-related hearing loss had more impaired tone perception of Tone 1 and Tone 2 compared to their peers with lesser levels of OME-related hearing loss. However, tone perception of Tone 3 and Tone 4 remained similar among all three groups. Tone 2 and Tone 3 were the most perceptually difficult tones for children with or without OME-related hearing loss in all listening conditions. The hierarchical clustering algorithm demonstrated usefulness in risk stratification for tone perception deficiency in children with OME-related hearing loss. There was marked impairment in tone perception in noise for children with greater levels of OME-related hearing loss. Monaural lexical tone perception in younger children was more vulnerable to noise and OME-related hearing loss than that in older children.
An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images.
Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman
2015-10-09
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated as a trade-off between several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a genetic algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition accuracies of 96.72% and 96.67% using bootstrapping and 10-fold cross-validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.
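For reference, standard fuzzy c-means alternates two updates: memberships u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), where d_ik is the distance from point i to centre k, and centres as u^m-weighted means. Its objective uses only within-cluster distances, which is the limitation the SDM criterion targets. This minimal sketch assumes fuzzifier m = 2 and a deterministic farthest-first start (random memberships are the usual start); in segmentation, the points would be pixel colour vectors.

```python
import math

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Return (membership matrix U, centres); U[i][k] = membership of point i in cluster k."""
    centres = [tuple(points[0])]
    while len(centres) < c:
        centres.append(tuple(max(points, key=lambda p: min(math.dist(p, q) for q in centres))))
    dim = len(points[0])
    for _ in range(iters):
        U = []
        for p in points:
            d = [math.dist(p, q) for q in centres]
            if 0.0 in d:  # point sits exactly on a centre: crisp membership there
                hits = d.count(0.0)
                U.append([1.0 / hits if x == 0.0 else 0.0 for x in d])
            else:
                U.append([1.0 / sum((d[k] / d[j]) ** (2 / (m - 1)) for j in range(c))
                          for k in range(c)])
        # Centres: membership-weighted means with weights u^m.
        new_centres = []
        for k in range(c):
            w = [u[k] ** m for u in U]
            total = sum(w)
            new_centres.append(tuple(sum(wi * p[dd] for wi, p in zip(w, points)) / total
                                     for dd in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return U, centres
```

Each row of `U` sums to 1, so a pixel near a nucleus/cytoplasm boundary can belong partly to both regions; a crisp segmentation takes the largest membership per pixel.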
An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images
Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman
2015-01-01
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated as a trade-off between several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a genetic algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition accuracies of 96.72% and 96.67% using bootstrapping and 10-fold cross-validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method. PMID:26450665
Kazakh Traditional Dance Gesture Recognition
NASA Astrophysics Data System (ADS)
Nussipbekov, A. K.; Amirgaliyev, E. N.; Hahn, Minsoo
2014-04-01
Full-body gesture recognition is an important interdisciplinary research field that is widely used in many application areas, including dance gesture recognition. The rapid growth of technology in recent years has contributed a great deal to this domain. However, it remains a challenging task. In this paper we implement Kazakh traditional dance gesture recognition. We use a Microsoft Kinect camera to obtain human skeleton and depth information. We then apply a tree-structured Bayesian network and the Expectation Maximization algorithm with K-means clustering to calculate conditional linear Gaussians for classifying poses. Finally, we use a Hidden Markov Model to detect dance gestures. Our main contribution is extending the Kinect skeleton by adding headwear as a new skeleton joint, calculated from the depth image. This novelty allows us to significantly improve the accuracy of recognizing a dancer's head gestures, which in turn plays a considerable role in whole-body gesture recognition. Experimental results show the efficiency of the proposed method and that its performance is comparable to that of state-of-the-art systems.
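The final Hidden Markov Model stage turns a sequence of classified poses into the most likely sequence of gesture phases; the Viterbi algorithm is the standard decoder for that. The sketch below works in log-probabilities to avoid underflow. The phase names ("raise", "lower"), pose classes ("high", "mid", "low") and all probabilities are hypothetical, and zero probabilities are not supported (`math.log(0)` raises).

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log domain)."""
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for reaching s at time t.
            prev, best = max(((r, score[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                             key=lambda item: item[1])
            score[t][s] = best + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical two-phase gesture model over three pose classes.
STATES = ("raise", "lower")
START = {"raise": 0.5, "lower": 0.5}
TRANS = {"raise": {"raise": 0.8, "lower": 0.2}, "lower": {"raise": 0.2, "lower": 0.8}}
EMIT = {"raise": {"high": 0.7, "mid": 0.2, "low": 0.1},
        "lower": {"high": 0.1, "mid": 0.2, "low": 0.7}}
```

With this toy model, the pose sequence `["high", "high", "low", "low"]` decodes to `["raise", "raise", "lower", "lower"]`; the transition probabilities smooth over isolated pose misclassifications.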
A coarse to fine minutiae-based latent palmprint matching.
Liu, Eryun; Jain, Anil K; Tian, Jie
2013-10-01
With the availability of live-scan palmprint technology, high resolution palmprint recognition has started to receive significant attention in forensics and law enforcement. In forensic applications, latent palmprints provide critical evidence as it is estimated that about 30 percent of the latents recovered at crime scenes are those of palms. Most of the available high-resolution palmprint matching algorithms essentially follow the minutiae-based fingerprint matching strategy. Considering the large number of minutiae (about 1,000 minutiae in a full palmprint compared to about 100 minutiae in a rolled fingerprint) and large area of foreground region in full palmprints, novel strategies need to be developed for efficient and robust latent palmprint matching. In this paper, a coarse to fine matching strategy based on minutiae clustering and minutiae match propagation is designed specifically for palmprint matching. To deal with the large number of minutiae, a local feature-based minutiae clustering algorithm is designed to cluster minutiae into several groups such that minutiae belonging to the same group have similar local characteristics. The coarse matching is then performed within each cluster to establish initial minutiae correspondences between two palmprints. Starting with each initial correspondence, a minutiae match propagation algorithm searches for mated minutiae in the full palmprint. The proposed palmprint matching algorithm has been evaluated on a latent-to-full palmprint database consisting of 446 latents and 12,489 background full prints. The matching results show a rank-1 identification accuracy of 79.4 percent, which is significantly higher than the 60.8 percent identification accuracy of a state-of-the-art latent palmprint matching algorithm on the same latent database. 
The average computation time of our algorithm for a single latent-to-full match is about 141 ms for a genuine match and 50 ms for an impostor match, on a Windows XP desktop system with a 2.2-GHz CPU and 1.00 GB of RAM. Our algorithm is an order of magnitude faster than a previously published state-of-the-art algorithm.
Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T
2010-01-01
Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high-dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster revealed significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma.
These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.
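One common way to test whether a cluster is "enriched" for an outcome is a one-sided hypergeometric test: given all samples and how many carry the outcome, how surprising is the number of outcome samples inside the cluster? The abstract does not state which test was used, so this is an assumption. Treating minutes as independent draws is also a simplification, since minutes from the same patient are correlated.

```python
from math import comb

def enrichment_p_value(total, total_hits, cluster_size, cluster_hits):
    """One-sided hypergeometric p-value P(X >= cluster_hits).

    total        -- all samples (e.g. minutes of monitoring data)
    total_hits   -- samples carrying the outcome (e.g. minutes from patients who died)
    cluster_size -- samples assigned to the cluster being tested
    cluster_hits -- outcome-carrying samples inside that cluster
    """
    denom = comb(total, cluster_size)
    upper = min(total_hits, cluster_size)
    tail = sum(comb(total_hits, x) * comb(total - total_hits, cluster_size - x)
               for x in range(cluster_hits, upper + 1))
    return tail / denom
```

With 10 clusters and 3 outcomes, 30 such tests would be run, so some correction for multiple comparisons would normally be applied to the resulting p-values.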
Nef, Tobias; Urwyler, Prabitha; Büchler, Marcel; Tarnanas, Ioannis; Stucki, Reto; Cazzoli, Dario; Müri, René; Mosimann, Urs
2012-01-01
Smart homes for the aging population have recently started attracting the attention of the research community. The “health state” of smart homes comprises many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated with their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes.
Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario. PMID:26007727
Nef, Tobias; Urwyler, Prabitha; Büchler, Marcel; Tarnanas, Ioannis; Stucki, Reto; Cazzoli, Dario; Müri, René; Mosimann, Urs
2015-05-21
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. In several applications, however, it is highly desirable to have the desired information calculated quickly enough for practical use. High-performance computing for algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve the detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. To speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques covering four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. To explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in the experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.
Low-level processing for real-time image analysis
NASA Technical Reports Server (NTRS)
Eskenazi, R.; Wilf, J. M.
1979-01-01
A system that detects object outlines in television images in real time is described. A high-speed pipeline processor transforms the raw image into an edge map, and a microprocessor integrated into the system clusters the edges and represents them as chain codes. Image statistics, useful for higher-level tasks such as pattern recognition, are computed by the microprocessor. Peak intensity and peak gradient values are extracted within a programmable window and are used for iris and focus control. The algorithms implemented in hardware and the pipeline processor architecture are described. The strategy for partitioning functions in the pipeline was chosen to make the implementation modular. The microprocessor interface allows flexible and adaptive control of the feature extraction process. The software algorithms for clustering edge segments, creating chain codes, and computing image statistics are also discussed. A strategy for real-time image analysis that uses this system is given.
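The chain-code representation can be illustrated with a short Python sketch. It is not the 1979 system's microprocessor code: it assumes the edge pixels of one outline have already been clustered and ordered into an 8-connected path, and it uses the common Freeman convention (0 = east, codes increasing counter-clockwise), which is an assumption here.

```python
# Freeman 8-direction chain code (assumed convention): 0 = east, codes
# increase counter-clockwise, rows grow downward as in image coordinates.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_code(boundary):
    """Encode an ordered list of 8-connected (row, col) pixels as direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS.index(step))
    return codes


def decode(start, codes):
    """Rebuild the pixel path from its start pixel and chain code."""
    path = [start]
    for code in codes:
        dr, dc = DIRECTIONS[code]
        r, c = path[-1]
        path.append((r + dr, c + dc))
    return path


# A 2x2 square traced clockwise on screen, returning to its start pixel.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = chain_code(square)  # [0, 6, 4, 2]
```

Storing only the start pixel plus one small code per step is what makes chain codes compact enough for a microprocessor to keep many outlines in memory.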
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving spacecraft electrical characteristics problems, which involve processing large amounts of unlabeled test data, high-dimensional features, long computing times and slow identification. First, both fuzzy c-means (FCM) offline clustering and principal component feature extraction algorithms are applied in the feature selection process. Second, an approximate weighted proximal support vector machine (WPSVM) online classification algorithm is used to reduce the feature dimension and further improve the recognition rate for spacecraft electrical characteristics. Finally, a threshold-based data capture contribution method is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the proposed method can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and effectively shorten the computing time. PMID:26544549
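For reference, the standard fuzzy c-means iteration alternates a membership-weighted center update with a distance-based membership update. The sketch below is a generic pure-Python version; the fuzzifier m = 2, the random initial memberships and the stopping tolerance are assumptions, and it is not the authors' FCM configuration (the PCA and WPSVM stages are not shown).

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means on a list of equal-length tuples.

    Returns (centers, u) where u[k][i] is the membership of point k in
    cluster i; every row of u sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / sw for d in range(dim)))
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [max(sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5, 1e-12) for v in centers]
            new_u.append([1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                          for i in range(c)])
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break
    return centers, u


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, c=2)
labels = [row.index(max(row)) for row in u]  # hard labels from the soft memberships
```

Unlike k-means, every point keeps a graded membership in every cluster, which is what makes FCM attractive when test records are unlabeled and clusters overlap.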
Qin, Jiangyi; Huang, Zhiping; Liu, Chunwu; Su, Shaojing; Zhou, Jing
2015-01-01
A novel blind recognition algorithm for frame synchronization words is proposed to recognize the parameters of frame synchronization words in digital communication systems. In this paper, a blind recognition method for frame synchronization words based on hard decisions is derived in detail, and the criteria for parameter recognition are given. Compared with blind recognition based on hard decisions, using soft decisions can improve the accuracy of blind recognition. Therefore, taking into account the characteristics of Quadrature Phase Shift Keying (QPSK) signals, an improved blind recognition algorithm based on soft decisions is proposed. The improved algorithm can also be extended to other signal modulation forms. The complete blind recognition steps of the hard-decision and soft-decision algorithms are then given in detail. Finally, the simulation results show that both the hard-decision and the soft-decision algorithms can blindly recognize the parameters of frame synchronization words, and that the improved algorithm noticeably enhances the accuracy of blind recognition.
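A hard-decision test of this general kind can be illustrated by folding the demodulated bit stream into rows of one candidate frame length and looking for columns that stay constant from frame to frame. This is a generic sketch under stated assumptions (frame length already known, bits already hard-decided, a 0.95 agreement threshold, a hypothetical 8-bit sync word); it is not the paper's derivation, which also recognizes the frame parameters and adds a QPSK soft-decision variant.

```python
import random


def sync_positions(bits, frame_len, threshold=0.95):
    """Hard-decision column-consistency test for one candidate frame length.

    The stream of 0/1 decisions is cut into rows of frame_len bits. For each
    column we measure how often the bit agrees with that column's majority
    value; near-constant columns are candidate sync-word positions.
    """
    n_frames = len(bits) // frame_len
    if n_frames < 2:
        raise ValueError("need at least two complete frames")
    positions = []
    for col in range(frame_len):
        ones = sum(bits[f * frame_len + col] for f in range(n_frames))
        agreement = max(ones, n_frames - ones) / n_frames
        if agreement >= threshold:
            positions.append(col)
    return positions


# Synthetic stream: hypothetical 8-bit sync word + random payload, 60 frames.
rng = random.Random(1)
sync = [1, 0, 1, 1, 0, 0, 1, 0]
frame_len = 40
stream = []
for _ in range(60):
    stream += sync + [rng.randint(0, 1) for _ in range(frame_len - len(sync))]

found = sync_positions(stream, frame_len)
```

With random payload bits, a non-sync column almost never reaches 95% agreement over 60 frames, so only the sync-word columns survive; bit errors on a real channel are what motivate the softer, reliability-weighted statistics the abstract describes.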
Novel probabilistic neuroclassifier
NASA Astrophysics Data System (ADS)
Hong, Jiang; Serpen, Gursel
2003-09-01
This paper proposes a novel probabilistic potential function neural network classifier algorithm for classes that are multimodally distributed and formed from sets of disjoint pattern clusters. The proposed classifier has a number of desirable properties that distinguish it from other neural network classifiers. A complete description of the algorithm, in terms of its architecture and pseudocode, is presented, together with a simulation analysis of the new neuro-classifier on a set of benchmark problems. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better on a subset of problems for which other neural classifiers perform relatively poorly.
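The basic idea behind potential-function classifiers, letting every training sample radiate a potential so that one class can own several disjoint clusters, can be sketched as a Parzen-style decision rule. This is not the authors' neural architecture; the Gaussian kernel, the shared `width` and the per-class averaging are assumptions chosen for illustration.

```python
import math


def potential_classify(train, x, width=1.0):
    """Assign x to the class with the largest average Gaussian potential.

    train is a list of (vector, label) pairs. Each training sample contributes
    exp(-||x - s||^2 / (2 * width^2)); summing per class lets one class own
    several disjoint clusters, and dividing by class size avoids favouring
    large classes.
    """
    totals, counts = {}, {}
    for s, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, s))
        totals[label] = totals.get(label, 0.0) + math.exp(-d2 / (2.0 * width ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda lab: totals[lab] / counts[lab])


# Class "A" is bimodal (two disjoint clusters); class "B" sits between them.
train = [((0.0, 0.0), "A"), ((0.5, 0.0), "A"), ((10.0, 10.0), "A"), ((10.0, 10.5), "A"),
         ((5.0, 5.0), "B"), ((5.5, 5.0), "B")]
```

A single-prototype classifier would place class "A" near (5, 5) and confuse it with "B"; summing local potentials avoids that failure on multimodal classes.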
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single-kernel SVM known as SVMlight and a modified version known as an SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions: image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
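The abstract does not say how k-means is combined with the SVM, so only the clustering step is sketched here: plain Lloyd's k-means in Python with randomly sampled initial centroids (the seed and iteration cap are assumptions).

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: each centroid moves to the mean of its members (kept if empty).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labels = kmeans(pts, 2)
```

The loop stops as soon as an update leaves every centroid unchanged, which is Lloyd's usual convergence criterion.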
Numerical linear algebra in data mining
NASA Astrophysics Data System (ADS)
Eldén, Lars
Ideas and algorithms from numerical linear algebra are important in several areas of data mining. We give an overview of linear algebra methods in text mining (information retrieval), pattern recognition (classification of handwritten digits), and PageRank computations for web search engines. The emphasis is on rank reduction as a method of extracting information from a data matrix, low-rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.
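As one concrete instance of the eigenvalue methods mentioned, the PageRank vector is the dominant eigenvector of the Google matrix and can be computed by power iteration. A minimal sketch, assuming the conventional damping factor 0.85 and uniform redistribution of rank from pages without out-links:

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration.

    links maps each page to the pages it links to. Rank held by pages with no
    out-links (dangling pages) is spread uniformly over all pages, which keeps
    the rank vector summing to 1.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, targets in links.items():
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(web)
```

Each iteration is one sparse matrix-vector product, which is why power iteration scales to web-sized link matrices even though it converges only linearly (at a rate governed by the damping factor).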
Topological side-chain classification of beta-turns: ideal motifs for peptidomimetic development.
Tran, Tran Trung; McKie, Jim; Meutermans, Wim D F; Bourne, Gregory T; Andrews, Peter R; Smythe, Mark L
2005-08-01
Beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns have shown broad binding capacity to multiple different receptors, for example benzodiazepines. Beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi2, psi2, phi3 and psi3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb), with Type IV representing the unclassified beta-turns. Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics, as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions, as defined by C(alpha)-C(beta) vectors, of these turns have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cluster 90% of the data, and the average intra-cluster RMSD of the four C(alpha)-C(beta) vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.
Mammographic images segmentation based on chaotic map clustering algorithm
2014-01-01
Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammography. Methods The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the distance between the associated feature space points, is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and reproduces the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective in identifying larger mass lesions. Conclusions We can summarize our analysis by asserting that, due to the particularities of mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. Rather, the joint use of this method with other segmentation techniques could successfully increase segmentation performance and provide extra information for subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
A triboelectric motion sensor in wearable body sensor network for human activity recognition.
Hui Huang; Xian Li; Ye Sun
2016-08-01
The goal of this study is to design a novel triboelectric motion sensor in a wearable body sensor network for human activity recognition. Physical activity recognition is widely used in well-being management, medical diagnosis and rehabilitation. Instead of traditional accelerometers, we design a novel wearable sensor system based on triboelectrification. The triboelectric motion sensor can be easily attached to the human body and collects motion signals caused by physical activities. Experiments were conducted to collect data for five common activities: sitting and standing, walking, climbing upstairs, climbing downstairs, and running. The k-Nearest Neighbor (kNN) algorithm is adopted to recognize these activities and validate the feasibility of this new approach. The results show that our system can perform physical activity recognition with a success rate of over 80% for walking, sitting and standing. The triboelectric structure can also be used as an energy harvester for motion harvesting due to its high output voltage under random low-frequency motion.
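The kNN recognition step can be sketched as follows. The two-element feature vectors are hypothetical (the abstract does not say which features are extracted from the triboelectric signal), and k = 3 with Euclidean distance is an assumption.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """Majority vote among the k training samples nearest to x (Euclidean).

    train is a list of (feature_vector, label) pairs. Counter.most_common keeps
    first-seen order for equal counts, so a tied vote goes to the label that
    appears first among the neighbours, i.e. the closest one.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical (feature_1, feature_2) vectors per activity window.
train = [((0.1, 0.2), "sitting"), ((0.2, 0.1), "sitting"),
         ((1.0, 1.1), "walking"), ((1.1, 0.9), "walking"),
         ((3.0, 3.2), "running"), ((2.9, 3.1), "running")]
```

kNN has no training phase beyond storing labelled windows, which suits a feasibility study; its cost grows with the number of stored samples at prediction time.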
Digital signal processing algorithms for automatic voice recognition
NASA Technical Reports Server (NTRS)
Botros, Nazeih M.
1987-01-01
Current digital signal analysis algorithms used in automatic voice recognition are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on the digital signal analysis of the speech signal rather than on its linguistic analysis. Several digital signal processing algorithms are available for voice recognition, including Linear Predictive Coding (LPC), Short-time Fourier Analysis, and Cepstrum Analysis. Among these algorithms, LPC is the most widely used. It has a short execution time and does not require large memory storage. However, it has several limitations due to the assumptions used to develop it. The other two algorithms are frequency-domain algorithms with few assumptions, but they have not been widely implemented or investigated. However, with recent advances in digital technology, namely signal processors, these two frequency-domain algorithms may be investigated for implementation in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.
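LPC coefficients are commonly obtained with the autocorrelation method and the Levinson-Durbin recursion, which helps explain why LPC is cheap in time and memory. A minimal sketch; windowing, frame size and model order are left to the caller and are not taken from the report.

```python
def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one (windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a[1..order] for the predictor s[n] ~ sum_k a[k] * s[n - k].

    Solves the autocorrelation normal equations recursively and returns
    (coefficients, final prediction-error energy). Requires r[0] > 0.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Autocorrelation of a first-order process with coefficient 0.5 (normalised so r[0] = 1).
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

The recursion needs O(p^2) operations and O(p) memory for order p, instead of a general O(p^3) linear solve, and scaling r by a constant leaves the coefficients unchanged.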
Kim, Hyunsoo; Park, Haesun
2007-06-15
Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
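To make the factorization itself concrete, the sketch below shows only plain NMF, V ≈ WH with non-negative factors, using the Lee-Seung multiplicative updates. This is a swapped-in, simpler technique: it has no sparsity control and is not the alternating non-negativity-constrained least squares algorithm the article proposes. The matrix, rank, seed and iteration count are assumptions.

```python
import random


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Plain NMF V ~ W H via Lee-Seung multiplicative updates (no sparsity term)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; entries stay non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(rh, rn, rd)] for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(rw, rn, rd)] for rw, rn, rd in zip(W, num, den)]
    return W, H


V = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 3.0, 3.0], [2.0, 0.5, 1.0]]
W0, H0 = nmf(V, rank=2, iters=0)  # the random starting point
W, H = nmf(V, rank=2, iters=300)
```

Multiplicative updates never introduce negative entries, which is their appeal, but they can converge slowly; that slowness is one motivation for ANLS-style solvers such as the one described in the abstract.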
NASA Astrophysics Data System (ADS)
Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.
2017-08-01
Self-learning equivalent-convolutional neural structures (SLECNS) for auto-coding-decoding and image clustering are discussed. The SLECNS architectures and their spatially invariant equivalent models (SI EMs), which use matrix-matrix procedures with basic operations of continuous logic and non-linear processing, are proposed. These SI EMs have several advantages, such as the ability to recognize image fragments with better efficiency and strong cross-correlation. The proposed method of clustering fragments by their structural features is suitable not only for binary but also for color images, and it combines self-learning with the formation of clustered weight matrix patterns. Its model is constructed on the basis of recursive processing algorithms and the k-means method. The experimental results confirmed that larger images and 2D binary fragments with a large number of elements can be clustered. The possibility of generalizing these models to the space-invariant case is shown for the first time. An experiment was carried out on a 256x256 image (a reference array) with fragments of 7x7 and 21x21 for clustering. The experiments, performed in the Mathcad software environment, showed that the proposed method is universal, converges well with a small number of iterations, maps easily onto the matrix structure, and is promising. Thus, it is very important to understand the mechanisms of self-learning equivalence-convolutional clustering, the accompanying competitive processes in neurons, and the principles of neural auto-encoding-decoding and recognition with self-learned cluster patterns, which rely on the algorithm and principles of non-linear processing of two-dimensional spatial functions for image comparison.
These SI EMs can simply describe signal processing during all training and recognition stages, and they are suitable for unipolar-coded multilevel signals. We show that SLECNS can be implemented with known equivalentors or traditional correlators if they are based on the proposed equivalental two-dimensional functions of image similarity. The clustering efficiency of such models and their implementations depends on the discriminant properties of the neural elements of the hidden layers. Therefore, the main model and architecture parameters and characteristics depend on the type of non-linear processing applied and on the function used for image comparison or for adaptive-equivalental weighting of input patterns. Real model experiments in Mathcad are demonstrated, which confirm that non-linear processing on equivalent functions makes it possible to determine the winning neurons and adjust the weight matrix. Experimental results have shown that such models can be successfully used for auto- and hetero-associative recognition. They can also be used to explain some mechanisms known as "focus" and the "competing gain-inhibition concept". The SLECNS architecture and hardware implementations of its basic nodes based on multi-channel convolvers and correlators with time integration are proposed. The parameters and performance of such architectures are estimated.
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Li, Li; Stoeckert, Christian J.; Roos, David S.
2003-01-01
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome. PMID:12952885
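OrthoMCL groups sequences by running the Markov Cluster (MCL) algorithm on a similarity graph. The core MCL loop can be sketched on a toy unweighted graph; in OrthoMCL the edge weights would come from sequence-similarity scores, and the expansion and inflation values used here are assumptions.

```python
def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalize_columns(M):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    sums = [sum(col) for col in zip(*M)]
    return [[x / s if s else 0.0 for x, s in zip(row, sums)] for row in M]


def mcl(adjacency, expansion=2, inflation=2.0, iters=100, tol=1e-9):
    """Markov Cluster algorithm on a symmetric adjacency matrix.

    Self-loops are added and columns normalised; then expansion (matrix power,
    spreading flow along paths) and inflation (elementwise power plus column
    re-normalisation, strengthening strong flows) alternate until the matrix
    stops changing.
    """
    n = len(adjacency)
    M = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(iters):
        E = M
        for _ in range(expansion - 1):
            E = matmul(E, M)
        new = normalize_columns([[x ** inflation for x in row] for row in E])
        change = max(abs(a - b) for r_new, r_old in zip(new, M) for a, b in zip(r_new, r_old))
        M = new
        if change < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns form one cluster.
    clusters = []
    for row in M:
        members = sorted(j for j, x in enumerate(row) if x > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two triangles joined by a single bridge edge 2-3.
graph = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    graph[i][j] = graph[j][i] = 1
```

Raising the inflation exponent makes clusters finer; lowering it toward 1 merges them, which is the main knob MCL-based tools expose.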
NASA Astrophysics Data System (ADS)
Zhang, Haiying; Bai, Jiaojiao; Li, Zhengjie; Liu, Yan; Liu, Kunhong
2017-06-01
The detection and discrimination of infrared small dim targets is a challenge in automatic target recognition (ATR), because such targets carry no salient size, shape or texture information. Many researchers focus on mining more discriminative temporal-spatial information about targets. However, such information may not be available as imaging environments change, and target size and intensity keep changing at different imaging distances. In this paper, we therefore propose a novel research scheme using density-based clustering and a backtracking strategy. In this scheme, the speeded-up robust features (SURF) detector is first applied to capture candidate targets in single frames. These points are then mapped into one frame, so that target traces form a local aggregation pattern. To isolate the targets from noise, a newly proposed density-based clustering algorithm, fast search and find of density peaks (FSFDP), is employed to cluster targets by their spatial density distribution. Two important parameters of the algorithm, percent and γ, are fully exploited to determine the clustering scale automatically, so as to extract the trace with the highest clutter suppression ratio. In the final step, a backtracking algorithm is designed to detect and discriminate target traces as well as to eliminate clutter. The consistency and continuity of the short-time target trajectory in temporal-spatial space are incorporated into the bounding function to speed up pruning. Compared with several state-of-the-art methods, our algorithm is more effective for dim targets with a lower signal-to-clutter ratio (SCR). Furthermore, it avoids constructing the candidate target trajectory search space, so its time complexity is limited to polynomial level. Extensive experimental results show that it has superior performance in probability of detection (Pd) and false alarm suppression rate across a variety of complex backgrounds.
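The FSFDP step can be sketched with a cut-off density kernel. Assumptions: points are plain 2-D coordinates, the cut-off distance `dc` is passed in directly (the abstract's `percent` parameter presumably sets it), and the number of centers is fixed by the caller instead of being chosen automatically from γ as in the paper. The densest point is always kept as a center so that every point can be assigned.

```python
import math


def density_peaks(points, dc, n_centers):
    """Clustering by fast search and find of density peaks (cut-off kernel).

    rho[i]   = number of other points closer than dc to point i
    delta[i] = distance to the nearest point earlier in the density ordering
               (for the densest point: its largest distance to any point)
    gamma[i] = rho[i] * delta[i]; large gamma marks a cluster center.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Densest first; equal densities are ordered by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(d[order[0]])
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda k: d[i][k])  # nearest point earlier in the ordering
        delta[i], parent[i] = d[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    # Keep the densest point as a center so every chain of parents ends at one.
    others = sorted((i for i in range(n) if i != order[0]), key=lambda i: -gamma[i])
    centers = [order[0]] + others[:n_centers - 1]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Walking in density order guarantees each parent is labelled before its child.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


blob = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
pts = blob + [(x + 5.0, y + 5.0) for x, y in blob]
centers, labels = density_peaks(pts, dc=0.5, n_centers=2)
```

Cluster centers stand out because they combine high local density with a large distance to any denser point; isolated noise points may be far away but have low rho, so their gamma stays small.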
NETRA: A parallel architecture for integrated vision systems. 1: Architecture and organization
NASA Technical Reports Server (NTRS)
Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is considered to be a system that uses vision algorithms from all levels of processing for a high level application (such as object recognition). A model of computation is presented for parallel processing for an IVS. Using the model, desired features and capabilities of a parallel architecture suitable for IVSs are derived. Then a multiprocessor architecture (called NETRA) is presented. This architecture is highly flexible without the use of complex interconnection schemes. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. It is a recursively defined tree-type hierarchical architecture where each of the leaf nodes consists of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide for desired flexibility. A qualitative evaluation of NETRA is presented. Then general schemes are described to map parallel algorithms onto NETRA. Algorithms are classified according to their communication requirements for parallel processing. An extensive analysis of inter-cluster communication strategies in NETRA is presented, and parameters affecting performance of parallel algorithms when mapped on NETRA are discussed. Finally, a methodology to evaluate performance of algorithms on NETRA is described.
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data into species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single-kernel SVM known as SVMlight and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions: image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Predicting thunderstorm evolution using ground-based lightning detection networks
NASA Technical Reports Server (NTRS)
Goodman, Steven J.
1990-01-01
Lightning measurements acquired principally by a ground-based network of magnetic direction finders are used to diagnose and predict the existence, temporal evolution, and decay of thunderstorms over a wide range of space and time scales extending over four orders of magnitude. The non-linear growth and decay of thunderstorms and their accompanying cloud-to-ground lightning activity is described by the three parameter logistic growth model. The growth rate is shown to be a function of the storm size and duration, and the limiting value of the total lightning activity is related to the available energy in the environment. A new technique is described for removing systematic bearing errors from direction finder data where radar echoes are used to constrain site error correction and optimization (best point estimate) algorithms. A nearest neighbor pattern recognition algorithm is employed to cluster the discrete lightning discharges into storm cells and the advantages and limitations of different clustering strategies for storm identification and tracking are examined.
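Grouping discrete discharges into cells by nearest-neighbour chaining can be sketched as single-linkage clustering with a union-find structure. The distance and time thresholds, and the use of projected x/y coordinates in kilometres, are hypothetical; the abstract does not state the criteria used.

```python
import math


def cluster_flashes(flashes, max_km=10.0, max_s=600.0):
    """Group flashes into storm cells by nearest-neighbour chaining.

    flashes is a list of (x_km, y_km, t_s) tuples. Two flashes are linked if
    they are within max_km of each other and within max_s seconds; cells are
    the connected components of that link graph (single-linkage).
    """
    parent = list(range(len(flashes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, (xi, yi, ti) in enumerate(flashes):
        for j in range(i + 1, len(flashes)):
            xj, yj, tj = flashes[j]
            if abs(ti - tj) <= max_s and math.hypot(xi - xj, yi - yj) <= max_km:
                parent[find(i)] = find(j)

    cells = {}
    for i in range(len(flashes)):
        cells.setdefault(find(i), []).append(i)
    return sorted(cells.values())


flashes = [(0, 0, 0), (3, 4, 60), (6, 8, 120), (100, 100, 0), (104, 103, 30), (0, 0, 5000)]
cells = cluster_flashes(flashes)
```

Single-linkage chaining can merge neighbouring storms through a bridge of flashes, one of the limitations of nearest-neighbour strategies that the abstract says are examined.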
Diagnostic Accuracy Comparison of Artificial Immune Algorithms for Primary Headaches.
Çelik, Ufuk; Yurtay, Nilüfer; Koç, Emine Rabia; Tepe, Nermin; Güllüoğlu, Halil; Ertaş, Mustafa
2015-01-01
The present study evaluated the diagnostic accuracy of immune system algorithms for classifying the primary types of headache, which are not related to any organic etiology. These are divided into four types: migraine, tension, cluster, and other primary headaches. With this objective, three different neurologists were asked to enter the medical records of 850 patients into our web-based expert system hosted on our project web site. In the evaluation process, Artificial Immune Systems (AIS) were used as the classification algorithms. AIS are classification algorithms inspired by the mechanisms of the biological immune system, which has significant and distinct capabilities. These algorithms simulate properties of the immune system such as discrimination, learning, and memorization so that they can be used for classification, optimization, or pattern recognition. According to the results, the accuracy of the classifiers used in this study ranged from 95% to 99%, except for one poorly suited classifier that yielded 71% accuracy.
Visual based laser speckle pattern recognition method for structural health monitoring
NASA Astrophysics Data System (ADS)
Park, Kyeongtaek; Torbol, Marco
2017-04-01
This study performed the system identification of a target structure by analyzing the laser speckle pattern captured by a camera. The laser speckle pattern is generated by the diffuse reflection of the laser beam on a rough surface of the target structure. The camera, equipped with a red filter, records the scattered speckle particles of the laser light in real time, and the raw speckle image pixel data are fed to the graphics processing unit (GPU) in the system. The laser speckle contrast analysis (LASCA) algorithm computes the laser speckle contrast images and the laser speckle flow images. The k-means clustering algorithm is used to classify the pixels in each frame, and the clusters' centroids, which function as virtual sensors, track the displacement between frames in the time domain. The fast Fourier transform (FFT) and the frequency domain decomposition (FDD) compute the modal properties of the structure: natural frequencies and damping ratios. This study takes advantage of the large-scale computational capability of the GPU. The algorithm is written in Compute Unified Device Architecture (CUDA C), which allows speckle images to be processed in real time.
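Spatial LASCA computes the speckle contrast K = σ/⟨I⟩ (standard deviation over mean intensity) in a small sliding window. A CPU sketch in plain Python; the 3x3 window is an assumption, and the study itself runs this kind of per-pixel work on the GPU in CUDA C.

```python
import math


def speckle_contrast(image, window=3):
    """Spatial laser speckle contrast K = std / mean over a sliding window.

    image is a list of rows of pixel intensities. Returns the valid region
    only: entry [r][c] is K for the window whose top-left corner is (r, c).
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - window + 1):
        out_row = []
        for c in range(cols - window + 1):
            vals = [image[r + i][c + j] for i in range(window) for j in range(window)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)  # population variance
            out_row.append(math.sqrt(var) / mean if mean > 0 else 0.0)
        out.append(out_row)
    return out


flat = [[7.0] * 5 for _ in range(5)]  # no speckle: K = 0 everywhere
checker = [[2.0 if (r + c) % 2 == 0 else 0.0 for c in range(4)] for r in range(4)]  # strong speckle
```

Every output pixel depends only on its own window, so the computation parallelizes naturally with one GPU thread per output pixel.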
NASA Astrophysics Data System (ADS)
Qu, Hongquan; Yuan, Shijiao; Wang, Yanping; Yang, Dan
2018-04-01
To improve the recognition performance of the optical fiber prewarning system (OFPS), this study proposed a hierarchical recognition algorithm (HRA). Traditional methods employ a single complex algorithm, with multiple extracted features and complex classifiers, to increase the recognition rate at the cost of a considerable decrease in recognition speed. HRA instead takes advantage of the continuity of intrusion events, creating a staged recognition flow inspired by the stress reaction. HRA is expected to achieve high recognition accuracy with less time consumption. First, this work analyzed the continuity of intrusion events and then presented the algorithm based on the mechanism of the stress reaction. Finally, it verified the time consumption through theoretical analysis and experiments, and obtained the recognition accuracy through experiments. Experimental results show that HRA processes data 3.3 times faster than a traditional complex algorithm while achieving a similar recognition rate of 98%. The study is of great significance for fast intrusion event recognition in OFPS.
Homaeinezhad, M R; Erfanianmoshiri-Nejad, M; Naseri, H
2014-01-01
The goal of this study is to introduce a simple, standard and safe procedure to detect and delineate the P and T waves of the electrocardiogram (ECG) signal under real conditions. The proposed method consists of four major steps: (1) a secure QRS detection and delineation algorithm, (2) a pattern recognition algorithm designed to distinguish the various ECG clusters that occur between consecutive R-waves, (3) extraction of a template of the dominant events of each cluster waveform, and (4) application of correlation analysis to automatically delineate the P- and T-waves in noisy conditions. The performance characteristics of the proposed P and T detection-delineation algorithm are evaluated on various ECG signals whose quality is degraded from the best to the worst case based on random-walk noise theory. The method is also applied to the MIT-BIH Arrhythmia and QT databases to compare parts of its performance with a number of other P and T detection-delineation algorithms. The evaluations indicate that, in a signal with a low quality value of about 0.6, the proposed method detects the P and T events with a sensitivity of Se=85% and a positive predictive value of P+=89%, respectively. In addition, at the same quality, the average delineation errors associated with those ECG events are 45 and 63 ms, respectively. Stable delineation error, high detection accuracy and high noise tolerance were the most important aspects considered during development of the proposed method. © 2013 Elsevier Ltd. All rights reserved.
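Step (4) relies on correlation analysis against an extracted template. As a hedged illustration of that idea only, the sketch below slides a template along a signal and reports the offset with the highest Pearson correlation. Pearson correlation is an assumed similarity measure; the abstract does not state which correlation statistic or search window the authors use.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_match(signal, template):
    """Slide `template` over `signal`; return (offset, correlation) of the best fit."""
    m = len(template)
    best = (0, -2.0)  # correlations lie in [-1, 1], so any window beats this
    for k in range(len(signal) - m + 1):
        r = pearson(signal[k:k + m], template)
        if r > best[1]:
            best = (k, r)
    return best
```

Because Pearson correlation is insensitive to offset and scale, the match is driven by wave shape rather than amplitude, which is one reason correlation-based delineation can tolerate baseline wander and gain changes.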
Recognition of strong earthquake-prone areas with a single learning class
NASA Astrophysics Data System (ADS)
Gvishiani, A. D.; Agayan, S. M.; Dzeboev, B. A.; Belov, I. O.
2017-05-01
This article presents a new Barrier recognition algorithm with learning, designed for the recognition of earthquake-prone areas. Unlike the Crust (Kora) algorithm used in the classical EPA approach, the Barrier algorithm learns from just one "pure" high-seismic class. The new algorithm operates in the space of absolute values of the geological-geophysical parameters of the objects. It is used to recognize earthquake-prone areas with M ≥ 6.0 in the Caucasus region. Comparative analysis of the Crust and Barrier algorithms justifies their productive coherence.
Automatic three-dimensional measurement of large-scale structure based on vision metrology.
Zhu, Zhaokun; Guan, Banglei; Zhang, Xiaohu; Li, Daokui; Yu, Qifeng
2014-01-01
All the key techniques involved in photogrammetric vision metrology for fully automatic 3D measurement of large-scale structures are studied. A new kind of coded target consisting of circular retroreflective discs is designed, and corresponding detection and recognition algorithms based on blob detection and clustering are presented. A three-stage strategy starting with view clustering is then proposed to achieve automatic network orientation. For matching noncoded targets, the concept of a matching path is proposed, and the matches for each noncoded target are found by determining the optimal matching path among all possible ones, based on a novel voting strategy. Experiments on the fixed keel of an airship were conducted to verify the effectiveness and measurement accuracy of the proposed methods.
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligence method for clustering problems, inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with that of the ant colony clustering algorithm and other methods in the existing literature. For problems of similar computational difficulty and complexity, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also obtained more efficiently than those of the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
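The abstract does not spell out the pick-up and drop rules, so the following is a sketch of the classic ant clustering rules that such algorithms usually build on (Deneubourg-style probabilities with a Lumer and Faieta neighbourhood similarity), not the proposed abstraction mechanism. Scalar items, absolute difference as the distance, and the constants `alpha`, `k1`, `k2` and `s2` are illustrative assumptions.

```python
def local_similarity(item, neighbours, alpha=0.5, s2=9):
    """Neighbourhood similarity f(i) of an item to the items around it.

    `neighbours` are the items in the ant's local area; `s2` is the number of
    cells in that area (a 3x3 patch gives 9). Negative totals are clipped to 0.
    """
    total = sum(1 - abs(item - n) / alpha for n in neighbours)
    return max(0.0, total / s2)


def p_pick(f, k1=0.1):
    """Probability that an unladen ant picks up an item: high when f is low."""
    return (k1 / (k1 + f)) ** 2


def p_drop(f, k2=0.15):
    """Probability that a laden ant drops its item: high when f is high."""
    return (f / (k2 + f)) ** 2
```

Items that sit among dissimilar neighbours (low f) are likely to be picked up, and they are likely to be dropped next to similar items (high f), so clusters emerge without specifying their number in advance.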
A handheld computer-aided diagnosis system and simulated analysis
NASA Astrophysics Data System (ADS)
Su, Mingjian; Zhang, Xuejun; Liu, Brent; Su, Kening; Louie, Ryan
2016-03-01
This paper describes a Computer Aided Diagnosis (CAD) system based on a cellphone and a distributed computing cluster. One of the bottlenecks in building a CAD system for clinical practice is storing and processing large volumes of pathology samples across different devices, and conventional pattern matching algorithms are very time-consuming on large image sets. Distributed computation on a cluster has demonstrated the ability to relieve this bottleneck. We developed a system that enables the user to compare a lesion image to a dataset with a feature table by sending datasets to the Generic Data Handler Module in Hadoop, where pattern recognition is performed for the detection of skin diseases. Single and combined retrieval algorithms for the data pipeline, based on the MapReduce framework, are used in our system to balance recognition accuracy against system cost. Doctors manually draw the outline of the lesion area on the screen, and the system then uploads this pattern to the server. In our evaluation experiment, a diagnosis hit rate of 75% was obtained by testing 100 patients with skin diseases. Our system has the potential to help build a novel medical image dataset by collecting large amounts of gold-standard data during medical diagnosis. Once the project is online, participants are free to join, and a sample dataset large enough for learning should eventually be gathered. These results demonstrate that our technology is very promising and is expected to be used in clinical practice.
Enabling Analytical and Modeling Tools for Enhanced Disease Surveillance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dawn K. Manley
2003-04-01
Early detection, identification, and warning are essential to minimize casualties from a biological attack. For covert attacks, sick people are likely to provide the first indication of an attack. An enhanced medical surveillance system that synthesizes distributed health indicator information and rapidly analyzes the information can dramatically increase the number of lives saved. Current surveillance methods to detect both biological attacks and natural outbreaks are hindered by factors such as distributed ownership of information, incompatible data storage and analysis programs, and patient privacy concerns. Moreover, because data are not widely shared, few data mining algorithms have been tested on and applied to diverse health indicator data. This project addressed both the integration of multiple data sources and the development and integration of analytical tools for rapid detection of disease outbreaks. As a first prototype, we developed an application to query and display distributed patient records. This application incorporated need-to-know access control and data from standard commercial databases. We developed and tested two different algorithms for outbreak recognition. The first is a pattern recognition technique that searches for space-time data clusters that may signal a disease outbreak. The second is a genetic algorithm to design and train neural networks (GANN) that we applied to disease forecasting. We tested these algorithms against influenza, respiratory illness, and Dengue Fever data. Through this LDRD, in combination with other internal funding, we delivered a distributed simulation capability to synthesize disparate information and models for earlier recognition and improved decision-making in the event of a biological attack.
The architecture incorporates user feedback and control, so that a user's decision inputs can affect the scenario outcome, as well as integrated security and role-based access control for communication between distributed data and analytical tools. This work included the construction of interfaces to various commercial database products and to one of the data analysis algorithms developed through this LDRD.
Handwritten digits recognition based on immune network
NASA Astrophysics Data System (ADS)
Li, Yangyang; Wu, Yunhui; Jiao, Lc; Wu, Jianshe
2011-11-01
Handwritten digit recognition has been widely applied in production and daily life, yet it remains a difficult problem in the field of pattern recognition. In this paper, a new method is presented for handwritten digit recognition. The digit samples are first preprocessed, and features are extracted. Based on these features, a novel immune network classification algorithm is designed and applied to handwritten digit recognition. The proposed algorithm uses Jerne's immune network model for feature selection and the KNN method for classification. Its distinguishing characteristic is a novel network with parallel computing and learning. The proposed method is evaluated on the MNIST handwritten digit dataset and compared with other recognition algorithms: KNN, ANN and SVM. The results show that the novel classification algorithm based on the immune network gives promising performance and stable behavior for handwritten digit recognition.
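The classification stage named in the abstract is KNN. The sketch below shows plain k-nearest-neighbour majority voting on feature vectors; it does not reproduce the immune-network feature selection, and the Euclidean metric and k = 3 are assumed defaults.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In a pipeline like the one described, the feature-selection step would shrink each vector before it reaches `knn_predict`, which lowers the cost of every distance computation.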
Flumignan, Danilo Luiz; Boralle, Nivaldo; Oliveira, José Eduardo de
2010-06-30
In this work, the combination of carbon nuclear magnetic resonance (¹³C NMR) fingerprinting with pattern-recognition analyses provides an original alternative approach to screening commercial gasoline quality. Soft Independent Modelling of Class Analogy (SIMCA) was performed on the spectroscopic fingerprints to classify representative commercial gasoline samples into previously quality-defined classes; the samples were collected from retail gas stations over several months and selected by Hierarchical Cluster Analysis (HCA). With the optimized ¹³C NMR-SIMCA algorithm, sensitivity was 99.0% for the training set (with leave-one-out cross-validation) and 92.0% for the external prediction set. Governmental laboratories could employ this method as a rapid screening analysis to discourage adulteration practices. Copyright 2010 Elsevier B.V. All rights reserved.
A novel complex networks clustering algorithm based on the core influence of nodes.
Tong, Chao; Niu, Jianwei; Dai, Bin; Xie, Zhongyu
2014-01-01
In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm based on calculating the core influence of nodes. The clustering process simulates the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality and builds each cluster's core structure using discriminant functions. Next, the algorithm obtains the final cluster structure by clustering the remaining nodes in the network with an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical Fast-Newman clustering algorithm. It also clusters faster and helps reveal the true cluster structure of complex networks more precisely.
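Core nodes are detected through betweenness centrality. The sketch below computes unnormalised betweenness with Brandes' algorithm for an unweighted, undirected graph given as an adjacency dictionary. The paper's discriminant functions and the rule for turning high-betweenness nodes into cluster cores are not shown.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to an iterable of its neighbours.
    Returns unnormalised betweenness scores.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate dependencies in order of non-increasing distance from s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: b / 2 for v, b in bc.items()}
```

Each source needs one breadth-first search, so the total cost is O(VE) on unweighted graphs, which is what makes betweenness practical as a first pass for picking candidate core nodes.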
An Improved Iris Recognition Algorithm Based on Hybrid Feature and ELM
NASA Astrophysics Data System (ADS)
Wang, Juan
2018-03-01
Iris images are easily corrupted by noise and uneven illumination. This paper proposes an improved extreme learning machine (ELM) based iris recognition algorithm with a hybrid feature. 2D Gabor filters and the gray-level co-occurrence matrix (GLCM) are employed to generate a multi-granularity hybrid feature vector: the 2D Gabor filter captures low-to-intermediate-frequency texture information and the GLCM feature captures high-frequency texture information. Finally, an extreme learning machine is used for iris recognition. Experimental results show that the proposed ELM-based multi-granularity iris recognition algorithm (ELM-MGIR) achieves a higher accuracy of 99.86% and a lower EER of 0.12% while maintaining real-time performance. The proposed ELM-MGIR algorithm outperforms other mainstream iris recognition algorithms.
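As an illustration of the GLCM half of the hybrid feature, the sketch below builds a normalised co-occurrence matrix for one pixel offset and derives the Haralick contrast statistic from it. The single offset, the pre-quantised integer grey levels and the choice of contrast as the example statistic are assumptions; the abstract does not list which GLCM statistics or offsets are used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for a single offset (dx, dy).

    `image` is a 2D list of integer grey levels in range(levels); the image
    must contain at least one valid pixel pair for the chosen offset.
    """
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c]][image[r2][c2]] += 1
                count += 1
    return [[v / count for v in row] for row in m]


def glcm_contrast(p):
    """Haralick contrast: sum over (i, j) of P(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

Contrast is large when neighbouring pixels differ sharply, so it responds to the fine, high-frequency texture that the abstract assigns to the GLCM part of the feature.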
CNN universal machine as classification platform: an ART-like clustering algorithm.
Bálya, David
2003-12-01
Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors while keeping the advantages of ART networks, such as robust, plastic and fault-tolerant behavior. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic creation of new classes. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance, with 100% accuracy on the training set.
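To make "tunable sensitivity and automatic new class creation" concrete, here is a minimal sketch of a simplified ART1 classifier with fast learning for binary vectors. It illustrates the ART principle only and is not the analogic CNN-UM mapping; the choice function, the `vigilance` and `beta` values, and the rule that an all-zero vector always opens a new category are assumptions of this sketch.

```python
def art1(vectors, vigilance=0.7, beta=0.5):
    """Simplified ART1 with fast learning for binary feature vectors.

    Returns (labels, prototypes). A new category is created whenever no
    existing prototype passes the vigilance test; higher vigilance means
    finer categories. An all-zero input always opens a new category here.
    """
    prototypes, labels = [], []
    for x in vectors:
        norm_x = sum(x)

        def overlap(j):
            # |x AND w_j|: number of active bits shared with prototype j
            return sum(a & b for a, b in zip(x, prototypes[j]))

        # try categories in order of the choice function |x AND w| / (beta + |w|)
        order = sorted(range(len(prototypes)),
                       key=lambda j: -overlap(j) / (beta + sum(prototypes[j])))
        chosen = None
        for j in order:
            if norm_x and overlap(j) / norm_x >= vigilance:  # vigilance (match) test
                prototypes[j] = [a & b for a, b in zip(x, prototypes[j])]  # fast learning
                chosen = j
                break
        if chosen is None:
            prototypes.append(list(x))
            chosen = len(prototypes) - 1
        labels.append(chosen)
    return labels, prototypes
```

Raising `vigilance` toward 1 forces inputs to match a prototype more closely before joining it, so more categories are created; this is the knob that corresponds to tunable sensitivity.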
Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei
2014-09-01
In this paper, the back-propagation (BP) algorithm is used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network trained with BP is then evaluated and compared with that of probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. Selecting unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network trained with BP achieves relatively better human activity recognition performance than the NB classifier and HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Wang, Chenglin; Tang, Yunchao; Zou, Xiangjun; Luo, Lufeng; Chen, Xiong
2017-01-01
Recognition and matching of litchi fruits are critical steps for litchi harvesting robots to successfully grasp litchi. However, due to the randomness of litchi growth, such as clustered growth with an uncertain number of fruits and random occlusion by leaves, branches and other fruits, recognizing and matching the fruit is challenging. Therefore, this study first defined three categories of clustered mature litchi fruit. An approach for recognizing and matching clustered mature litchi fruit was then developed based on litchi color images acquired by binocular charge-coupled device (CCD) color cameras. The approach mainly included three steps: (1) calibration of the binocular color cameras and litchi image acquisition; (2) segmentation of litchi fruits using four kinds of supervised classifiers, and recognition of the pre-defined categories of clustered litchi fruit using a pixel threshold method; and (3) matching of the recognized clustered fruit using a geometric center-based matching method. The experimental results showed that the proposed recognition method is robust against varying illumination and occlusion conditions and precisely recognizes clustered litchi fruit. Among the 432 clustered litchi fruits tested, the highest and lowest average recognition rates were 94.17% and 92.00%, under sunny back-lighting with partial occlusion and sunny front-lighting without occlusion, respectively. Across 50 pairs of tested images, the highest and lowest matching success rates were 97.37% and 91.96%, under sunny back-lighting without occlusion and sunny front-lighting with partial occlusion, respectively. PMID:29112177
The Pandora multi-algorithm approach to automated pattern recognition in LAr TPC detectors
NASA Astrophysics Data System (ADS)
Marshall, J. S.; Blake, A. S. T.; Thomson, M. A.; Escudero, L.; de Vries, J.; Weston, J.;
2017-09-01
The development and operation of Liquid Argon Time Projection Chambers (LAr TPCs) for neutrino physics has created a need for new approaches to pattern recognition, in order to fully exploit the superb imaging capabilities offered by this technology. The Pandora Software Development Kit provides functionality to aid the process of designing, implementing and running pattern recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition: individual algorithms each address a specific task in a particular topology; a series of many tens of algorithms then carefully builds up a picture of the event. The input to the Pandora pattern recognition is a list of 2D Hits. The output from the chain of over 70 algorithms is a hierarchy of reconstructed 3D Particles, each with an identified particle type, vertex and direction.
NASA Astrophysics Data System (ADS)
Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.
2017-12-01
The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3
Artificial neural networks for acoustic target recognition
NASA Astrophysics Data System (ADS)
Robertson, James A.; Mossing, John C.; Weber, Bruce A.
1995-04-01
Acoustic sensors can be used to detect, track and identify non-line-of-sight targets passively. Attempts to alter acoustic emissions often result in an undesirable performance degradation. This research project investigates the use of neural networks for differentiating between features extracted from the acoustic signatures of sources. Acoustic data were filtered and digitized using a commercially available analog-to-digital converter. The digital data were transformed to the frequency domain with the FFT for additional processing. Narrowband peak detection algorithms were incorporated to select peaks above a user-defined SNR. These peaks were then used to generate a set of robust features that relate specifically to target components under varying background conditions. The features were then used as input to a backpropagation neural network. A K-means unsupervised clustering algorithm was used to determine the natural clustering of the observations. A feature set consisting of the normalized amplitudes of the first 250 frequency bins of the power spectrum was compared with a set of 11 harmonically related features. Initial results indicate that even though some different target types tended to group in the same clusters, the neural network was able to differentiate the targets. Successful identification of acoustic sources under varying operational conditions with high confidence levels was achieved.
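The peak-selection step can be sketched as follows: compute a power spectrum, estimate a noise floor, and keep local maxima that exceed the floor by a user-defined SNR. Using the median bin power as the noise floor and a naive O(N²) DFT (to stay dependency-free) are assumptions made for this illustration; the abstract specifies neither.

```python
import cmath
import math
import statistics


def power_spectrum(x):
    """Naive DFT power spectrum for bins 0..N/2 (use an FFT library on real data)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def narrowband_peaks(power, snr_db=10.0):
    """Return bins that are local maxima and exceed the median noise floor by snr_db."""
    floor = statistics.median(power)
    threshold = floor * 10 ** (snr_db / 10)  # convert dB to a power ratio
    return [k for k in range(1, len(power) - 1)
            if power[k] > threshold
            and power[k] >= power[k - 1] and power[k] >= power[k + 1]]
```

The median is a convenient floor estimate because a handful of strong narrowband lines barely move it, whereas a mean would be pulled upward by the very peaks being detected.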
NASA Astrophysics Data System (ADS)
Acciarri, R.; Adams, C.; An, R.; Anthony, J.; Asaadi, J.; Auger, M.; Bagby, L.; Balasubramanian, S.; Baller, B.; Barnes, C.; Barr, G.; Bass, M.; Bay, F.; Bishai, M.; Blake, A.; Bolton, T.; Camilleri, L.; Caratelli, D.; Carls, B.; Castillo Fernandez, R.; Cavanna, F.; Chen, H.; Church, E.; Cianci, D.; Cohen, E.; Collin, G. H.; Conrad, J. M.; Convery, M.; Crespo-Anadón, J. I.; Del Tutto, M.; Devitt, D.; Dytman, S.; Eberly, B.; Ereditato, A.; Escudero Sanchez, L.; Esquivel, J.; Fadeeva, A. A.; Fleming, B. T.; Foreman, W.; Furmanski, A. P.; Garcia-Gamez, D.; Garvey, G. T.; Genty, V.; Goeldi, D.; Gollapinni, S.; Graf, N.; Gramellini, E.; Greenlee, H.; Grosso, R.; Guenette, R.; Hackenburg, A.; Hamilton, P.; Hen, O.; Hewes, J.; Hill, C.; Ho, J.; Horton-Smith, G.; Hourlier, A.; Huang, E.-C.; James, C.; Jan de Vries, J.; Jen, C.-M.; Jiang, L.; Johnson, R. A.; Joshi, J.; Jostlein, H.; Kaleko, D.; Karagiorgi, G.; Ketchum, W.; Kirby, B.; Kirby, M.; Kobilarcik, T.; Kreslo, I.; Laube, A.; Li, Y.; Lister, A.; Littlejohn, B. R.; Lockwitz, S.; Lorca, D.; Louis, W. C.; Luethi, M.; Lundberg, B.; Luo, X.; Marchionni, A.; Mariani, C.; Marshall, J.; Martinez Caicedo, D. A.; Meddage, V.; Miceli, T.; Mills, G. B.; Moon, J.; Mooney, M.; Moore, C. D.; Mousseau, J.; Murrells, R.; Naples, D.; Nienaber, P.; Nowak, J.; Palamara, O.; Paolone, V.; Papavassiliou, V.; Pate, S. F.; Pavlovic, Z.; Piasetzky, E.; Porzio, D.; Pulliam, G.; Qian, X.; Raaf, J. L.; Rafique, A.; Rochester, L.; Rudolf von Rohr, C.; Russell, B.; Schmitz, D. W.; Schukraft, A.; Seligman, W.; Shaevitz, M. H.; Sinclair, J.; Smith, A.; Snider, E. L.; Soderberg, M.; Söldner-Rembold, S.; Soleti, S. R.; Spentzouris, P.; Spitz, J.; St. John, J.; Strauss, T.; Szelc, A. M.; Tagg, N.; Terao, K.; Thomson, M.; Toups, M.; Tsai, Y.-T.; Tufanli, S.; Usher, T.; Van De Pontseele, W.; Van de Water, R. G.; Viren, B.; Weber, M.; Wickremasinghe, D. A.; Wolbers, S.; Wongjirad, T.; Woodruff, K.; Yang, T.; Yates, L.; Zeller, G. P.; Zennamo, J.; Zhang, C.
2018-01-01
The development and operation of liquid-argon time-projection chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.
NASA Astrophysics Data System (ADS)
He, Xianjin; Zhang, Xinchang; Xin, Qinchuan
2018-02-01
Recognition of building group patterns (i.e., the arrangement and form exhibited by a collection of buildings at a given mapping scale) is important to the understanding and modeling of geographic space and is hence essential to a wide range of downstream applications such as map generalization. Most existing methods develop rigid rules based on the topographic relationships between building pairs to identify building group patterns, and thus their applications are often limited. This study proposes a method to identify a variety of building group patterns that allow for map generalization. The method first identifies building group patterns from potential building clusters based on a machine-learning algorithm and then partitions the building clusters with no recognized patterns using a graph partitioning method. The proposed method is applied to the datasets of three cities that are representative of the complex urban environment in Southern China. Assessment of the results against the reference data suggests that the proposed method recognizes both regular (e.g., collinear, curvilinear, and rectangular patterns) and irregular (e.g., L-shaped, H-shaped, and high-density patterns) building group patterns well, given that the correctness values are consistently nearly 90% and the completeness values are all above 91% for the three study areas. The proposed method shows promise for the automated recognition of building group patterns that allows for map generalization.
Geometry-based populated chessboard recognition
NASA Astrophysics Data System (ADS)
Xie, Youye; Tang, Gongguo; Hoff, William
2018-04-01
Chessboards are commonly used to calibrate cameras, and many robust methods have been developed to recognize unoccupied boards. However, when the chessboard is populated with chess pieces, such as during an actual game, recognizing the board is much harder. Challenges include occlusion caused by the chess pieces, the presence of outlier lines and low viewing angles of the chessboard. In this paper, we present a novel approach that addresses these challenges and recognizes the chessboard. The Canny edge detector and the Hough transform are used to capture all possible lines in the scene. K-means clustering and a k-nearest-neighbors-inspired algorithm are applied to cluster the lines and reject outliers based on their Euclidean distances to their nearest neighbors in a scaled Hough transform space. Finally, based on prior knowledge of the chessboard structure, a geometric constraint is used to find the correspondences between image lines and the lines on the chessboard through the homography transformation. The proposed algorithm works over a wide range of operating angles and achieves high accuracy in experiments.
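A hedged sketch of one piece of this pipeline: splitting Hough line angles into the two orientation families of a chessboard with k-means. Mapping each angle θ to (cos 2θ, sin 2θ) is a choice made here so that lines near 0 and near π land in the same cluster; the paper's scaled Hough space and its k-nearest-neighbors outlier rejection are not reproduced, and the deterministic farthest-point initialisation is also an assumption.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        # next centre: the point farthest from all centres chosen so far
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, centres[j]))].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centre if a cluster becomes empty
                centres[j] = [sum(coord) / len(g) for coord in zip(*g)]
    labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
    return labels, centres


def line_families(thetas):
    """Split Hough angles (radians, in [0, pi)) into two orientation families.

    Angles are doubled before embedding on the unit circle, so theta and
    theta + pi coincide and near-horizontal lines at 0.01 and 3.13 group together.
    """
    return kmeans([(math.cos(2 * t), math.sin(2 * t)) for t in thetas], 2)[0]
```

Clustering raw θ values directly would split the near-horizontal lines into two groups at the 0/π wrap-around, which is why the angle-doubling embedding is used here.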
SOTXTSTREAM: Density-based self-organizing clustering of text streams.
Bryant, Avory C; Cios, Krzysztof J
2017-01-01
A streaming data clustering algorithm is presented that builds upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
On the problem of earthquake correlation in space and time over large distances
NASA Astrophysics Data System (ADS)
Georgoulas, G.; Konstantaras, A.; Maravelakis, E.; Katsifarakis, E.; Stylios, C. D.
2012-04-01
A quick examination of geographical maps with earthquake epicenters marked on them reveals a strong tendency of these points to form compact clusters of irregular shapes and various sizes, often overlapping with other clusters. According to [Saleur et al. 1996], "earthquakes are correlated in space and time over large distances". This implies that seismic sequences are not formed randomly but follow a spatial pattern with consequent triggering of events. Seismic cluster formation is believed to be due to underlying geological natural hazards, which: a) act as the energy storage elements of the phenomenon, and b) tend to form a complex network of numerous interacting faults [Vallianatos and Tzanis, 1998]. It is therefore imperative to "isolate" meaningful structures (clusters) in order to mine information regarding the underlying mechanism and, at a second stage, to test the causality effect implied by what is known as the Domino theory [Burgman, 2009]. Ongoing work by Konstantaras et al. 2011 and Katsifarakis et al. 2011 on clustering seismic sequences in the area of the Southern Hellenic Arc, and progressively throughout the Greek vicinity and the entire Mediterranean region, used an explicit segmentation of the data based on both their temporal and spatial stamps, following modelling assumptions proposed by Dobrovolsky et al. 1989 and Drakatos et al. 2001, and managed to identify geologically validated seismic clusters. These results suggest that the time component should be included as a dimension during the clustering process, as seismic cluster formation is dynamic and the emerging clusters propagate in time. Another issue that has not yet been explicitly investigated is the role of the magnitude of each seismic event. In other words, the major seismic event should be treated differently from the pre- or post-seismic sequences.
Moreover, the sometimes irregular and elongated shapes that appear on geophysical maps mean that clustering algorithms such as the well-known k-means, which tend to form "well-shaped" clusters, may not suffice for the problem at hand, and other families of unsupervised pattern recognition methods might be a better choice. One such algorithm is DBSCAN, which is based on the notion of density. In the proposed version, the density is estimated not solely from the number of seismic events occurring in a specific spatio-temporal area but also from the size of each seismic event. A second method uses a modified proximity measure that also accounts for the size of the earthquake, combined with traditional clustering schemes such as k-means and agglomerative clustering (k-means is seeded with a fairly large k, and its results are fed to the hierarchical algorithm, both to reduce memory requirements and to allow for irregular shapes). Preliminary results of seismic cluster formation using these algorithms appear promising, as they agree with geophysical observations on distinct seismic regions, such as the neighbouring regions in the Ionian Sea and the southern Hellenic seismic arc, as well as with the location and orientation of the mapped network of underlying natural hazards beneath each cluster's vicinity.
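One way to read "density that takes into account the size of the seismic event" is a DBSCAN variant whose core-point test sums event weights (for example magnitudes) in the eps-neighbourhood instead of counting events. The sketch below is that reading, under assumptions: the weighting rule, the Euclidean metric, and the idea that points could be (x, y, scaled time) tuples are illustrative, not the authors' formulation.

```python
import math


def weighted_dbscan(points, weights, eps, min_weight):
    """DBSCAN variant with a weighted core-point test.

    A point is a core point when the summed weight (e.g. event magnitude)
    of its eps-neighbourhood, itself included, reaches `min_weight`.
    Points may carry a scaled time coordinate, e.g. (x, y, t).
    Returns one cluster id per point, with -1 for noise.
    """
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    is_core = [sum(weights[j] for j in nb) >= min_weight for nb in neighbours]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        cluster += 1
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    stack.append(q)
    return [-1 if lab is None else lab for lab in labels]
```

With weighting, a single large event can form a cluster on its own, while a cloud of small events needs more members to reach the same threshold; the naive O(n²) neighbour search would be replaced by a spatial index on a real catalogue.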
A GPU-paralleled implementation of an enhanced face recognition algorithm
NASA Astrophysics Data System (ADS)
Chen, Hao; Liu, Xiyang; Shao, Shuai; Zan, Jiguo
2013-03-01
Face recognition algorithms based on compressed sensing and sparse representation have been widely discussed in recent years. This scheme increases the recognition rate as well as the anti-noise capability. However, its computational cost is high and has become a main restricting factor for real-world applications. In this paper, we introduce a GPU-accelerated hybrid variant of the face recognition algorithm, named the parallel face recognition algorithm (pFRA). We describe how to carry out a parallel optimization design that takes full advantage of the many-core structure of a GPU. pFRA is tested and compared with several other implementations under different data sample sizes. Finally, our pFRA, implemented on an NVIDIA GPU with the Compute Unified Device Architecture (CUDA) programming model, achieves a significant speedup over traditional CPU implementations.
Mining the modular structure of protein interaction networks.
Berenstein, Ariel José; Piñero, Janet; Furlong, Laura Inés; Chernomoretz, Ariel
2015-01-01
Cluster-based descriptions of biological networks have received much attention in recent years, fostered by accumulated evidence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, owing to their respective technical idiosyncrasies, they might produce dissimilar modular decompositions of a given network. In this contribution, we aimed to analyze how alternative modular descriptions could condition the outcome of follow-up network biology analysis. We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, namely the Clauset-Newman-Moore and the infomap procedures. We analyzed to what extent the two methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimerà's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology affected the ability to highlight relevant network meso-scale connectivity patterns. As a case study, we considered a set of aging-related proteins and showed that only the high-resolution modular description provided by infomap could unveil statistically significant associations between them and inter/intra-modular cartographic features. Besides reporting novel biological insights that could be gained from the discovered associations, our contribution warns against possible technical concerns that might affect the tools used to mine for interaction patterns in network biology studies. In particular, our results suggested that partitions that are sub-optimal from the strict point of view of their modularity levels might still be worth analyzing when meso-scale features are to be explored in connection with external sources of biological knowledge.
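The "modularity levels" used above to compare partitions can be computed directly. The Clauset-Newman-Moore procedure greedily maximizes this quantity, whereas infomap optimizes a different, information-theoretic objective, so a partition can score lower here and still be informative. The sketch below is a plain Newman modularity for an undirected, unweighted graph; the function name and input format are our own choices.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs (no self-loops, no duplicate edges).
    community: dict mapping node -> community label.
    Uses Q = sum_c [ L_c / m - (d_c / (2m))**2 ], where m is the number of
    edges, L_c the number of edges inside community c, and d_c the total
    degree of the nodes in c.
    """
    edges = list(edges)
    m = len(edges)
    intra = defaultdict(int)   # L_c
    degree = defaultdict(int)  # d_c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, splitting at the bridge gives Q = 5/14, while putting every node in one community gives Q = 0.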
Fast and robust segmentation in the SDO-AIA era
NASA Astrophysics Data System (ADS)
Verbeeck, Cis; Delouille, Véronique; Mampaey, Benjamin; Hochedez, Jean-François; Boyes, David; Barra, Vincent
Solar images from the Atmospheric Imaging Assembly (AIA) aboard the Solar Dynamics Observatory (SDO) will flood the solar physics community with a wealth of information on solar variability, of great importance both in solar physics and in view of Space Weather applications. Obtaining this information, however, requires the ability to automatically process large amounts of data in an objective fashion. In previous work, we proposed an unsupervised, spatially constrained, multi-channel fuzzy clustering algorithm (SPoCA) that automatically segments EUV solar images into Active Regions (AR), Coronal Holes (CH), and Quiet Sun (QS). This algorithm will run in near real time on AIA data as part of the SDO Feature Finding Project, a suite of software pipeline modules for automated feature recognition and analysis of SDO imagery. After correcting for the limb brightening effect, SPoCA computes an optimal clustering with respect to the regions of interest, using fuzzy logic on a quality criterion to manage the various sources of noise in the images and the imprecision in the definition of the above regions. Next, the algorithm applies a morphological opening operation, smoothing the cluster edges while preserving their general shape. The process is fast and automatic. A lower size limit is used to distinguish AR from Bright Points. Because the algorithm segments the coronal images according to their brightness, an AR may be detected as several disjoint pieces if the brightness in between is somewhat lower. Morphological dilation is employed to reconstruct the AR from their constituent pieces. Combining SPoCA's detections of AR, CH, and QS on subsequent images allows automatic tracking and naming of any region of interest. In the SDO software pipeline, SPoCA will automatically populate the Heliophysics Events Knowledgebase (HEK) with Active Region events.
Further, the algorithm has great potential for correct and automatic identification of AR, CH, and QS in any study that aims to address properties of those specific regions in the corona. SPoCA is now ready to tackle solar cycle 24 using SDO data. While we presently apply SPoCA to EUV data, the method is generic enough to allow the introduction of other channels or data, e.g., Differential Emission Measure (DEM) maps. Because of the unprecedented challenges posed by the quantity of SDO data, European partners have gathered within an ISSI team on 'Mining and Exploiting the NASA Solar Dynamics Observatory data in Europe' (a.k.a. Soldyneuro). Its aim is to provide automated feature recognition algorithms for scanning the SDO archive, as well as to conduct scientific studies that combine the outputs of different algorithms. Within the Soldyneuro project, we will use data from the EUV Variability Experiment (EVE) spectrometer to estimate the full-Sun DEM. This DEM will then be used to estimate the total flux from AIA images so as to provide a validation of the AIA calibration.
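SPoCA is a spatially constrained fuzzy clustering method. As a hedged illustration of the underlying fuzzy clustering idea only, the sketch below runs standard fuzzy c-means (FCM) on one channel of scalar pixel intensities with three classes ordered by brightness (dark CH, intermediate QS, bright AR). SPoCA's spatial constraint, its quality criterion, the limb-brightening correction, and the morphological post-processing are deliberately omitted, and the function name is hypothetical.

```python
def fuzzy_c_means_1d(xs, centers, m=2.0, iters=100, eps=1e-9):
    """Plain fuzzy c-means on scalar intensities (no spatial constraint).

    xs: list of pixel intensities; centers: initial class centers;
    m: fuzzifier (> 1). Returns (centers, memberships), where
    memberships[k][i] is the degree to which pixel k belongs to class i,
    computed from the centers of the last iteration (each row sums to 1).
    """
    centers = list(centers)
    p = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))
        u = []
        for x in xs:
            d = [max(abs(x - c), eps) for c in centers]  # eps avoids 0 / 0
            u.append([1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
                      for i in range(len(centers))])
        # Center update: mean of the intensities weighted by u_ki ** m
        new = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            new.append(sum(wk * x for wk, x in zip(w, xs)) / sum(w))
        converged = max(abs(a - b) for a, b in zip(new, centers)) < 1e-6
        centers = new
        if converged:
            break
    return centers, u
```

A hard segmentation can then assign each pixel to the class with the largest membership; SPoCA's own decision rules are not reproduced here.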
Target recognition of ladar range images using slice image: comparison of four improved algorithms
NASA Astrophysics Data System (ADS)
Xia, Wenze; Han, Shaokun; Cao, Jingya; Wang, Liang; Zhai, Yu; Cheng, Yang
2017-07-01
Compared with traditional 3-D shape data, ladar range images exhibit strong noise, shape degeneracy, and sparsity, which make feature extraction and representation difficult. The slice image is an effective feature descriptor for this problem. We propose four improved algorithms for target recognition in ladar range images using the slice image. To improve the resolution invariance of the slice image, mean value detection is applied instead of maximum value detection in all four algorithms. To improve its rotation invariance, three new feature descriptors (the feature slice image, slice-Zernike moments, and slice-Fourier moments) are applied in the last three algorithms, respectively. Backpropagation neural networks are used as feature classifiers in the last two algorithms. The performance of the four improved recognition systems is analyzed comprehensively in terms of the three invariances, recognition rate, and execution time. The final experimental results show that the improvements achieve the desired effect, that the three invariances of the feature descriptors are not directly related to the final recognition performance, and that the four improved recognition systems perform differently under different conditions.
The global Minmax k-means algorithm.
Wang, Xiaoyan; Bai, Yanping
2016-01-01
The global k-means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k-means to minimize the sum of the intra-cluster variances. However, the global k-means algorithm sometimes produces singleton clusters, and its initial positions are sometimes bad; after a bad initialization, k-means can easily reach a poor local optimum. In this paper, we first modify the global k-means algorithm to eliminate singleton clusters, and then apply the MinMax k-means clustering error method to the global k-means algorithm to overcome the effect of bad initialization, yielding the proposed global Minmax k-means algorithm. The proposed clustering method is tested on several popular data sets and compared with the k-means algorithm, the global k-means algorithm, and the MinMax k-means algorithm. The experimental results show that our proposed algorithm outperforms the other algorithms considered in the paper.
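The incremental search of plain global k-means can be sketched as follows. This shows only the baseline the paper starts from: each new center is tried at every data point, k-means is run from the previous centers plus that candidate, and the run with the lowest sum of squared errors (SSE) is kept. The singleton elimination and MinMax weighting of the proposed method are not shown, and all helper names are our own.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, iters=100):
    """Lloyd's k-means from given initial centers; returns (centers, SSE)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            groups[i].append(p)
        # Empty clusters keep their previous center.
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def global_kmeans(points, k_max):
    """Global k-means: add one center at a time, trying every data point as
    the initial position of the new center and keeping the best run."""
    centers = [tuple(sum(c) / len(points) for c in zip(*points))]  # k = 1
    for _ in range(2, k_max + 1):
        best = None
        for p in points:
            cand, sse = kmeans(points, centers + [tuple(p)])
            if best is None or sse < best[1]:
                best = (cand, sse)
        centers = best[0]
    return centers
```

Because every point is a candidate, each added center costs N full k-means runs, which is the price paid for removing dependence on random initialization.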
Noise-enhanced clustering and competitive learning algorithms.
Osoba, Osonde; Kosko, Bart
2013-01-01
Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning.
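The mechanics of injecting noise into k-means can be shown with a simplified sketch: zero-mean Gaussian noise is added to the samples at each iteration and annealed so that the final iterations are essentially noise-free. This is our own illustration, not the noise-benefit conditions derived by the authors; the Gaussian distribution, the sigma / t**2 decay schedule, and the function name are assumptions, and the sketch does not by itself demonstrate a speed-up.

```python
import random


def noisy_kmeans_1d(xs, centers, iters=50, sigma=1.0, seed=0):
    """k-means on scalar data with annealed additive noise on the samples.

    At iteration t, each sample is perturbed by Gaussian noise with standard
    deviation sigma / t**2 before the assignment and update steps, so the
    noise dies out and the final centers approach the ordinary k-means means.
    """
    rng = random.Random(seed)
    centers = list(centers)
    for t in range(1, iters + 1):
        scale = sigma / t ** 2  # annealing schedule (an assumption)
        noisy = [x + rng.gauss(0.0, scale) for x in xs]
        groups = [[] for _ in centers]
        for x in noisy:
            i = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            groups[i].append(x)
        # Empty clusters keep their previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers
```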
Hierarchical clustering of EMD based interest points for road sign detection
NASA Astrophysics Data System (ADS)
Khan, Jesmin; Bhuiyan, Sharif; Adhami, Reza
2014-04-01
This paper presents an automatic road traffic sign detection and recognition system based on hierarchical clustering of interest points and joint transform correlation. The proposed algorithm consists of three stages: interest point detection, clustering of those points, and similarity search. In the first stage, highly discriminative, rotation- and scale-invariant interest points are selected from the image edges based on the 1-D empirical mode decomposition (EMD). We propose a two-step unsupervised clustering technique, which is adaptive and based on two criteria. The detected points are first clustered based on stable local features related to brightness and color, which are extracted using a Gabor filter. Then the points belonging to each partition are reclustered by position, depending on the dispersion of the points in the initial cluster. This two-step hierarchical clustering yields the candidate road signs, or regions of interest (ROIs). Finally, a fringe-adjusted joint transform correlation (JTC) technique is used to match the unknown signs against the known reference road signs stored in the database. The presented framework provides a novel way to detect road signs in natural scenes, and the results demonstrate the efficacy of the proposed technique, which yields a very low false hit rate.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Acciarri, R.; Adams, C.; An, R.; ...
2018-01-29
The development and operation of Liquid-Argon Time-Projection Chambers for neutrino physics has created a need for new approaches to pattern recognition in order to fully exploit the imaging capabilities offered by this technology. Whereas the human brain can excel at identifying features in the recorded events, it is a significant challenge to develop an automated, algorithmic solution. The Pandora Software Development Kit provides functionality to aid the design and implementation of pattern-recognition algorithms. It promotes the use of a multi-algorithm approach to pattern recognition, in which individual algorithms each address a specific task in a particular topology. Many tens of algorithms then carefully build up a picture of the event and, together, provide a robust automated pattern-recognition solution. This paper describes details of the chain of over one hundred Pandora algorithms and tools used to reconstruct cosmic-ray muon and neutrino events in the MicroBooNE detector. Metrics that assess the current pattern-recognition performance are presented for simulated MicroBooNE events, using a selection of final-state event topologies.
A Review of Subsequence Time Series Clustering
Zolhavarieh, Seyedjamal; Aghabozorgi, Saeed; Teh, Ying Wah
2014-01-01
Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and background related to subsequence time series clustering. The reviewed literature is categorized into three groups: the preproof, interproof, and postproof periods. Moreover, various state-of-the-art approaches to performing subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies. PMID:25140332
Unsupervised EEG analysis for automated epileptic seizure detection
NASA Astrophysics Data System (ADS)
Birjandtalab, Javad; Pouyan, Maziyar Baran; Nourani, Mehrdad
2016-07-01
Epilepsy is a neurological disorder that, if not controlled, can potentially cause unexpected death. Accurate automatic pattern recognition and data mining techniques are crucial for detecting the onset of seizures and informing caregivers so that they can help the patients. EEG signals are the preferred biosignals for diagnosing epileptic patients. Most existing pattern recognition techniques used in EEG analysis rely on supervised machine learning algorithms. Since seizure data are heavily under-represented, such techniques are not always practical, particularly when labeled data are scarce or when disease progression is rapid and the corresponding EEG footprint pattern is not robust. Furthermore, EEG pattern changes are highly individual-dependent, and experienced specialists are required to annotate seizure and non-seizure events. In this work, we present an unsupervised technique to discriminate between seizure and non-seizure events. We use the power spectral density of EEG signals in different frequency bands as informative features to accurately cluster seizure and non-seizure events. Experimental results so far indicate more than 90% accuracy in clustering seizure and non-seizure events without any prior knowledge of the patient's history.
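The band-limited spectral power features mentioned above can be illustrated with a small periodogram. This is a hedged sketch: the band edges are conventional EEG ranges chosen for illustration (the abstract does not state the bands or the estimator used), the DFT is computed directly for clarity rather than speed, and the function name is hypothetical.

```python
import cmath
import math


def band_powers(signal, fs, bands):
    """Estimate the power in each frequency band from a simple periodogram.

    signal: list of samples; fs: sampling rate in Hz;
    bands: dict name -> (low_hz, high_hz), low inclusive, high exclusive.
    Uses a direct O(n^2) DFT for clarity; a real pipeline would use an FFT
    and a smoothed estimator such as Welch's method.
    """
    n = len(signal)
    powers = {name: 0.0 for name in bands}
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        freq = k * fs / n
        # Periodogram value at this bin (not doubled for the one-sided
        # spectrum, which is fine when only comparing bands).
        p = abs(coeff) ** 2 / (fs * n)
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi:
                powers[name] += p
    return powers
```

Each EEG window then becomes a short vector of band powers that an unsupervised method can cluster; the specific clustering algorithm used by the authors is not reproduced here.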
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins using only unlabelled sequences is a central problem in comparative genomics. Existing cluster kernel methods based on neighborhoods and profiles, and Markov clustering algorithms, are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods calls for an enhancement for homology detection among multi-domain proteins. We propose combining spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. Comparison with the state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even for multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Score-Level Fusion of Phase-Based and Feature-Based Fingerprint Matching Algorithms
NASA Astrophysics Data System (ADS)
Ito, Koichi; Morita, Ayumi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo
This paper proposes an efficient fingerprint recognition algorithm that combines phase-based image matching with feature-based matching. In our previous work, we proposed an efficient fingerprint recognition algorithm using Phase-Only Correlation (POC) and developed commercial fingerprint verification units for access control applications. Using the Fourier phase information of fingerprint images makes it possible to achieve robust recognition of weakly impressed, low-quality fingerprint images. This paper improves the performance of POC-based fingerprint matching by combining it with feature-based matching, which is introduced to improve recognition efficiency for images with nonlinear distortion. Experimental evaluation using two different types of fingerprint image databases demonstrates the efficient recognition performance of the combined POC-based and feature-based algorithm.
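The core idea of Phase-Only Correlation can be shown in one dimension: normalize the cross spectrum to unit magnitude so that only phase differences remain, and the inverse transform becomes a sharp peak at the relative displacement. This is a didactic sketch with a direct DFT and circular shifts; fingerprint POC works on 2-D images, and any refinements the authors use there are not reproduced. The function names are our own.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_only_correlation(f, g, eps=1e-12):
    """1-D phase-only correlation of two equal-length signals.

    Returns (peak_index, poc). If g is f circularly delayed by s samples,
    poc is (ideally) a unit impulse at index s.
    """
    F, G = dft(f), dft(g)
    cross = [gk * fk.conjugate() for fk, gk in zip(F, G)]
    normalized = [c / max(abs(c), eps) for c in cross]  # keep phase only
    poc = [v.real for v in dft(normalized, inverse=True)]
    peak = max(range(len(poc)), key=lambda i: poc[i])
    return peak, poc
```

Because magnitudes are discarded, the peak height stays close to 1 for a pure shift regardless of image contrast, which is the property that makes POC robust for weakly impressed prints.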
Bahlmann, Claus; Burkhardt, Hans
2004-03-01
In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work is the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this type of sequential data, e.g., speech recognition, genome processing, and robotics. Unlike previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We report character recognition experiments with frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than the reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
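CSDTW builds on dynamic time warping. The classic DTW recurrence it starts from can be sketched as below; the statistical (HMM-style) modeling and cluster analysis that CSDTW adds on top are not shown, and the function name is our own.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance between two sequences.

    dist: local cost between one element of a and one element of b
    (absolute difference by default, for scalar sequences).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

For pen trajectories given as (x, y) points, passing `dist=math.dist` (Python 3.8+) gives a Euclidean local cost; the feature representation used in frog on hand is not reproduced here.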
Model and algorithmic framework for detection and correction of cognitive errors.
Feki, Mohamed Ali; Biswas, Jit; Tolstikov, Andrei
2009-01-01
This paper outlines an approach that we are taking for elder-care applications in the smart home, involving cognitive errors and their compensation. Our approach involves high-level modeling of the daily activities of the elderly by breaking these activities down into smaller units, which can then be automatically recognized at a low level by collections of sensors placed in the homes of the elderly. This separation allows us to employ plan recognition algorithms and systems at a high level, while developing stand-alone activity recognition algorithms and systems at a low level. It also allows mixing and matching multi-modality sensors of various kinds that support the same high-level requirement. Currently our plan recognition algorithms are still at a conceptual stage, whereas a number of low-level activity recognition algorithms and systems have been developed. Herein we present our model for plan recognition and provide a brief survey of the background literature. We also present some concrete results that we have achieved for activity recognition, emphasizing how these results are incorporated into the overall plan recognition system.
Physical environment virtualization for human activities recognition
NASA Astrophysics Data System (ADS)
Poshtkar, Azin; Elangovan, Vinayak; Shirkhodaie, Amir; Chan, Alex; Hu, Shuowen
2015-05-01
Human activity recognition research relies heavily on extensive datasets to verify and validate the performance of activity recognition algorithms. However, obtaining real datasets is expensive and highly time-consuming. A physics-based virtual simulation can accelerate the development of context-based human activity recognition algorithms and techniques by generating relevant training and testing videos that simulate diverse operational scenarios. In this paper, we discuss in detail the requisite capabilities of a virtual environment that serves as a test bed for evaluating and enhancing activity recognition algorithms. To demonstrate the numerous advantages of developing a virtual environment, a newly developed virtual environment simulation modeling (VESM) environment is presented here that generates calibrated multisource imagery datasets suitable for developing and testing recognition algorithms for context-based human activities. The VESM environment serves as a versatile test bed for generating a vast amount of realistic data for training and testing sensor processing algorithms. To demonstrate the effectiveness of the VESM environment, we present various simulated scenarios and processed results to infer proper semantic annotations from the high-fidelity imagery data for human-vehicle activity recognition under different operational contexts.
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of the main methods for identifying functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data that combines the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but finding the optimal synchronization neighborhood radius by hierarchical search is very difficult and inefficient. We therefore adopt the firefly algorithm to determine the optimal threshold of the synchronization neighborhood radius automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm outperforms the traditional algorithms in precision, recall, and F-measure. PMID:25707632
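In the method above, the firefly algorithm searches for the neighborhood-radius threshold of SHC. As a hedged sketch of the firefly update rule only, the code below minimizes a generic one-dimensional objective over an interval: dimmer fireflies (higher objective) move toward brighter ones with attractiveness beta0 * exp(-gamma * r^2) plus a shrinking random step. The objective used in the paper (a clustering-quality score as a function of the radius) is not specified here, so the check uses a toy quadratic, and all parameter values and the function name are illustrative assumptions.

```python
import math
import random


def firefly_minimize(objective, lo, hi, n=15, iters=100, beta0=1.0,
                     gamma=1.0, alpha=0.2, decay=0.97, seed=0):
    """Minimal 1-D firefly algorithm; returns the best position found."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    best = min(xs, key=objective)
    for _ in range(iters):
        vals = [objective(x) for x in xs]
        for i in range(n):
            for j in range(n):
                if vals[j] < vals[i]:  # firefly j is brighter than firefly i
                    r2 = (xs[i] - xs[j]) ** 2
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    xs[i] += beta * (xs[j] - xs[i]) + alpha * (rng.random() - 0.5)
                    xs[i] = min(max(xs[i], lo), hi)  # stay inside the bounds
                    vals[i] = objective(xs[i])
        best = min(xs + [best], key=objective)  # keep the best seen so far
        alpha *= decay  # shrink the random step over time
    return best
```

In a setting like the paper's, `objective` would map a candidate radius to a score to be minimized (for example, a negated clustering-quality measure); that mapping is hypothetical here.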
Vehicle logo recognition using multi-level fusion model
NASA Astrophysics Data System (ADS)
Ming, Wei; Xiao, Jianli
2018-04-01
Vehicle logo recognition plays an important role in manufacturer identification and vehicle recognition. This paper proposes a new vehicle logo recognition algorithm with a hierarchical framework consisting of two fusion levels. At the first level, a feature fusion model maps the original features to a higher-dimensional feature space, in which the vehicle logos become more recognizable. At the second level, a weighted voting strategy is proposed to improve the accuracy and robustness of the recognition results. Extensive experiments evaluating the proposed algorithm demonstrate that it achieves high recognition accuracy and works robustly.
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Shin, Young Hoon; Seo, Jiwon
2016-10-29
People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use either contact sensors, which are very inconvenient for users, or optical systems, which are susceptible to environmental interference, a contactless and robust solution is required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wideband (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. To extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared with the existing algorithm in our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals, so that speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
An improved clustering algorithm based on reverse learning in intelligent transportation
NASA Astrophysics Data System (ADS)
Qiu, Guoqing; Kou, Qianqian; Niu, Ting
2017-05-01
With the development of artificial intelligence and data mining technology, big data has gradually attracted widespread attention. Clustering is an important method for processing large data sets. This paper introduces reverse learning into the clustering process of the PAM (Partitioning Around Medoids) algorithm. The goal is to overcome the limitations of one-time clustering in unsupervised learning and to increase the diversity of the resulting clusters, thereby improving clustering quality. Algorithm analysis and experimental results show that the algorithm is feasible.
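For readers unfamiliar with the base algorithm, here is a minimal sketch of PAM-style k-medoids, which works by greedy medoid swaps. It is a generic illustration, not the authors' method. The reverse-learning step is not modeled, the naive initialization (the first k points) is an assumption, and the function names are hypothetical.

```python
import math
from itertools import product

def total_cost(points, medoids):
    # Sum of each point's distance to its nearest medoid
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Greedy PAM-style k-medoids: start from the first k points (an assumption),
    then keep applying any (medoid, non-medoid) swap that lowers the total cost."""
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i, h in product(range(k), range(len(points))):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            cost = total_cost(points, candidate)
            if cost < best - 1e-12:
                medoids, best, improved = candidate, cost, True
    # Label each point with the index of its nearest medoid
    labels = [min(medoids, key=lambda m: math.dist(p, points[m])) for p in points]
    return sorted(medoids), labels
```

Each pass tries every (medoid, non-medoid) swap and recomputes the full cost for each one. That is roughly O(k²·n²) distance evaluations per pass, which is why PAM variants focus on better initialization and cheaper swap evaluation.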
A roadmap of clustering algorithms: finding a match for a biomedical application.
Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael
2009-05-01
Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms are inefficient when the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical basis of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. Experimental results on several public data sets show that the new algorithms have lower computational complexity than three well-known point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch), without sacrificing accuracy.
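The definition of a data fragment can be made concrete. Suppose each base clustering is given as one label per point. Then the maximal fragments are the groups of points that receive identical labels in every clustering. The sketch below (hypothetical function name) computes only these fragments. The paper's three aggregation algorithms, which then operate on the fragments, are not reproduced.

```python
from collections import defaultdict

def data_fragments(clusterings):
    """Group point indices whose label tuple is identical across all base
    clusterings; each group is a maximal subset that no clustering splits."""
    n = len(clusterings[0])
    groups = defaultdict(list)
    for i in range(n):
        key = tuple(labels[i] for labels in clusterings)
        groups[key].append(i)
    return list(groups.values())
```

When the base clusterings largely agree, there can be far fewer fragments than points. An aggregation step that works on fragments, weighted by fragment size, then has less to process, which is the efficiency argument made above.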
Multi-Scale Voxel Segmentation for Terrestrial Lidar Data within Marshes
NASA Astrophysics Data System (ADS)
Nguyen, C. T.; Starek, M. J.; Tissot, P.; Gibeaut, J. C.
2016-12-01
The resilience of marshes to a rising sea depends on their elevation response. Terrestrial laser scanning (TLS) is a detailed topographic approach for accurate, dense surface measurement with high potential for monitoring marsh surface elevation response. The dense point cloud provides a 3D representation of the surface, which includes both terrain and non-terrain objects. Extracting topographic information requires filtering the data into like groups or classes. Methods must therefore be incorporated to identify structure in the data before an end product is created. A voxel representation of three-dimensional space provides quantitative visualization and analysis for pattern recognition. The objectives of this study are threefold: 1) apply a multi-scale voxel approach to effectively extract geometric features from the TLS point cloud data, 2) investigate the utility of K-means and Self-Organizing Map (SOM) clustering algorithms for segmentation, and 3) use a variety of validity indices to measure the quality of the result. TLS data were collected at a marsh site along the central Texas Gulf Coast using a Riegl VZ 400 TLS. The site consists of both exposed and vegetated surface regions. To characterize the structure of the point cloud, octree segmentation is applied to create a tree data structure of voxels containing the points. Because voxels are flexible in size and point density, this algorithm is a promising candidate for locally extracting statistical and geometric features of the terrain, including surface normal and curvature. Characteristics of each voxel, such as its volume and point density, are also computed and assigned to each point, as are laser pulse characteristics. The features extracted from the voxelization are then used as input for clustering the points with the K-means and SOM clustering algorithms. The optimal number of clusters is then determined by evaluating cluster separability criteria.
Results for different combinations of the feature space vector and differences between K-means and SOM clustering will be presented. The developed method provides a novel approach for compressing TLS scene complexity in marshes, such as for vegetation biomass studies or erosion monitoring.
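As a rough illustration of voxel-based point features, the sketch below assigns every point the count and density of the voxel that contains it, at several voxel sizes. It is a simplification under stated assumptions. A uniform grid replaces the octree, and the surface normal, curvature, and laser pulse features mentioned above are omitted. The function names are hypothetical.

```python
import math
from collections import Counter

def voxel_features(points, voxel_size):
    """Assign each 3D point the point count and density (points per unit volume)
    of the cubic voxel containing it, for a single uniform grid scale."""
    keys = [tuple(math.floor(c / voxel_size) for c in p) for p in points]
    counts = Counter(keys)
    volume = voxel_size ** 3
    return [(counts[k], counts[k] / volume) for k in keys]

def multi_scale_features(points, sizes):
    # Concatenate each point's (count, density) pairs across all voxel sizes
    per_scale = [voxel_features(points, s) for s in sizes]
    return [sum((scale[i] for scale in per_scale), ()) for i in range(len(points))]
```

The resulting per-point vectors are the kind of input that K-means or SOM clustering would then segment.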
SemiBoost: boosting for semi-supervised learning.
Mallapragada, Pavan Kumar; Jin, Rong; Jain, Anil K; Liu, Yi
2009-11-01
Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this the semi-supervised improvement problem, to distinguish the proposed approach from existing approaches. We design a meta semi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when a supervised learning algorithm must be trained with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploitation of both the manifold and cluster assumptions in training classification models. An empirical study on 16 different data sets and on text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of SemiBoost is comparable to that of state-of-the-art semi-supervised learning algorithms.
Practical automatic Arabic license plate recognition system
NASA Astrophysics Data System (ADS)
Mohammad, Khader; Agaian, Sos; Saleh, Hani
2011-02-01
Since the 1970s, the need for automatic license plate recognition (ALPR) systems has been increasing. An ALPR system is an automatic system that is able to recognize a license plate number extracted from image sensors. Specifically, ALPR systems are used in conjunction with various transportation systems. Application areas include law enforcement (e.g., speed limit enforcement) and commercial uses such as parking enforcement, automatic toll payment, private and public entrances, border control, and theft and vandalism control. Vehicle license plate recognition has been intensively studied in many countries. Because different types of license plates are in use, the requirements for an ALPR system differ from country to country. Generally, an automatic license plate localization and recognition system is made up of three modules: license plate localization, character segmentation, and optical character recognition. This paper presents an Arabic license plate recognition system that is insensitive to character size, font, shape, and orientation, with a very high accuracy rate. The proposed system is based on a combination of enhancement, license plate localization, morphological processing, and feature vector extraction using the Haar transform. The system is fast because alphabet characters and numerals are classified based on the license plate organization. Experimental results for license plates from two different Arab countries show an average of 99% successful license plate localization and recognition over more than 20 different images captured in a complex outdoor environment. The run times are shorter than those of conventional and many state-of-the-art methods.
NASA Astrophysics Data System (ADS)
Babayan, Pavel; Smirnov, Sergey; Strotov, Valery
2017-10-01
This paper describes an aerial object recognition algorithm for on-board and stationary vision systems. The suggested algorithm is intended to recognize objects of a specific kind using a set of reference objects defined by 3D models. The proposed algorithm is based on building outer contour descriptors. The algorithm consists of two stages: learning and recognition. The learning stage is devoted to exploring the reference objects. Using 3D models, we can build a database of training images by rendering each 3D model from viewpoints evenly distributed on a sphere. The points are distributed on the sphere according to the geosphere principle. The gathered training image set is used to calculate descriptors, which are then used in the recognition stage of the algorithm. The recognition stage focuses on estimating the similarity between the captured object and the reference objects by matching the observed image descriptor against the reference object descriptors. The experimental research was performed using a set of models of aircraft of different types (airplanes, helicopters, UAVs). The proposed orientation estimation algorithm showed good accuracy in all case studies. The real-time performance of the algorithm in an FPGA-based vision system was demonstrated.
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.
Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang
2016-12-01
This paper studies clustering methods for Chinese medicine (CM) medical cases. When processing prescriptions from CM medical cases, the traditional K-means clustering algorithm has shortcomings: its results depend on the selection of initial values, and it can become trapped in local optima. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm was proposed. The algorithm uses changes in fitness to dynamically determine the firefly iterations and the simulated annealing sampling. It also increases the diversity of the swarm by expanding the scope of sudden jumps, thereby effectively avoiding premature convergence. Confirmatory experiments on CM medical cases compared this method with traditional K-means clustering algorithms. The results suggest that it greatly improves both individual diversity and the obtained clustering results, and that its results have a certain reference value for cluster analysis of CM prescriptions.
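The firefly-plus-annealing idea can be sketched with the standard firefly move: attraction β0·exp(-γr²) toward brighter fireflies, plus a small random step. Each proposed move is then screened by the Metropolis acceptance rule of simulated annealing. This is one plausible combination written for illustration, not the paper's algorithm. Its fitness-driven iteration control and expanded "sudden jump" are not modeled, and all parameter values and names are assumptions.

```python
import math
import random

def firefly_sa(f, dim, n=15, iters=50, beta0=1.0, gamma=0.1, alpha=0.2,
               t0=1.0, cooling=0.95, seed=0):
    """Minimise f with the standard firefly move, screening each proposed move
    with a Metropolis (simulated annealing) acceptance test."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in swarm]
    temp = t0
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if fit[j] >= fit[i]:
                    continue  # firefly i is only attracted to brighter (lower-cost) ones
                r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                cand = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                        for a, b in zip(swarm[i], swarm[j])]
                fc = f(cand)
                delta = fc - fit[i]
                # SA acceptance: always keep improvements, sometimes keep worse moves
                if delta < 0 or rng.random() < math.exp(-delta / temp):
                    swarm[i], fit[i] = cand, fc
        temp *= cooling
    best = min(range(n), key=fit.__getitem__)
    return swarm[best], fit[best]
```

For clustering, a firefly's position would encode the k cluster centres concatenated into one vector, and f would be the within-cluster sum of squared distances. The brightest firefly never moves in this sketch, so the best cost found can only stay the same or improve.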
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial for uncovering the structure of biological networks. In this paper, we present ClusterViz, an APP for Cytoscape 3 for cluster analysis and visualization. To reduce complexity and enable extensibility, we designed the architecture of ClusterViz based on the Open Services Gateway Initiative (OSGi) framework. Following this architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates comparing the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE, and MCODE, are included in the current version. Because the clustering-algorithm module adopts an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples, with detailed steps, drawn from important scientific articles. These examples show that our tool has helped several research teams study the mechanisms of biological networks.
Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita
2016-04-01
Gene expression data clustering is an important task in DNA microarray analysis. Although many clustering algorithms exist for gene expression analysis, finding a suitable and effective one is always challenging because of the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using eigenanalysis on an MST-based neighborhood graph (E-MST). The MST of a set of points reflects the similarity of the points to their neighbors. The proposed algorithm therefore employs a similarity graph obtained from k′ rounds of MST (the k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from the k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
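This sketch assumes the common construction of a k′-MST graph. Each round computes an MST after removing the edges of all previous rounds, and the graph is the union of the k′ rounds. It uses Euclidean edge weights and a naive Prim's algorithm. The eigenanalysis stage of E-MST is not shown, and the function names are hypothetical.

```python
import math

def mst_edges(n, weight, banned):
    """Prim's algorithm on the complete graph over n nodes, skipping banned edges.
    Returns a list of (u, v) edges with u < v."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for u in in_tree:
            for v in range(n):
                if v in in_tree or (min(u, v), max(u, v)) in banned:
                    continue
                w = weight(u, v)
                if best is None or w < best[0]:
                    best = (w, u, v)
        if best is None:
            break  # the remaining graph is disconnected
        _, u, v = best
        in_tree.add(v)
        edges.append((min(u, v), max(u, v)))
    return edges

def k_mst_graph(points, k):
    """Union of k successive MSTs; each round's edges are banned in later rounds."""
    def weight(u, v):
        return math.dist(points[u], points[v])
    used = set()
    for _ in range(k):
        used.update(mst_edges(len(points), weight, used))
    return used
```

For n points this naive version costs O(n³) per round. A spectral method would then build a similarity matrix only over the returned edges.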
NASA Astrophysics Data System (ADS)
Zhang, Ming; Xie, Fei; Zhao, Jing; Sun, Rui; Zhang, Lei; Zhang, Yue
2018-04-01
The development of license plate recognition technology has contributed greatly to Intelligent Transport Systems (ITS). In this paper, a robust and efficient license plate recognition method is proposed, based on a combined feature extraction model and a BPNN (Back Propagation Neural Network) algorithm. First, a method for detecting and segmenting candidate license plate regions is developed. Second, a new feature extraction model is designed that combines three sets of features. Third, a license plate classification and recognition method using the combined feature model and the BPNN algorithm is presented. Finally, the experimental results indicate that the proposed algorithm achieves both license plate segmentation and recognition effectively. Compared with three traditional methods, the proposed method increases recognition accuracy to 95.7% and reduces processing time to 51.4 ms.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with a novel obstacle distance measure for spatial clustering under obstacle constraints. First, we present a path-searching algorithm that approximates the obstacle distance between two points in the presence of obstacles and facilitators. Taking obstacle distance as the similarity metric, we then propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and classical clustering algorithms. Our clustering model based on an artificial immune system is also applied to a public facility location problem to establish the practical applicability of our approach. The AICOE algorithm uses the clonal selection principle and updates the cluster centers based on the elite antibodies. In this way it is able to achieve the global optimum and a better clustering effect.
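To show how an obstacle distance differs from Euclidean distance, the sketch below computes shortest-path length on a 4-connected grid using breadth-first search, treating '#' cells as obstacles. This is a swapped-in technique, not the paper's path-searching algorithm. Facilitators are not modeled, and the grid representation and function name are assumptions.

```python
from collections import deque

def obstacle_distance(grid, start, goal):
    """Shortest 4-connected path length between two cells of a grid,
    where '#' marks obstacle cells; returns None if goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#'
                    and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None
```

Take the grid `["..#..", "..#..", "....."]` as an example. Cells (0, 0) and (0, 4) are 4 steps apart in a straight line, but the wall forces a detour of 8 steps. A clustering algorithm using this distance is therefore less inclined to group them.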
Tensor Rank Preserving Discriminant Analysis for Facial Recognition.
Tao, Dapeng; Guo, Yanan; Li, Yaotang; Gao, Xinbo
2017-10-12
Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, traditional facial recognition algorithms reshape facial images into long vectors, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) is proposed for facial image recognition. The proposed method involves two stages. In the first stage, the low-dimensional tensor subspace of the original input tensor samples is obtained. In the second stage, discriminative locality alignment is used to obtain the final vector feature representation for subsequent facial recognition. On the one hand, TRPDA fully exploits the natural structure of the input samples and applies an optimization criterion that directly handles the tensor spectral analysis problem. This decreases the computation cost compared with traditional tensor-based feature selection algorithms. On the other hand, TRPDA extracts features by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on three facial databases are performed to evaluate the effectiveness of the proposed TRPDA algorithm.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
Constrained spectral clustering (CSC) can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore received wide academic attention. In this paper, we propose a fast CSC algorithm. It encodes landmark-based graph construction into a new CSC model and applies random sampling to reduce the data size after spectral embedding. Compared with the original model, the new algorithm asymptotically yields similar results as its model size increases. Compared with the most efficient known CSC algorithm, the new algorithm runs faster and suits a wider range of data sets. We also propose a scalable semi-supervised cluster ensemble algorithm. It combines our fast CSC algorithm with dimensionality reduction via random projection in the process of spectral ensemble clustering. We demonstrate through theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, we prove that random projection approximately preserves clustering accuracy in the consensus clustering stage. This result also holds for weighted k-means clustering, in which each point has a corresponding weight, and so gives a theoretical guarantee for that special kind of k-means clustering. PMID:29312447
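The last point refers to weighted k-means, in which every point carries a weight and each centre is the weighted mean of its members. The sketch below shows that variant together with a plain Gaussian random projection. It illustrates only these two building blocks, not the paper's fast CSC or ensemble algorithms. The function names, the seed, and the N(0, 1/target_dim) scaling are assumptions.

```python
import math
import random

def random_project(points, target_dim, seed=0):
    """Gaussian random projection: multiply by a d x target_dim matrix with
    N(0, 1/target_dim) entries, which approximately preserves pairwise distances."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)] for _ in range(d)]
    return [[sum(p[a] * R[a][b] for a in range(d)) for b in range(target_dim)]
            for p in points]

def weighted_kmeans(points, weights, centers, iters=20):
    """Lloyd iterations in which each centre is the weighted mean of its points."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(len(centers)):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total > 0:  # leave an empty cluster's centre unchanged
                centers[j] = [sum(w * p[a] for p, w in members) / total
                              for a in range(len(points[0]))]
    return centers, labels
```

With all weights equal to 1 this reduces to ordinary k-means. A heavier point pulls its centre toward itself: in the test data, the point at 1 has weight 3, so the first centre lands at 0.75 rather than 0.5.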
Karayiannis, N B
2000-01-01
This paper presents the development and investigates the properties of ordered weighted learning vector quantization (LVQ) and clustering algorithms. These algorithms are developed by using gradient descent to minimize reformulation functions based on aggregation operators. An axiomatic approach provides conditions for selecting aggregation operators that lead to admissible reformulation functions. Minimization of admissible reformulation functions based on ordered weighted aggregation operators produces a family of soft LVQ and clustering algorithms, which includes fuzzy LVQ and clustering algorithms as special cases. The proposed LVQ and clustering algorithms are used to perform segmentation of magnetic resonance (MR) images of the brain. The diagnostic value of the segmented MR images provides the basis for evaluating a variety of ordered weighted LVQ and clustering algorithms.
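This sketch assumes that the ordered weighted aggregation meant here is the usual OWA operator of Yager. OWA applies its weights to the arguments after sorting them in descending order, so a single weight vector can express max, min, mean, or anything in between. The function name is hypothetical. The reformulation functions and gradient-descent LVQ updates of the paper are not shown.

```python
def owa(values, weights):
    """Ordered weighted averaging: weights apply to the values after sorting
    them in descending order, not to fixed argument positions."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))
```

Weights (1, 0, 0) give the maximum, (0, 0, 1) the minimum, and uniform weights the arithmetic mean. Choosing weights in between yields a soft compromise, the kind of family from which the fuzzy special cases above arise.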
Multifeature-based high-resolution palmprint recognition.
Dai, Jifeng; Zhou, Jie
2011-05-01
Palmprint is a promising biometric feature for use in access control and forensic applications. Previous research on palmprint recognition has mainly concentrated on low-resolution (about 100 ppi) palmprints. But high-security applications (e.g., forensic usage) require high-resolution palmprints (500 ppi or higher), from which more useful information can be extracted. In this paper, we propose a novel recognition algorithm for high-resolution palmprints. The main contributions of the proposed algorithm include the following: 1) use of multiple features, namely minutiae, density, orientation, and principal lines, for palmprint recognition, which significantly improves on the matching performance of the conventional algorithm; 2) design of a quality-based and adaptive orientation field estimation algorithm that performs better than the existing algorithm in regions with a large number of creases; and 3) use of a novel fusion scheme for identification that performs better than conventional fusion methods, e.g., the weighted sum rule, SVMs, or the Neyman-Pearson rule. In addition, we analyze the discriminative power of different feature combinations and find that density is very useful for palmprint recognition. Experimental results on a database containing 14,576 full palmprints show that the proposed algorithm achieves good performance. In verification, at a False Acceptance Rate (FAR) of 10^-5, the system's False Rejection Rate (FRR) is 16 percent, 17 percent lower than that of the best existing algorithm. In identification, the rank-1 live-scan partial palmprint recognition rate is improved from 82.0 to 91.7 percent.
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and for genomic network inference. In this article, we propose a clustering algorithm based on hierarchical Dirichlet processes (HDP). HDP clustering introduces a hierarchical structure into the statistical model, which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm for HDP clustering based on the Chinese restaurant metaphor. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces unnecessary clustering fragments. PMID:23587447
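The Chinese restaurant metaphor can be illustrated by drawing a partition from a plain Chinese restaurant process. Each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to the concentration α. The sketch covers only this single-level prior. The hierarchical (franchise) version and the Gibbs updates used for HDP clustering are not shown, and the function name is hypothetical.

```python
import random

def sample_crp(n, alpha, seed=0):
    """Draw table (cluster) assignments for n customers from a Chinese
    restaurant process with concentration alpha."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of customers at table k
    assignment = []
    for i in range(n):
        # existing table k has weight counts[k]; a new table has weight alpha
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                break
        else:
            k = len(counts)  # no existing table chosen: open a new one
            counts.append(0)
        counts[k] += 1
        assignment.append(k)
    return assignment, counts
```

Larger α produces more, smaller tables. This rich-get-richer prior lets the number of clusters grow with the data instead of being fixed in advance.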
Canonical PSO Based K-Means Clustering Approach for Real Datasets.
Dey, Lopamudra; Chakraborty, Sanjay
2014-01-01
Clustering is significant and widely applied across many fields. Because clustering is an unsupervised process in data mining, properly evaluating its results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as a cluster validity measure. Different types of indices are used to solve different types of problems, and the choice of index depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm. It then analyses some important clustering indices (intercluster, intracluster) and evaluates their effects on real-time air pollution, wholesale customer, wine, and vehicle datasets. The algorithms compared are typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering. This paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing properly compact clusters on these particular real-life datasets. In short, it examines the behaviour of these clustering algorithms with respect to validation indices and presents the evaluation results in mathematical and graphical forms.
Computational intelligence techniques for biological data mining: An overview
NASA Astrophysics Data System (ADS)
Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari
2014-10-01
Computational techniques have been successfully used for highly accurate analysis and modeling of the multifaceted, raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective at overcoming the limitations of traditional in-vitro experiments on the constantly increasing volume of sequence data. The most critical problems that have attracted researchers' attention include, but are not limited to: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, and analysis of microarray gene expression data. To solve these problems, various classification and clustering techniques based on machine learning have been used extensively in the published literature. These techniques include neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, rough set classifiers, decision trees, and HMM-based algorithms. Applying these algorithms involves several major difficulties. Previous feature encoding and selection methods are limited in extracting the best features, and it remains hard both to increase classification accuracy and to decrease the running-time overhead of the learning algorithms. The application of this research could potentially be useful in drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.
Unsupervised learning of structure in spectroscopic cubes
NASA Astrophysics Data System (ADS)
Araya, M.; Mendoza, M.; Solar, M.; Mardones, D.; Bayo, A.
2018-07-01
We consider the problem of analyzing the structure of spectroscopic cubes using unsupervised machine learning techniques. We propose representing the target's signal as a homogeneous set of volumes through an iterative algorithm that separates the structured emission from the background without overestimating the flux. Besides satisfying some basic theoretical properties, the algorithm is designed to be tuned by domain experts, because its parameters have meaningful values in the astronomical context. Nevertheless, we propose a heuristic to automatically estimate the signal-to-noise ratio parameter of the algorithm directly from data. The resulting lightweight set of samples (≤ 1% of the size of the original data) offers several advantages. For instance, it is statistically correct and computationally inexpensive to apply well-established techniques from the pattern recognition and machine learning domains, such as clustering and dimensionality reduction algorithms. We use ALMA science verification data to validate our method, and present examples of the operations that can be performed using the proposed representation. Even though this approach is focused on providing faster and better analysis tools for the end-user astronomer, it also opens the possibility of content-aware data discovery by applying our algorithm to big data.
NASA Astrophysics Data System (ADS)
Gazis, A.; Katsiri, E.
2017-09-01
This paper presents a Wireless Sensor Network (WSN) system that was created as a project on protecting wildlife using sensor networks. The project was carried out with the assistance of the Department of Electrical and Computer Engineering of the Democritus University of Thrace. An automated process was implemented to recognize a passenger (i.e., human, wolf, bear, etc.) traversing a box-shaped underground passage, such as those located along main highways, by fusing width, height, and weight values. These were measured using low-cost distance (beam) and weight (S-type load) micro-sensors and stored in a central repository. The information provided by the WSN was analyzed with a variety of methods, including a neural pattern recognition network and clustering algorithms, which were able to recognize the kind of passenger with certainty scores over 90%. The main future concern is evaluating the effectiveness of these passages, i.e., whether they are frequently used by animals. This information was further analysed by appropriate information systems in order to provide insights into the effectiveness of such mitigation structures.
Design method of ARM based embedded iris recognition system
NASA Astrophysics Data System (ADS)
Wang, Yuanbo; He, Yuqing; Hou, Yushi; Liu, Ting
2008-03-01
With the advantages of non-invasiveness, uniqueness, stability, and a low false recognition rate, iris recognition has been successfully applied in many fields. Up to now, most iris recognition systems have been PC-based. However, a PC is not portable and needs more power. In this paper, we propose an embedded iris recognition system based on ARM. First, considering the requirements of iris image acquisition and of the recognition algorithm, we analyzed the design of the iris image acquisition module. We then designed the ARM processing module and its peripherals, and studied the Linux platform and a recognition algorithm based on it. Finally, we implemented the ARM-based iris imaging and recognition system. Experimental results show that the ARM platform we used is fast enough to run the iris recognition algorithm, and that data flow smoothly between the camera and the ARM chip under the embedded Linux system. Using ARM is thus an effective way to implement a portable embedded iris recognition system.
Lee, Jong-Seok; Park, Cheol Hoon
2010-08-01
We propose a novel stochastic optimization algorithm, hybrid simulated annealing (SA), to train hidden Markov models (HMMs) for visual speech recognition. In our algorithm, SA is combined with a local optimization operator that substitutes a better solution for the current one, improving both the convergence speed and the quality of solutions. We mathematically prove that the sequence of objective values converges in probability to the global optimum. The algorithm is applied to train HMMs that are used as visual speech recognizers. The popular training method for HMMs, the expectation-maximization algorithm, achieves only local optima in the parameter space. The proposed method, in contrast, can perform global optimization of the HMM parameters and thereby obtain solutions that yield improved recognition performance. The superiority of the proposed algorithm over conventional ones is demonstrated via isolated word recognition experiments.
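Here is a toy version of the hybrid scheme described above. Random proposals are accepted by the Metropolis rule, and a local optimization operator then replaces the current solution whenever it finds a better one. The objective is a one-dimensional function rather than an HMM likelihood. The Gaussian proposal, the geometric cooling schedule, and the function names are all assumptions for illustration.

```python
import math
import random

def hybrid_sa(f, x0, local_step, t0=1.0, cooling=0.99, iters=2000, seed=0):
    """Minimise f: Gaussian random-walk proposals accepted by the Metropolis rule,
    each followed by a local operator that may replace the current solution."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    temp = t0
    for _ in range(iters):
        cand = x + rng.gauss(0.0, 1.0)
        fc = f(cand)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(-delta/T)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
        # local optimisation operator: substitute a better nearby solution
        y = local_step(x)
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
        if fx < fbest:
            best, fbest = x, fx
        temp *= cooling
    return best, fbest
```

As a usage example, take f(x) = (x - 3)² with a gradient step x - 0.2·(x - 3) as the local operator. Starting from x0 = -10, the run settles near x = 3. The annealing part can escape local minima in multimodal objectives, while the local operator speeds up refinement once the temperature is low.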
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique, and the k-means algorithm is one of the most commonly used clustering methods. However, k-means depends heavily on its initial solution and easily falls into local optima. To address these shortcomings, this paper proposes a hybrid monkey algorithm for clustering analysis that incorporates the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means
NASA Astrophysics Data System (ADS)
Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.
2018-04-01
This research was initially motivated by the lack of clustering algorithms that focus specifically on binary data. To fill this gap, the research examines a promising technique for analysing this type of data, namely Genetic Algorithms (GA). GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams, yielding GAIKM. In GAIKM, the objective function is based on a few sufficient statistics that can be calculated easily and quickly on binary numbers, and the use of IKM provides fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared with other clustering algorithms and with IKM itself. In conclusion, GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM), and K-means, and it paves the way for future research involving missing data and outliers.
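The incremental part can be sketched in Python, assuming (the abstract does not specify) that each cluster keeps only its size and the per-attribute count of ones as sufficient statistics, so its centroid is the fraction of ones per attribute. The GA layer of GAIKM is omitted, and the class name `BinaryIncrementalKMeans` is hypothetical.

```python
class BinaryIncrementalKMeans:
    """Incremental k-means for binary vectors (GA layer of GAIKM omitted).

    Assumed sufficient statistics per cluster: its size n and the per-attribute
    count of ones; the centroid is ones / n.
    """

    def __init__(self, seeds):
        # Each seed vector initialises one cluster.
        self.n = [1 for _ in seeds]
        self.ones = [list(s) for s in seeds]

    def centroid(self, j):
        return [c / self.n[j] for c in self.ones[j]]

    def _sq_dist(self, x, j):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, self.centroid(j)))

    def partial_fit(self, x):
        # Assign x to the nearest centroid, then update only that cluster's
        # statistics in O(d) time, without revisiting earlier points.
        j = min(range(len(self.n)), key=lambda k: self._sq_dist(x, k))
        self.n[j] += 1
        for i, xi in enumerate(x):
            self.ones[j][i] += xi
        return j
```

Because the statistics are just integer counts, they can be updated one stream element at a time, which is what makes an incremental scheme cheap on binary data.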
NASA Technical Reports Server (NTRS)
Knasel, T. Michael
1996-01-01
The primary goal of the Adaptive Vision Laboratory Research project was to develop advanced computer vision systems for automatic target recognition. The approach used in this effort combined several machine learning paradigms, including evolutionary learning algorithms, neural networks, and adaptive clustering techniques, to develop the E-MORPH system. This system is capable of generating pattern recognition systems to solve a wide variety of complex recognition tasks. A series of simulation experiments was conducted using E-MORPH to solve problems in OCR, military target recognition, industrial inspection, and medical image analysis. The bulk of the funds provided through this grant were used to purchase computer hardware and software to support these computationally intensive simulations. The payoff from this effort is the reduced need for human involvement in the design and implementation of recognition systems. We have shown that the techniques used in E-MORPH are generic and readily transition to other problem domains. Specifically, E-MORPH is a multi-phase evolutionary learning system that evolves cooperative sets of feature detectors and combines their responses using an adaptive classifier to form a complete pattern recognition system. The system can operate on binary or grayscale images. In our most recent experiments, we used multi-resolution images formed by applying a Gabor wavelet transform to a set of grayscale input images. To begin the learning process, candidate chips are extracted from the multi-resolution images to form a training set and a test set. A population of detector sets is randomly initialized to start the evolutionary process. Using a combination of evolutionary programming and genetic algorithms, the feature detectors are enhanced to solve a recognition problem. The design of E-MORPH and recognition results for a complex problem in medical image analysis are described at the end of this report.
The specific task involves the identification of vertebrae in x-ray images of human spinal columns. This problem is extremely challenging because individual vertebrae exhibit variation in shape, scale, orientation, and contrast. E-MORPH generated several accurate recognition systems to solve this task. This dual use of ATR technology clearly demonstrates the flexibility and power of our approach.
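The multi-resolution input described above can be sketched with a small bank of Gabor filters at several wavelengths. The kernel parameterisation, the sigma/wavelength ratio of 0.56, and the helper names `gabor_kernel` and `gabor_responses` are assumptions for illustration; the report does not state which Gabor parameters E-MORPH used.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter (hypothetical parameterisation)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates to orientation theta
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_r / wavelength + psi)
    return envelope * carrier

def gabor_responses(image, wavelengths=(4, 8, 16), theta=0.0):
    """Filter an image at several scales to build a multi-resolution stack."""
    out = []
    for lam in wavelengths:
        # sigma ~ 0.56 * wavelength gives roughly a one-octave bandwidth (assumed choice)
        k = gabor_kernel(size=2 * lam + 1, wavelength=lam, theta=theta, sigma=0.56 * lam)
        k = k - k.mean()                  # zero DC: interior flat regions respond ~0
        pad = k.shape[0] // 2
        padded = np.pad(image, pad)       # zero padding, output keeps the input size
        resp = np.zeros_like(image, dtype=float)
        for i in range(image.shape[0]):
            for j in range(image.shape[1]):
                resp[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
        out.append(resp)
    return np.stack(out)
```

The explicit loops keep the correlation easy to read; a real system would use an FFT or a library convolution instead.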
NASA Astrophysics Data System (ADS)
Yi, Juan; Du, Qingyu; Zhang, Hong jiang; Zhang, Yao lei
2017-11-01
Target recognition is currently a key technology in intelligent image processing and application development. With the growth of computer processing power, autonomous target recognition algorithms have gradually become more intelligent and have shown good adaptability. Taking airports as the research object, we analyze the characteristics of airport layouts, construct a knowledge model, and independently design a target recognition algorithm based on Gabor filtering and the Radon transform for airport image processing and feature extraction. The algorithm was verified and achieved good recognition results.
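Runways appear as long straight lines, and the Radon transform turns a straight line into a single strong peak in (offset, angle) space. A minimal discrete Radon transform can be sketched by projecting every pixel onto the line normal and binning the result; the nearest-bin scheme and the function name `radon` are assumptions, not the authors' implementation.

```python
import numpy as np

def radon(image, angles_deg):
    """Approximate discrete Radon transform by binning pixel projections.

    Each pixel (x, y), measured from the image centre, adds its intensity to the
    bin t = round(x*cos(a) + y*sin(a)) for every angle a.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs - (w - 1) / 2.0
    ys = ys - (h - 1) / 2.0
    r = int(np.ceil(np.hypot(h, w) / 2))
    sino = np.zeros((2 * r + 1, len(angles_deg)))
    for k, a in enumerate(np.deg2rad(angles_deg)):
        t = np.rint(xs * np.cos(a) + ys * np.sin(a)).astype(int) + r
        # add.at accumulates correctly even when many pixels share a bin
        np.add.at(sino[:, k], t.ravel(), image.ravel())
    return sino
```

For a vertical line through the image centre, the column for 0 degrees has a single bin holding the whole line length, while the 90-degree column spreads the same pixels across many bins; the angle of the peak therefore gives the line orientation.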
Canonical PSO Based K-Means Clustering Approach for Real Datasets
Dey, Lopamudra; Chakraborty, Sanjay
2014-01-01
Clustering is significant and widely applied across many fields. Because clustering is an unsupervised process in data mining, properly evaluating the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as a cluster validity measure. Different types of indices are used to solve different types of problems, and index selection depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm, analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for forming compact clusters on these particular real-life datasets. In short, it examines the behaviour of these clustering algorithms with respect to validation indices and presents the evaluation results in mathematical and graphical form. PMID:27355083
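A PSO-based K-means can be sketched by letting each particle encode all k centroids and minimising the within-cluster sum of squared errors (an intracluster compactness measure). The sketch assumes the usual Clerc-Kennedy constriction parameters (chi = 0.7298, c1 = c2 = 2.05) for the "canonical" variant; the paper's exact settings and the function names here are assumptions.

```python
import numpy as np

def sse(data, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def canonical_pso_kmeans(data, k, n_particles=20, n_iter=100, seed=0):
    """Constricted PSO; each particle is a flattened vector of k centroids."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    chi, c1, c2 = 0.7298, 2.05, 2.05        # assumed constriction parameters
    # Initialise each particle with k distinct data points as centroids.
    pos = np.array([data[rng.choice(n, k, replace=False)].ravel()
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(data, p.reshape(k, d)) for p in pos])
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = chi * (vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos))
        pos = pos + vel
        f = np.array([sse(data, p.reshape(k, d)) for p in pos])
        improved = f < pbest_f               # update personal bests
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
        g = pbest_f.argmin()                 # update the global best
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest.reshape(k, d), gbest_f
```

Because the global best only changes when a strictly better solution appears, the returned SSE never exceeds that of the best initial particle. A standard K-means run can then be started from the returned centroids if further refinement is wanted.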
A method of operation scheduling based on video transcoding for cluster equipment
NASA Astrophysics Data System (ADS)
Zhou, Haojie; Yan, Chun
2018-04-01
Real-time video transcoding devices built on cluster technology face massive growth in the number of video jobs and in the diversity of resolutions and bit rates. We analyze the current mainstream task scheduling algorithms and the characteristics of real-time video transcoding clusters and, based on the characteristics of the cluster equipment, propose a task delay scheduling algorithm. This algorithm enables the cluster to achieve better performance both in generating the job queue and in dispatching jobs from the queue when it receives an operation instruction. Finally, a small real-time video transcoding cluster is constructed to analyze the computing capacity, running time, resource occupation, and other aspects of various algorithms in operation scheduling. The experimental results show that, compared with traditional cluster task scheduling algorithms, the task delay scheduling algorithm is more flexible and efficient.
[Cluster analysis in biomedical research].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for analyzing multi-parameter data. It reveals the internal structure of the data by grouping individual observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
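As a concrete example of the hierarchical family mentioned above, here is a naive bottom-up (agglomerative) clustering with single linkage in plain Python. It is an illustrative sketch with a hypothetical function name, not code from the review, and its cubic cost makes it suitable only for small datasets.

```python
import math

def single_linkage(points, n_clusters):
    """Naive agglomerative (bottom-up) clustering with single linkage."""
    clusters = [[i] for i in range(len(points))]   # start with singletons

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Recording the merge distances instead of stopping at `n_clusters` would give the full dendrogram that hierarchical methods are usually reported with.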
Data depth based clustering analysis
Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...
2016-01-01
This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most of them are not affine invariant, so they must be run with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they lack a global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise, because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter varies. The experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds the robustness to noise of DBSCAN and HDBSCAN. Its robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
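The abstract does not say which depth function DBCA uses. As one well-known affine-invariant example, Mahalanobis depth can be sketched as follows; applying any invertible affine map to the data leaves every point's depth unchanged, which is the property the abstract relies on. The function name is illustrative.

```python
import numpy as np

def mahalanobis_depth(points, data):
    """Mahalanobis depth of each row of `points` with respect to the sample `data`.

    D(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu)); central points score near 1,
    outlying points near 0.
    """
    mu = data.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = points - mu
    md2 = np.einsum("ij,jk,ik->i", diff, s_inv, diff)   # squared Mahalanobis distances
    return 1.0 / (1.0 + md2)
```

If y = Ax + b with A invertible, the mean maps to A mu + b and the covariance to A S A^T, so the quadratic form, and hence the depth, is identical before and after the transformation.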
Clustering analysis of moving target signatures
NASA Astrophysics Data System (ADS)
Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto
2010-04-01
Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the clustering algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the pixel-labeling procedure used in image processing. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
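A common form of the knee-point heuristic picks the number of clusters whose (k, error) point lies farthest from the straight line joining the first and last points of the error curve. The sketch below implements that generic heuristic; it is not the paper's adaptive KP algorithm, and the function name is hypothetical.

```python
import math

def knee_point(ks, errors):
    """Return the k whose (k, error) point lies farthest from the end-point chord."""
    x0, y0 = ks[0], errors[0]
    x1, y1 = ks[-1], errors[-1]
    norm = math.hypot(x1 - x0, y1 - y0)
    best_k, best_d = ks[0], -1.0
    for k, e in zip(ks, errors):
        # Perpendicular distance from (k, e) to the line through the two end points.
        d = abs((y1 - y0) * k - (x1 - x0) * e + x1 * y0 - y1 * x0) / norm
        if d > best_d:
            best_k, best_d = k, d
    return best_k
```

For example, with errors `[100, 50, 12, 10, 9, 8]` for k = 1 to 6 the function returns 3, where the curve flattens. The result depends on the relative scaling of the two axes, which is one reason adaptive variants exist.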
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm
NASA Astrophysics Data System (ADS)
Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian
2017-03-01
DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key, routine tasks in molecular biology, particularly in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitional clustering. In this paper, we combine the two types, K-Means (partitional clustering) and DIANA (hierarchical clustering), into what we call hybrid clustering. Hybrid clustering using the Parallel K-Means and DIANA algorithms is applied to cluster DNA sequences of Human Papillomavirus (HPV). The process starts by collecting HPV DNA sequences from NCBI (National Center for Biotechnology Information) and extracting characteristics from them. The extracted characteristics are stored in a matrix, which is normalized with Min-Max normalization, and genetic distances are calculated using the Euclidean distance. The hybrid clustering is then applied using implementations of the Parallel K-Means and DIANA algorithms, with the aim of obtaining better clusters. To validate the resulting clusters and find the optimal number of clusters, we use the Davies-Bouldin Index (DBI). In this study, Parallel K-Means clustering grouped the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering grouped the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994, and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of Parallel K-Means clustering alone, which is performed at the first stage. This means that hybrid clustering clusters the HPV DNA sequences better than Parallel K-Means clustering alone.
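The normalization and validation steps can be sketched in plain Python: Min-Max scaling of each feature column, then the Davies-Bouldin index computed with Euclidean distances (lower is better). The function names are hypothetical, and cluster scatter is taken as the mean distance of members to their centroid, the usual textbook definition; the paper's exact variant is not stated.

```python
import math

def min_max_normalize(rows):
    """Scale every column of a feature matrix to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def davies_bouldin(rows, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better)."""
    groups = {}
    for r, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(r)
    cents, scatter = {}, {}
    for lab, pts in groups.items():
        c = [sum(v) / len(pts) for v in zip(*pts)]
        cents[lab] = c
        # Average distance of cluster members to their centroid.
        scatter[lab] = sum(math.dist(p, c) for p in pts) / len(pts)
    labs = list(groups)
    total = 0.0
    for i in labs:
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in labs if j != i)
    return total / len(labs)
```

For two clusters `[[0, 0], [0, 2]]` and `[[10, 0], [10, 2]]`, each scatter is 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.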
Visual cluster analysis and pattern recognition methods
Osbourn, Gordon Cecil; Martinez, Rubel Francisco
2001-01-01
A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.
Speech recognition for embedded automatic positioner for laparoscope
NASA Astrophysics Data System (ADS)
Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin
2014-07-01
In this paper, a novel speech recognition methodology based on the Hidden Markov Model (HMM) is proposed for an embedded Automatic Positioner for Laparoscope (APL), whose core is a fixed-point ARM processor. The APL system is designed to assist the doctor in laparoscopic surgery by letting a specific doctor control the laparoscope by voice. Responding to voice commands in real time requires a more efficient speech recognition algorithm for the APL. To reduce computation cost without significant loss of recognition accuracy, both arithmetic and algorithmic optimizations are applied. First, as the main arithmetic optimization, a fixed-point front end for speech feature analysis is built to suit the characteristics of the ARM processor. Then a fast likelihood computation algorithm is used to reduce the computational complexity of the HMM-based recognition algorithm. The experimental results show that the method reduces recognition time to within 0.5 s while keeping accuracy above 99%, demonstrating its ability to achieve real-time vocal control of the APL.
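A fixed-point front end replaces floating-point arithmetic with scaled integers. The sketch below uses the common Q15 format (value = integer / 2^15) and applies it to pre-emphasis, y[n] = x[n] - alpha * x[n-1], a typical first stage of speech feature extraction. That the paper uses Q15, pre-emphasis, or alpha = 0.97 is an assumption made for illustration.

```python
Q = 15
ONE = 1 << Q   # 1.0 in Q15 (not itself representable; max is ONE - 1)

def to_q15(x):
    # Convert a float in [-1, 1) to a Q15 integer, saturating at the range limits.
    return max(-ONE, min(ONE - 1, int(round(x * ONE))))

def q15_mul(a, b):
    # Multiply two Q15 numbers: full product, round, then shift back to Q15.
    return (a * b + (1 << (Q - 1))) >> Q

def pre_emphasis_q15(samples, alpha=0.97):
    """Fixed-point pre-emphasis y[n] = x[n] - alpha * x[n-1] on Q15 samples."""
    a = to_q15(alpha)
    out, prev = [], 0
    for x in samples:
        y = x - q15_mul(a, prev)
        out.append(max(-ONE, min(ONE - 1, y)))   # saturate like a DSP instruction
        prev = x
    return out
```

For two consecutive samples of 0.5 (16384 in Q15), the output is `[16384, 491]`; 491 / 32768 is about 0.015, matching 0.5 - 0.97 * 0.5 in floating point. Integer multiply-and-shift is cheap on a processor without a floating-point unit, which is the motivation for this kind of optimization.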
Yoo, Sung-Hoon; Oh, Sung-Kwun; Pedrycz, Witold
2015-09-01
In this study, we propose a hybrid method of face recognition that uses face region information extracted from the detected face region. In the preprocessing part, we develop a hybrid approach based on the Active Shape Model (ASM) and the Principal Component Analysis (PCA) algorithm. At this step, we use a CCD (Charge Coupled Device) camera to acquire a facial image, detect the face using AdaBoost, and then employ Histogram Equalization (HE) to improve image quality. ASM extracts the face contour and image shape to produce a personal profile. Then we use PCA to reduce the dimensionality of the face images. In the recognition part, we consider improved Radial Basis Function Neural Networks (RBF NNs) to identify a unique pattern associated with each person. The proposed RBF NN architecture consists of three functional modules realizing the condition phase, the conclusion phase, and the inference phase, built with fuzzy rules in the standard 'if-then' format. In forming the condition part of the fuzzy rules, the input space is partitioned using Fuzzy C-Means (FCM) clustering. In the conclusion part of the fuzzy rules, the connections (weights) of the RBF NNs are represented by four kinds of polynomials: constant, linear, quadratic, and reduced quadratic. The values of the coefficients are determined by gradient descent. The output of the RBF NN model is obtained by running a fuzzy inference method. The essential design parameters of the network (including the learning rate, the momentum coefficient, and the fuzzification coefficient used by FCM) are optimized by means of Differential Evolution (DE). The proposed P-RBF NNs (Polynomial-based RBF NNs) are applied to facial recognition, and their performance is quantified in terms of output performance and recognition rate. Copyright © 2015 Elsevier Ltd. All rights reserved.
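The FCM step that partitions the input space can be sketched with the standard alternating updates: centers are membership-weighted means, and memberships follow u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)). In the paper the fuzzification coefficient m is tuned by DE; here it is fixed at 2 as an assumption, and the function name is illustrative.

```python
import numpy as np

def fuzzy_c_means(data, c, m=2.0, n_iter=100, seed=0, eps=1e-10):
    """Standard fuzzy c-means; returns (centers, memberships U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    u = rng.random((c, n))
    u /= u.sum(axis=0)                        # memberships of each point sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um @ data / um.sum(axis=1, keepdims=True)
        # Distances from every center to every point (eps avoids division by zero).
        dist = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2) + eps
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
    return centers, u
```

Each fuzzy rule's condition part can then use the membership degree of an input in the corresponding cluster, which is how FCM supplies the partition of the input space.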
Automated Recognition of 3D Features in GPIR Images
NASA Technical Reports Server (NTRS)
Park, Han; Stough, Timothy; Fijany, Amir
2007-01-01
A method of automated recognition of three-dimensional (3D) features in images generated by ground-penetrating imaging radar (GPIR) is undergoing development. GPIR 3D images can be analyzed to detect and identify such subsurface features as pipes and other utility conduits. Until now, much of the analysis of GPIR images has been performed manually by expert operators who must visually identify and track each feature. The present method is intended to satisfy a need for more efficient and accurate analysis by means of algorithms that can automatically identify and track subsurface features, with minimal supervision by human operators. In this method, data from multiple sources (for example, data on different features extracted by different algorithms) are fused together for identifying subsurface objects. The algorithms of this method can be classified in several different ways. In one classification, the algorithms fall into three classes: (1) image-processing algorithms, (2) feature-extraction algorithms, and (3) a multiaxis data-fusion/pattern-recognition algorithm that includes a combination of machine-learning, pattern-recognition, and object-linking algorithms. The image-processing class includes preprocessing algorithms for reducing noise and enhancing target features for pattern recognition. The feature-extraction algorithms operate on preprocessed data to extract such specific features in images as two-dimensional (2D) slices of a pipe. Then the multiaxis data-fusion/pattern-recognition algorithm identifies, classifies, and reconstructs 3D objects from the extracted features. In this process, multiple 2D features extracted by use of different algorithms and representing views along different directions are used to identify and reconstruct 3D objects. 
In object linking, which is an essential part of this process, features identified in successive 2D slices and located within a threshold radius of identical features in adjacent slices are linked in a directed-graph data structure. Relative to past approaches, this multiaxis approach offers the advantages of more reliable detections, better discrimination of objects, and provision of redundant information, which can be helpful in filling gaps in feature recognition by one of the component algorithms. The image-processing class also includes postprocessing algorithms that enhance identified features to prepare them for further scrutiny by human analysts (see figure). Enhancement of images as a postprocessing step is a significant departure from traditional practice, in which enhancement of images is a preprocessing step.
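The object-linking rule (connect a feature in one 2D slice to features in the next slice that lie within a threshold radius) can be sketched as building the edges of a directed graph. Features are reduced to (x, y) centres and the "identical feature" test is simplified to a distance check, so feature types are ignored; these simplifications and the name `link_features` are assumptions.

```python
import math

def link_features(slices, radius):
    """Link 2-D feature centres across successive slices into a directed graph.

    slices: list of lists of (x, y) feature centres, one list per slice.
    Returns a dict mapping (s, i) to the nodes (s + 1, j) whose feature j in
    slice s + 1 lies within `radius` of feature i in slice s.
    """
    edges = {}
    for s in range(len(slices) - 1):
        for i, p in enumerate(slices[s]):
            targets = [(s + 1, j) for j, q in enumerate(slices[s + 1])
                       if math.dist(p, q) <= radius]
            if targets:
                edges[(s, i)] = targets
    return edges
```

Following edges from a node traces one candidate 3D object, such as a pipe, through successive slices; gaps left by one extraction algorithm could be filled by features found by another before linking.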
Clustering algorithm for determining community structure in large networks
NASA Astrophysics Data System (ADS)
Pujol, Josep M.; Béjar, Javier; Delgado, Jordi
2006-07-01
We propose an algorithm to find the community structure in complex networks based on a combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as that of the best modularity optimization algorithms in the literature; however, the main asset of the algorithm is its efficiency. The most comparable algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering large networks because of its efficiency. When the two are compared, our algorithm outperforms the fast algorithm in both efficiency and clustering accuracy, measured in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice for analyzing the community structure of medium and large networks, in the range of tens to hundreds of thousands of vertices.
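The quantity being optimized can be made concrete with Newman's modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c's nodes. The sketch only evaluates Q for a given partition; it does not reproduce the paper's spectral search, and the function name is illustrative.

```python
def modularity(edges, community):
    """Newman modularity of an undirected graph partition.

    edges: list of (u, v) pairs; community: dict node -> community label.
    """
    m = len(edges)
    internal = {}   # L_c: number of edges with both ends in community c
    degree = {}     # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by one edge, splitting them into their natural communities gives Q = 2 * (3/7 - 1/4) = 5/14, whereas putting every node in one community gives Q = 0.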
Tracking and recognizing faces in videos with an incremental local sparse representation model
NASA Astrophysics Data System (ADS)
Wang, Chao; Wang, Yunhong; Zhang, Zhaoxiang
2013-10-01
This paper addresses the problem of tracking and recognizing faces via incremental local sparse representation. First, a robust face tracking algorithm is proposed that employs local sparse appearance and a covariance pooling method. In the subsequent face recognition stage, a novel template update strategy that incorporates incremental subspace learning lets our recognition algorithm adapt the template to appearance changes and reduces the influence of occlusion and illumination variation. This yields robust video-based face tracking and recognition with desirable performance. In the experiments, we test the quality of face recognition on real-world noisy videos from the YouTube database, which includes 47 celebrities. Our proposed method achieves a face recognition rate of 95% over all videos. The proposed face tracking and recognition algorithms are also tested on a set of noisy videos under heavy occlusion and illumination variation. The tracking results on challenging benchmark videos demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. On the challenging dataset in which faces undergo occlusion and illumination variation, and in tracking and recognition experiments under significant pose variation on the University of California, San Diego (Honda/UCSD) database, our proposed method also consistently demonstrates a high recognition rate.
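Sparse representation codes a patch x over a dictionary D by solving min_a 0.5 * ||x - D a||^2 + lam * ||a||_1. The abstract does not name the solver, so the sketch below uses ISTA (iterative soft-thresholding), a standard generic method; the function names and the step size 1/L are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)              # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a
```

With an identity dictionary the solution is simply the soft-thresholded input, so `ista(np.eye(3), np.array([3.0, -0.5, 1.0]), 1.0)` gives approximately `[2, 0, 0]`. In a local sparse appearance model, D would hold vectorised template patches and the resulting codes would then be pooled.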
Codebook-based electrooculography data analysis towards cognitive activity recognition.
Lagodzinski, P; Shirahama, K; Grzegorzek, M
2018-04-01
With advances in mobile and wearable technology, people have started to use a variety of sensing devices to track their daily activities as well as their health and fitness in order to improve their quality of life. This work explores eye movement analysis, which, owing to its strong correlation with cognitive tasks, can be successfully used in activity recognition. Eye movements are recorded using an electrooculographic (EOG) system built into the frames of glasses, which can be worn more unobtrusively and comfortably than other devices. Since the obtained information is low-level sensor data expressed as a sequence of values sampled at constant intervals (100 Hz), the cognitive activity recognition problem is formulated as sequence classification. However, it is unclear what kinds of features are useful for accurate cognitive activity recognition. Thus, a codebook approach is applied which, instead of relying on feature engineering, uses a distribution of characteristic subsequences (codewords) to describe sequences of recorded EOG data, where the codewords are obtained by clustering a large number of subsequences. Further, statistical analysis of the codeword distribution reveals features that are characteristic of a certain activity class. Experimental results demonstrate good accuracy of codebook-based cognitive activity recognition, reflecting the effective use of the codewords. Copyright © 2017 Elsevier Ltd. All rights reserved.
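The codebook approach can be sketched in three steps: slide a window over the signal to collect subsequences, cluster them with k-means so that the centres become codewords, and describe a sequence by the normalised histogram of its nearest codewords. The window width, step, k, and the plain k-means used here are assumptions; the abstract does not give these settings.

```python
import numpy as np

def extract_subsequences(signal, width, step):
    # Sliding-window subsequences of a 1-D signal.
    return np.array([signal[i:i + width] for i in range(0, len(signal) - width + 1, step)])

def build_codebook(subseqs, k, n_iter=20, seed=0):
    """Cluster subsequences with plain k-means; the cluster centres are the codewords."""
    rng = np.random.default_rng(seed)
    words = subseqs[rng.choice(len(subseqs), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((subseqs[:, None, :] - words[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters unchanged
                words[j] = subseqs[labels == j].mean(axis=0)
    return words

def encode(signal, words, width, step):
    """Represent a sequence as the normalised histogram of its nearest codewords."""
    subs = extract_subsequences(signal, width, step)
    labels = ((subs[:, None, :] - words[None]) ** 2).sum(axis=2).argmin(axis=1)
    hist = np.bincount(labels, minlength=len(words)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram can be fed to any standard classifier, and comparing histograms across activity classes is one way to find codewords characteristic of a class.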
Image processing for x-ray inspection of pistachio nuts
NASA Astrophysics Data System (ADS)
Casasent, David P.
2001-03-01
A review is provided of image processing techniques that have been applied to the inspection of pistachio nuts using X-ray images. X-ray sensors provide non-destructive internal product detail not available from other sensors. The primary concern with these data is detecting worm infestation in nuts, since it has been linked to the presence of aflatoxin. We describe new techniques for segmentation, feature selection, selection of product categories (clusters), classifier design, etc. Specific novel results include: a new segmentation algorithm to produce images of isolated product items; preferable classifier operation (the classifier with the best probability of correct recognition, Pc, is not the best); the finding that higher-order discrimination information is present in standard features (thus, high-order features appear useful); and classifiers that use new cluster categories of samples to achieve improved performance. Results are presented for X-ray images of pistachio nuts; however, all techniques have use in other product inspection applications.
Face sketch recognition based on edge enhancement via deep learning
NASA Astrophysics Data System (ADS)
Xie, Zhenzhu; Yang, Fumeng; Zhang, Yuming; Wu, Congzhong
2017-11-01
In this paper, we address the face sketch recognition problem. First, we use the eigenface algorithm to convert a sketch image into a synthesized sketch face image. Next, to address the low-level vision problem in the synthesized face sketch image, a super-resolution reconstruction algorithm based on a CNN (convolutional neural network) is employed to improve the visual effect. Specifically, we use a lightweight super-resolution structure that learns a residual mapping instead of directly mapping the feature maps from the low-level space to high-level patch representations, which makes the network easier to optimize and lowers its computational complexity. Finally, we adopt the LDA (Linear Discriminant Analysis) algorithm to perform face sketch recognition on the synthesized face images before and after super-resolution. Extensive experiments on the face sketch database (CUFS) from CUHK demonstrate that the recognition rate of the SVM (Support Vector Machine) algorithm improves from 65% to 69% and that of the LDA algorithm improves from 69% to 75%. Moreover, after super-resolution the synthesized face image not only better depicts details such as the hair, nose, and mouth, but also effectively improves recognition accuracy.
Cognitive object recognition system (CORS)
NASA Astrophysics Data System (ADS)
Raju, Chaitanya; Varadarajan, Karthik Mahesh; Krishnamurthi, Niyant; Xu, Shuli; Biederman, Irving; Kelley, Troy
2010-04-01
We have developed a framework, Cognitive Object Recognition System (CORS), inspired by current neurocomputational models and psychophysical research in which multiple recognition algorithms (shape based geometric primitives, 'geons,' and non-geometric feature-based algorithms) are integrated to provide a comprehensive solution to object recognition and landmarking. Objects are defined as a combination of geons, corresponding to their simple parts, and the relations among the parts. However, those objects that are not easily decomposable into geons, such as bushes and trees, are recognized by CORS using "feature-based" algorithms. The unique interaction between these algorithms is a novel approach that combines the effectiveness of both algorithms and takes us closer to a generalized approach to object recognition. CORS allows recognition of objects through a larger range of poses using geometric primitives and performs well under heavy occlusion - about 35% of object surface is sufficient. Furthermore, geon composition of an object allows image understanding and reasoning even with novel objects. With reliable landmarking capability, the system improves vision-based robot navigation in GPS-denied environments. Feasibility of the CORS system was demonstrated with real stereo images captured from a Pioneer robot. The system can currently identify doors, door handles, staircases, trashcans and other relevant landmarks in the indoor environment.
Face recognition algorithm using extended vector quantization histogram features.
Yan, Yan; Lee, Feifei; Wu, Xueqian; Chen, Qiu
2018-01-01
In this paper, we propose a face recognition algorithm based on a combination of vector quantization (VQ) and Markov stationary features (MSF). The VQ algorithm has been shown to be an effective method for generating features; it extracts a codevector histogram as a facial feature representation for face recognition. Still, the VQ histogram features are unable to convey spatial structural information, which to some extent limits their usefulness in discrimination. To alleviate this limitation of VQ histograms, we utilize Markov stationary features (MSF) to extend the VQ histogram-based features so as to add spatial structural information. We demonstrate the effectiveness of our proposed algorithm by achieving recognition results superior to those of several state-of-the-art methods on publicly available face databases.
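Markov stationary features can be sketched on a map of VQ codevector labels: count label co-occurrences between adjacent pixels, normalise each row into a Markov transition matrix, and use the chain's stationary distribution as a feature that carries spatial structure a plain histogram lacks. The choice of 4-neighbour co-occurrence in both directions and the function name are assumptions, not the paper's exact construction.

```python
import numpy as np

def markov_stationary_feature(labels, n_codes):
    """Stationary distribution of the Markov chain defined by spatial co-occurrence of VQ labels."""
    C = np.zeros((n_codes, n_codes))
    # Count label pairs between horizontally and vertically adjacent pixels, both directions.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        np.add.at(C, (a.ravel(), b.ravel()), 1)
        np.add.at(C, (b.ravel(), a.ravel()), 1)
    rows = C.sum(axis=1, keepdims=True)
    # Row-normalise into a transition matrix; unseen codes get uniform transitions.
    P = np.divide(C, rows, out=np.full_like(C, 1.0 / n_codes), where=rows > 0)
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()
```

For the label map `[[0, 0, 0], [0, 0, 1]]` the co-occurrence matrix is `[[10, 2], [2, 0]]`, and the stationary distribution is `[6/7, 1/7]`. Concatenating this vector with the VQ histogram gives an extended feature of the kind the abstract describes.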
A Random Forest-based ensemble method for activity recognition.
Feng, Zengtao; Mo, Lingfei; Li, Meng
2015-01-01
This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition using random forests. We designed an ensemble learning algorithm that integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate, and faster classifier for human activity recognition. To evaluate the algorithm, PA data from PAMAP (Physical Activity Monitoring for Aging People), a standard, publicly available database, were used for training and testing. The experimental results show that the algorithm correctly recognizes 19 PA types with an accuracy of 93.44%, while training faster than the other methods. The ensemble classifier system based on the RF (Random Forest) algorithm can achieve high recognition accuracy and fast computation.
Visual cluster analysis and pattern recognition template and methods
Osbourn, Gordon Cecil; Martinez, Rubel Francisco
1999-01-01
A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.
The fast iris image clarity evaluation based on Tenengrad and ROI selection
NASA Astrophysics Data System (ADS)
Gao, Shuqin; Han, Min; Cheng, Xu
2018-04-01
In an iris recognition system, the clarity of the iris image is an important factor affecting recognition performance. A blurred image may be rejected by the automatic iris recognition system, causing identification to fail. Therefore, it is necessary to evaluate iris image clarity before recognition. Building on existing methods for evaluating iris image clarity, we propose a fast algorithm for this purpose in this paper. In our algorithm, an ROI (Region of Interest) is first extracted around a reference point determined from the light spots within the pupil, and the Tenengrad operator is then used to evaluate the clarity of the iris image. Experimental results show that the proposed algorithm accurately distinguishes iris images of different clarity, with low computational complexity and good effectiveness.
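The Tenengrad measure scores sharpness by the energy of Sobel gradients: sharper images have stronger edges and therefore a larger score. The sketch computes the mean squared Sobel gradient magnitude over interior pixels, keeping only values above an optional threshold. The ROI localisation from pupil light spots is not reproduced; in practice the function would be applied to a crop such as `image[r0:r1, c0:c1]`, where the bounds are hypothetical.

```python
import numpy as np

def tenengrad(image, threshold=0.0):
    """Tenengrad sharpness: mean squared Sobel gradient magnitude above a threshold."""
    img = image.astype(float)
    # 3x3 Sobel responses on interior pixels (no padding).
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    g2 = gx ** 2 + gy ** 2
    return float(np.mean(np.where(g2 > threshold ** 2, g2, 0.0)))
```

A hard step edge scores higher than the same edge spread over several pixels, and a constant image scores 0, which is the ordering a clarity check needs before recognition.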
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure is developed that clusters data with a completely unsupervised clustering algorithm and then uses labeled pixels either to label the resulting clusters or to perform a stratified estimate with the clusters as strata. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
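The "clusters as strata" idea can be illustrated with the textbook stratified estimator of a class proportion: weight each cluster's labelled-pixel proportion by the cluster's share of all pixels. The abstract does not detail its three estimation schemes, so this is only the basic form, and the function name is hypothetical.

```python
def stratified_proportion(cluster_sizes, labeled):
    """Stratified estimate of a class proportion, using clusters as strata.

    cluster_sizes: dict cluster -> number of pixels N_h in that cluster.
    labeled: dict cluster -> list of 0/1 labels for the labelled pixels in it.
    Returns sum_h (N_h / N) * p_h, with p_h the labelled-pixel proportion in stratum h.
    """
    total = sum(cluster_sizes.values())
    estimate = 0.0
    for h, n_h in cluster_sizes.items():
        labs = labeled.get(h, [])
        if not labs:
            raise ValueError(f"stratum {h} has no labelled pixels")
        estimate += (n_h / total) * (sum(labs) / len(labs))
    return estimate
```

With clusters of 800 and 200 pixels whose labelled samples are 25% and 75% positive, the estimate is 0.8 * 0.25 + 0.2 * 0.75 = 0.35. When the clusters are internally homogeneous, this estimator typically has lower variance than a simple average over all labelled pixels.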
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
Wernisch, Lorenz
2017-01-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However, in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplify datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on high-dimensional TCGA lung and kidney cancer datasets, again showing clinically significant results and the scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.
Gabasova, Evelina; Reid, John; Wernisch, Lorenz
2017-10-01
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, visualization of clustering results is crucial for displaying the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model) and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster is easily extensible, so more clustering algorithms and functions can be added to the plugin. Since its creation in July 2013, CytoCluster has been downloaded more than 9700 times from the Cytoscape App store and has been applied to the analysis of various biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
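Deciding whether a GO category is "statistically overrepresented" in a selected gene set is commonly done with a one-sided hypergeometric test. The sketch below computes that tail probability, assuming this test. Whether CytoCluster's BinGO function uses exactly this test, and which multiple-testing correction it applies across categories, is not stated in the abstract. The function name and parameter names are illustrative.

```python
from math import comb

def overrepresentation_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set (e.g. the whole network)
    K: reference genes annotated to the GO category
    n: genes in the selected set (e.g. one detected module)
    k: selected genes annotated to the category"""
    upper = min(n, K)
    # Sum the probabilities of drawing i annotated genes, for i = k .. upper.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, if 5 of 20 reference genes carry an annotation and a 5-gene module contains all 5, the p-value is 1 / C(20, 5) = 1/15504. A real analysis repeats this for every category and corrects for multiple testing.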
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks
Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin
2017-01-01
PMID:28858211
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Fast detection of the fuzzy communities based on leader-driven algorithm
NASA Astrophysics Data System (ADS)
Fang, Changjian; Mu, Dejun; Deng, Zhenghong; Hu, Jun; Yi, Chen-He
2018-03-01
In this paper, we present the leader-driven algorithm (LDA) for learning community structure in networks. The algorithm finds overlapping clusters in a network, an important property of real networks, especially social networks. It requires no input parameters and learns the number of clusters from the network itself, using leadership centrality. Nodes at local minima of leadership centrality are identified as followers, which belong to only one cluster; the remaining nodes are leaders, which connect clusters. In this way, the number of clusters can be learned from the network structure alone. The LDA is also very fast, with runtime linear in the network size, so it can be used to cluster extremely large networks efficiently.
Optimal pattern synthesis for speech recognition based on principal component analysis
NASA Astrophysics Data System (ADS)
Korsun, O. N.; Poliyev, A. V.
2018-02-01
An algorithm for building an optimal pattern for automatic speech recognition, one that increases the probability of correct recognition, is developed and presented in this work. The optimal pattern is formed by decomposing an initial pattern into principal components, which reduces the dimension of the multi-parameter optimization problem. Next, training samples are introduced, and optimal estimates of the principal-component decomposition coefficients are obtained with a numerical parameter optimization algorithm. Finally, we present experimental results showing the improvement in speech recognition achieved by the proposed optimization algorithm.
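Decomposing a pattern into principal components turns a search over every value of the pattern into a search over k coefficients. The sketch below, using NumPy's SVD, shows only that decomposition and the reconstruction from coefficients. The training-based optimization of the coefficients described in the abstract is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def pca_basis(patterns, k):
    """Mean pattern and top-k principal directions of the training patterns
    (one pattern per row)."""
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    # The right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def encode(pattern, mean, basis):
    """k decomposition coefficients: the low-dimensional search space."""
    return basis @ (np.asarray(pattern, dtype=float) - mean)

def decode(coeffs, mean, basis):
    """Full-length pattern rebuilt from k coefficients."""
    return mean + coeffs @ basis
```

An optimizer would then adjust the k values returned by `encode` rather than every value of the pattern, and `decode` maps any candidate back to a full pattern for scoring.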
Application of diffusion maps to identify human factors of self-reported anomalies in aviation.
Andrzejczak, Chris; Karwowski, Waldemar; Mikusinski, Piotr
2012-01-01
A study was conducted to investigate which factors lead pilots to submit voluntary anomaly reports about their flight performance. Diffusion Maps (DM) were chosen to perform dimensionality reduction on the text records. Diffusion Maps have been used successfully in other domains such as image classification and pattern recognition. High-dimensional data in the form of narrative text reports from the NASA Aviation Safety Reporting System (ASRS) were clustered and categorized by way of dimensionality reduction. Supervised analyses were performed to create a baseline document clustering system. Dimensionality reduction techniques identified concepts or keywords within records and allowed the creation of a framework for an unsupervised document classification system. Results from the unsupervised clustering algorithm were similar to those of the supervised methods outlined in the study. The dimensionality reduction was performed on the 100 most commonly occurring words within 126,000 text records describing commercial aviation incidents. This study demonstrates that unsupervised machine clustering and organization of incident reports is possible based on unbiased inputs. Findings from this study reinforced traditional views on which factors contribute to civil aviation anomalies; however, new associations between previously unrelated factors and conditions were also found.
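A diffusion map builds a Gaussian affinity graph over the records, turns it into a random-walk (Markov) matrix, and uses its leading non-trivial eigenvectors as low-dimensional coordinates. A basic NumPy sketch follows. The kernel bandwidth `eps`, the diffusion time `t` and the input representation (here generic numeric rows, which for the ASRS study would be something like word-frequency vectors) are assumptions, not the study's settings.

```python
import numpy as np

def diffusion_map(X, eps, n_components=2, t=1):
    """Embed the rows of X with a basic diffusion map.

    eps: Gaussian kernel bandwidth; t: diffusion time."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)                      # Gaussian affinities
    d = K.sum(axis=1)                          # node degrees
    # Symmetric conjugate of the Markov matrix P = D^-1 K (same eigenvalues).
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]             # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]           # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

On two well-separated groups of points, the first diffusion coordinate takes opposite signs on the two groups, so clustering in the embedded space becomes straightforward.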
Wavelet analysis of particle density functions in nucleus-nucleus interactions
NASA Astrophysics Data System (ADS)
Manna, S. K.; Haldar, P. K.; Mali, P.; Mukhopadhyay, A.; Singh, G.
A continuous wavelet analysis is performed for pattern recognition of the pseudorapidity density profile of singly charged particles produced in ¹⁶O+Ag/Br and ³²S+Ag/Br interactions, each at an incident energy of 200 GeV per nucleon in the laboratory system. The experimental results are compared with a model prediction based on ultra-relativistic quantum molecular dynamics (UrQMD). To eliminate contributions from known sources of particle cluster formation such as Bose-Einstein correlation (BEC), the UrQMD output is modified by “an algorithm that mimics the BEC as an after burner.” We observe that in both interactions particle clusters are found at the same pseudorapidity locations at all scales. However, the cluster locations in the ¹⁶O+Ag/Br interaction differ from those in the ³²S+Ag/Br interaction. The wavelet pseudorapidity spectra reveal significant differences between experiment and simulation, which can be interpreted as preferred pseudorapidity values and/or scales of the pseudorapidity interval at which clusters of particles are formed. The observed discrepancy between experiment and simulation should therefore be interpreted in terms of some kind of nontrivial dynamics of multiparticle production.
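A continuous wavelet transform correlates the density profile with shifted, dilated copies of a wavelet, so a large coefficient W(a, b) flags a bunch of particles of width about a located at position b. The sketch below discretizes the transform with the unnormalized Mexican-hat wavelet. The wavelet choice, the sampling step `dx` and the scale grid are assumptions, since the abstract does not specify them.

```python
import math

def mexican_hat(t):
    """Unnormalized Mexican-hat (Ricker) wavelet."""
    return (1 - t * t) * math.exp(-t * t / 2)

def cwt(signal, dx, scales):
    """Discretized CWT: W(a, b) = (1/sqrt(a)) * sum_x f(x) psi((x - b)/a) dx,
    evaluated at every sample position b for each scale a.

    signal: samples of the density profile spaced dx apart (e.g. in eta)."""
    n = len(signal)
    out = []
    for a in scales:
        row = []
        for j in range(n):
            s = sum(signal[i] * mexican_hat((i - j) * dx / a) for i in range(n))
            row.append(s * dx / math.sqrt(a))
        out.append(row)
    return out
```

For a single Gaussian bump in the profile, the coefficients at a matching scale peak at the bump's position; repeating this over many scales gives the scale-location map in which cluster positions are compared.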
Research on retailer data clustering algorithm based on Spark
NASA Astrophysics Data System (ADS)
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is currently a hot topic in IT. Spark is a highly reliable, high-performance distributed parallel computing framework for big data sets. The k-means algorithm is one of the classical partitioning methods among clustering algorithms. In this paper, we study the k-means clustering algorithm on Spark. First, the principle of the algorithm is analyzed; then, a cluster analysis of supermarket customers is carried out experimentally to find different shopping patterns. The paper also proposes a parallelization of the k-means algorithm on Spark's distributed computing framework and gives a concrete design and implementation scheme. Two years of sales data from a supermarket are used to validate the proposed clustering algorithm and to segment customers; the clustering results are then analyzed to help enterprises adopt different marketing strategies for different customer groups and improve sales performance.
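Parallel k-means splits each iteration into a map step, where every data partition computes per-cluster partial sums and counts for its nearest centroids, and a reduce step, where the partial results are merged into new centroids. The pure-Python sketch below simulates that structure with plain lists standing in for Spark partitions. It is not the paper's implementation, and the function names are hypothetical.

```python
import math

def assign_partition(partition, centroids):
    """Map step: partial coordinate sums and counts per nearest centroid."""
    sums, counts = {}, {}
    for p in partition:
        k = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        s = sums.setdefault(k, [0.0] * len(p))
        for d, v in enumerate(p):
            s[d] += v
        counts[k] = counts.get(k, 0) + 1
    return sums, counts

def kmeans_mapreduce(partitions, centroids, iters=10):
    """k-means where each iteration is one map over partitions plus one reduce."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        total_s, total_n = {}, {}
        for sums, counts in (assign_partition(part, centroids) for part in partitions):
            # Reduce step: merge partial results by cluster id.
            for k, s in sums.items():
                acc = total_s.setdefault(k, [0.0] * len(s))
                for d, v in enumerate(s):
                    acc[d] += v
                total_n[k] = total_n.get(k, 0) + counts[k]
        # Clusters that received no points keep their previous centroid.
        for k in total_s:
            centroids[k] = [v / total_n[k] for v in total_s[k]]
    return centroids
```

In PySpark the map step would typically run inside an RDD `mapPartitions` call and the merge inside `reduceByKey`, so only the small per-cluster summaries cross the network.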
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm
Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong
2016-01-01
In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
Multi-frame knowledge based text enhancement for mobile phone captured videos
NASA Astrophysics Data System (ADS)
Ozarslan, Suleyman; Eren, P. Erhan
2014-02-01
In this study, we explore automated text recognition and enhancement using mobile-phone-captured videos of store receipts. We propose a method that combines Optical Character Recognition (OCR) with our proposed Row Based Multiple Frame Integration (RB-MFI) and Knowledge Based Correction (KBC) algorithms. First, a trained OCR engine is used for recognition; then the RB-MFI algorithm is applied to the OCR output. RB-MFI determines and combines the most accurate rows of the text extracted by OCR from multiple video frames. After RB-MFI, the KBC algorithm is applied to these rows to correct erroneous characters. Experimental results show that the proposed video-based approach, which includes the RB-MFI and KBC algorithms, increases the word recognition rate to 95% and the character recognition rate to 98%.
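The two post-OCR stages can be illustrated with simple stand-ins: per-row majority voting across frames in place of RB-MFI's "most accurate row" selection, and nearest-word lookup in a vocabulary (via Python's `difflib`) in place of knowledge-based correction. Both substitutions are assumptions, since the abstract does not give the actual accuracy criterion or knowledge base, and the vocabulary and cutoff below are hypothetical.

```python
from collections import Counter
import difflib

def combine_rows(frames):
    """frames: OCR outputs, each a list of row strings with the same row count.
    For every row position, keep the string seen most often across frames."""
    return [Counter(rows).most_common(1)[0][0] for rows in zip(*frames)]

def correct_words(row, vocabulary, cutoff=0.75):
    """Replace each word by its closest vocabulary entry when one is similar enough."""
    fixed = []
    for word in row.split():
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else word)
    return " ".join(fixed)
```

Voting exploits the fact that OCR errors in different frames rarely coincide, while the vocabulary step repairs character confusions such as 0/O that survive in every frame.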
GDPC: Gravitation-based Density Peaks Clustering algorithm
NASA Astrophysics Data System (ADS)
Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin
2018-07-01
The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach published in Science in 2014. DPC can discover clusters of varying sizes and densities, but it has limitations in detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to several UCI and synthetic data sets and report comparative clustering performance using the F-Measure and 2-dimensional visualization. We also compare our method with other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC, presenting F-Measure scores and clustering accuracies of GDPC against K-Means, AP and DPC on different data sets. We show that GDPC performs better in: (1) clearly detecting the number of clusters; (2) efficiently aggregating clusters of varying sizes and densities; and (3) accurately identifying anomalies.
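DPC's decision graph plots, for each point, a local density ρ and a distance δ to the nearest point of higher density; cluster centers stand out because both values are large. The sketch below computes the original DPC quantities with a cut-off kernel and a caller-chosen cut-off distance `dc`. It does not implement GDPC's gravitation-based graph, whose exact form the abstract does not give, and density ties are not broken here as a full implementation would.

```python
import math

def density_peaks(points, dc):
    """Local density rho (neighbors closer than dc) and delta (distance to the
    nearest point of strictly higher density) for every point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Convention: a point with no denser point gets its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Ranking points by γ = ρ·δ and taking the top entries is one common way to read the centers off the decision graph; the remaining points are then assigned to the cluster of their nearest denser neighbor.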
[A new peak detection algorithm of Raman spectra].
Jiang, Cheng-Zhi; Sun, Qiang; Liu, Ying; Liang, Jing-Qiu; An, Yan; Liu, Bing
2014-01-01
The authors propose a new Raman peak recognition method named the bi-scale correlation algorithm. The algorithm combines the correlation coefficient and the local signal-to-noise ratio at two scales to identify Raman peaks. We compared the performance of the proposed algorithm with that of the traditional continuous wavelet transform method in MATLAB, and then tested the algorithm on real Raman spectra. The results show that the average time to identify a Raman spectrum is 0.51 s with the algorithm, versus 0.71 s with the continuous wavelet transform. When the signal-to-noise ratio of a Raman peak is greater than or equal to 6 (modern Raman spectrometers feature an excellent signal-to-noise ratio), the recognition accuracy of the algorithm is higher than 99%, versus less than 84% for the continuous wavelet transform method. The mean and the standard deviation of the peak-position identification error are both smaller for the algorithm than for the continuous wavelet transform method. Simulation analysis and experimental verification show that the new algorithm has the following advantages: no need for human intervention, no need for de-noising or background removal, higher recognition speed and higher recognition accuracy. The proposed algorithm is practical for Raman peak identification.
An improved finger-vein recognition algorithm based on template matching
NASA Astrophysics Data System (ADS)
Liu, Yueyue; Di, Si; Jin, Jian; Huang, Daoping
2016-10-01
Finger-vein recognition has become one of the most popular biometric identification methods, and research on recognition algorithms is the key issue in this field. Many applicable algorithms have been developed so far. However, problems remain in practice: variation in finger position can cause image distortion and shifting, and matching parameters set by experience during identification can reduce the adaptability of the algorithm. To address these problems, this paper proposes an improved finger-vein recognition algorithm based on template matching. To make the algorithm more robust to image distortion, the least squares error method is used to correct oblique fingers. During feature extraction, a local adaptive threshold method is adopted. For the matching scores, we optimized the translation parameters and the matching distance between input and registered images on the basis of the Naoto Miura algorithm. Experimental results indicate that the proposed method effectively improves robustness under finger shifting and rotation.
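One way to apply least squares to an oblique finger is to fit a straight line through the finger's midline and measure its tilt. The sketch below assumes the finger lies roughly horizontally, that its upper and lower edge rows are already known for every image column, and that correction means rotating by the negative of the returned angle. These are all illustrative assumptions, not the paper's exact procedure, and the function names are hypothetical.

```python
import math

def finger_midline(upper, lower):
    """Midline points (x, y) from the upper and lower finger-edge rows per column."""
    return [(x, (u + l) / 2) for x, (u, l) in enumerate(zip(upper, lower))]

def tilt_angle(points):
    """Least-squares fit y = a*x + b through the midline points; returns
    atan(a) in degrees, the finger axis angle relative to the image x-axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return math.degrees(math.atan(sxy / sxx))
```

Rotating the image by the negative of this angle aligns the finger axis with the x-axis before feature extraction and template matching.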
Mining the National Career Assessment Examination Result Using Clustering Algorithm
NASA Astrophysics Data System (ADS)
Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.
2018-03-01
Education is an essential process today, prompting authorities to discover and establish innovative strategies for educational improvement. This study applied data mining, using clustering, to extract knowledge from National Career Assessment Examination (NCAE) results in the Division of Quirino. The NCAE is given to all grade 9 students in the Philippines to assess their aptitudes in different domains. Clustering the students helps identify their learning considerations. Using the RapidMiner tool, clustering algorithms including Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoids, expectation maximization clustering and support vector clustering were analyzed. Their silhouette indexes were compared, and the results showed that k-means with k = 3 and a silhouette index of 0.196 is the most appropriate algorithm for grouping the students. Three groups were formed: 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is useful for extracting information from NCAE results to better understand students’ abilities, which in turn is a good basis for adopting teaching strategies.
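The silhouette index used to choose among the algorithms compares, for each point, its mean distance a to its own cluster with the smallest mean distance b to another cluster, giving s = (b − a) / max(a, b) in [−1, 1]. The pure-Python sketch below averages s over all points using Euclidean distance. RapidMiner's own implementation may differ in distance measure or details, and the treatment of singleton clusters (s = 0) is the usual convention rather than something stated in the abstract.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters).

    a: mean distance to the other members of the point's own cluster
    b: smallest mean distance to the members of any other cluster"""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)          # singleton convention
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for c, members in clusters.items() if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, values near 0 overlapping ones, and negative values points that sit closer to another cluster than to their own; the reported 0.196 therefore suggests fairly weak separation even for the best choice.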
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm performs automatic clustering, partitioning a dataset into a suitable number of clusters. MOPSOSA combines the features of multi-objective particle swarm optimization (PSO) and Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices are optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset: the first is based on Euclidean distance, the second on point symmetry distance, and the third on short distance. A number of algorithms have been compared with MOPSOSA on clustering problems, by determining the actual number of clusters and the optimal clustering. Computational experiments were carried out on fourteen artificial and five real-life datasets. PMID:26132309
NASA Astrophysics Data System (ADS)
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
The traditional microblog recommendation algorithm suffers from low efficiency and modest effectiveness in the era of big data. To solve these problems, this paper proposes a mixed recommendation algorithm with user clustering. The paper first introduces the microblog marketing industry, then elaborates the user interest modeling process and the detailed advertisement recommendation methods. Finally, it compares the mixed recommendation algorithm with the traditional classification algorithm and with a mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering achieves good accuracy and recall in microblog advertisement promotion.
Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review
NASA Astrophysics Data System (ADS)
Kim, Tai-Hoon
The goal of clustering is to decompose a dataset into groups of similar items based on an objective function. Several well-established algorithms exist for data clustering. Their objective is to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria is satisfied. The article presents a comparative study of the effectiveness and efficiency of traditional data clustering algorithms. To evaluate the performance of the clustering algorithms, the Minkowski score is used on different data sets.
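The Minkowski score compares a clustering with a reference partition through point pairs: with n11 pairs grouped together in both, n10 together only in the reference, and n01 together only in the clustering, the score is √((n10 + n01) / (n11 + n10)). It is 0 for a perfect match, and lower is better. A minimal sketch of this standard pair-counting form follows; the review's exact normalization is not given in the abstract, so treat this as an assumed definition.

```python
from itertools import combinations
import math

def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score; 0 means the partitions agree exactly.

    Raises ZeroDivisionError if the reference puts no two points together."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_t = true_labels[i] == true_labels[j]
        same_p = pred_labels[i] == pred_labels[j]
        if same_t and same_p:
            n11 += 1
        elif same_t:
            n10 += 1          # together in the reference only
        elif same_p:
            n01 += 1          # together in the clustering only
    return math.sqrt((n10 + n01) / (n11 + n10))
```

Because only co-membership of pairs is counted, the score ignores how clusters are numbered: relabeling clusters 0 and 1 leaves it at 0.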
Android Malware Classification Using K-Means Clustering Algorithm
NASA Astrophysics Data System (ADS)
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's knowledge, and attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the K-Means clustering algorithm. We classify the Android samples into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the resulting clusters. The proposed K-Means clustering approach shows promising results, with high accuracy when tested using the Random Forest algorithm.
An extended affinity propagation clustering method based on different data density types.
Zhao, XiuLi; Xu, WeiXiang
2015-01-01
The affinity propagation (AP) algorithm is a novel clustering method that does not require users to specify initial cluster centers in advance: it regards all data points equally as potential exemplars (cluster centers) and forms clusters purely from the similarities among data points. In many cases, however, a data set contains regions of different density, that is, it is not homogeneously distributed. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. Our method has two steps: first, the data set is partitioned into several data density types according to the nearest-neighbor distance of each data point; then, the AP clustering method is applied separately within each data density type. Two experiments are carried out to evaluate the performance of our algorithm, one on an artificial data set and the other on a real seismic data set. The results show that our algorithm obtains groups more accurately than OPTICS and the original AP clustering algorithm.
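The core AP step, which the extended method runs separately inside each density type, exchanges "responsibility" and "availability" messages between points until each point settles on an exemplar. The NumPy sketch below follows the standard Frey and Dueck update rules with damping. It omits the paper's density-type partitioning and uses a fixed iteration count instead of a convergence test; the function name is hypothetical.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Standard AP on a similarity matrix S whose diagonal holds the
    preferences. Returns the exemplar index chosen by each point."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        max_other = np.tile(first[:, None], (1, n))
        max_other[rows, best] = second
        R = damping * R + (1 - damping) * (S - max_other)
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));
        # the self-availability a(k,k) is the same sum without the min(0, .).
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    return (A + R).argmax(axis=1)
```

A common choice, used in the usage check, is negative squared distance as similarity and the median similarity as every point's preference. Lower preferences yield fewer exemplars.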
Visual cluster analysis and pattern recognition template and methods
Osbourn, G.C.; Martinez, R.F.
1999-05-04
A method of clustering using a novel template to define a region of influence is disclosed. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques. 30 figs.
Cooperative inversion of magnetotelluric and seismic data sets
NASA Astrophysics Data System (ADS)
Markovic, M.; Santos, F.
2012-04-01
Milenko Markovic, Fernando Monteiro Santos; IDL, Faculdade de Ciências da Universidade de Lisboa, 1749-016 Lisboa. Inversion of a single geophysical data set has well-known limitations due to the non-linearity of the fields and the non-uniqueness of the model. There is a growing need, in both academia and industry, to use two or more different data sets to obtain the subsurface property distribution. In our case, we are dealing with magnetotelluric and seismic data sets. In our approach, we are developing an algorithm based on the fuzzy c-means clustering technique for pattern recognition of geophysical data. A separate inversion is performed at every step, and information is exchanged for model integration. Interrelationships between parameters from different models are not required in analytical form. We are investigating how the number of clusters affects the zonation and spatial distribution of parameters. In our study, optimization in fuzzy c-means clustering (for magnetotelluric and seismic data) is compared for two cases: first alternating optimization, and then a hybrid method (alternating optimization + quasi-Newton method). Acknowledgment: This work is supported by FCT Portugal.
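Alternating optimization for fuzzy c-means repeats two closed-form updates: cluster centers as membership-weighted means, and memberships from relative distances to the centers. The NumPy sketch below shows that standard scheme only. The hybrid quasi-Newton variant and the coupling to the magnetotelluric and seismic inversions are not reproduced, and the fuzzifier m = 2, the random initialization and the stopping tolerance are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Alternating-optimization fuzzy c-means.

    X: (n, d) data; c: number of clusters; m > 1: fuzzifier.
    Returns (centers, U), where U[i, j] is the membership of point i in cluster j."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]   # centers consistent with final U
    return centers, U
```

Unlike k-means, each cell of a geophysical model keeps a graded membership in every cluster, which is what lets zonation boundaries between parameter classes remain soft.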
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. However, such an approach generates complex data constructs, which in turn require intricate activity recognition techniques to infer the underlying activity automatically. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Evaluation of the proposed methodology demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results on data collected from a range of activities showed that the ensemble method achieved accuracies of 94.2% and 97.5% for numeric and binary data, respectively, outperforming a range of single classifiers considered as benchmarks. PMID:25014095
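The classification step described above (assign a new instance to its closest cluster in each collection, then combine) can be sketched with plain Python. The sketch assumes each cluster is represented by a centroid tagged with an activity label and that collections are combined by majority vote. How the clusters are built and the paper's exact combination rule are not shown, and the feature indices and activity names in the usage check are hypothetical.

```python
import math
from collections import Counter

def predict(instance, collections):
    """collections: list of (feature_indices, clusters) pairs, where clusters is
    a list of (centroid, activity_label) built on that feature subset.
    Each collection votes with the label of its closest centroid."""
    votes = Counter()
    for features, clusters in collections:
        x = [instance[f] for f in features]          # project onto the subset
        _, label = min(clusters, key=lambda cl: math.dist(x, cl[0]))
        votes[label] += 1
    return votes.most_common(1)[0][0]
```

Because each collection sees a different feature subset, a misleading sensor reading affects only some of the votes, which is the usual motivation for this kind of ensemble.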
Clustering-based ensemble learning for activity recognition in smart homes.
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-07-10
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications that require high-performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelizing these algorithms is extremely challenging because they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the data access sequentiality and achieve high parallelism, we develop new parallel algorithms for both DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups of up to 27.5 on 40 cores on a shared-memory architecture and up to 5,765 using 8,192 cores on a distributed-memory architecture. In our experiments, we found that while achieving scalability, our algorithms produce clustering results of comparable quality to the classical algorithms.
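The DBSCAN/connected-components connection mentioned above can be sketched sequentially: core points within `eps` of each other are merged with a union-find structure, and border points then join a neighboring core point's component. This is an illustrative single-threaded sketch with brute-force neighbor search (`dbscan_union_find` is a hypothetical name); it is not the author's parallel algorithm and does not show how the work is distributed across cores.

```python
import math


def dbscan_union_find(points, eps, min_pts):
    """DBSCAN expressed as connected components of core points (union-find).

    Returns one cluster id per point; -1 marks noise. Brute-force O(n^2)
    neighbor search; a sequential sketch, not a parallel implementation.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    # The eps-neighborhood includes the point itself, as in standard DBSCAN.
    core = [len(nb) >= min_pts for nb in neighbors]

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Core points within eps of each other belong to the same component.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    union(i, j)

    labels = [-1] * n
    roots = {}
    for i in range(n):
        if core[i]:
            labels[i] = roots.setdefault(find(i), len(roots))
    # Border points join the cluster of the first neighboring core point found.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

As in classical DBSCAN, a border point reachable from two clusters is assigned to one of them arbitrarily (here, the first core neighbor in index order).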
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire network is a primary design consideration. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy-efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them on metrics such as clustering distribution, cluster load balancing, cluster head (CH) selection strategy, CH role rotation, node mobility, cluster overlapping, intra-cluster communication, reliability, security and location awareness.
Removal of impulse noise clusters from color images with local order statistics
NASA Astrophysics Data System (ADS)
Ruchay, Alexey; Kober, Vitaly
2017-09-01
This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
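The removal stage described above relies on vector median filtering. A minimal sketch, assuming an RGB image stored as nested lists of tuples and a precomputed boolean noise mask (the detection stage based on local order statistics is not shown): each flagged pixel is replaced by the window vector that minimizes the sum of Euclidean distances to the other vectors. Excluding flagged neighbors from the window is an assumption of this sketch, intended to keep a noise cluster from voting for itself; `restore_flagged` is a hypothetical name.

```python
import math


def vector_median(colors):
    """Return the vector in `colors` minimizing the summed Euclidean distance to all others."""
    return min(colors, key=lambda c: sum(math.dist(c, o) for o in colors))


def restore_flagged(image, mask, radius=1):
    """Replace pixels flagged in `mask` by a vector median of their window.

    image: 2-D list of (R, G, B) tuples; mask: 2-D list of bools (True = noise).
    Only unflagged neighbors are used when any exist (an assumption);
    otherwise the whole window is used.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            coords = [(j, i)
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            clean = [image[j][i] for j, i in coords if not mask[j][i]]
            window = clean or [image[j][i] for j, i in coords]
            out[y][x] = vector_median(window)
    return out
```

Because the vector median always returns one of the input vectors, it never introduces colors that were not already present in the window, which is why it suits color images better than per-channel medians.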
Chen, Yibing; Ogata, Taiki; Ueyama, Tsuyoshi; Takada, Toshiyuki; Ota, Jun
2018-01-01
Machine vision is playing an increasingly important role in industrial applications, and the automated design of image recognition systems has been a subject of intense research. This study proposes a system for automatically designing the field-of-view (FOV) of a camera, the illumination strength and the parameters in a recognition algorithm. We formulated the design problem as an optimisation problem and used an experiment based on a hierarchical algorithm to solve it. Evaluation experiments using translucent plastic objects showed that the proposed system produced an effective solution with a wide FOV, recognition of all objects, and maximal positional and angular errors of 0.32 mm and 0.4° when all RGB (red, green and blue) channels were used for illumination and the R-channel image was used for recognition. Although all-RGB illumination with grey-scale images also allowed all the objects to be recognized, only a narrow FOV was selected. Moreover, full recognition was not achieved using only G illumination and a grey-scale image. The results showed that the proposed method can automatically design the FOV, the illumination and the parameters of the recognition algorithm, and that tuning all RGB illumination channels is desirable even when single-channel or grey-scale images are used for recognition. PMID:29786665
Chen, Yibing; Ogata, Taiki; Ueyama, Tsuyoshi; Takada, Toshiyuki; Ota, Jun
2018-05-22
Study of parameters of the nearest neighbour shared algorithm on clustering documents
NASA Astrophysics Data System (ADS)
Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni
2018-03-01
Document clustering is one way of automatically managing documents, extracting document topics and quickly filtering information. Preprocessing for document clustering by text mining consists of keyword extraction using Rapid Automatic Keyword Extraction (RAKE) and representing each document as a concept vector using Latent Semantic Analysis (LSA). Clustering then places documents with similar topics in the same cluster, based on this text-mining preprocessing. The Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" that documents share. The parameters of the SNN algorithm are: k, the number of nearest-neighbor documents; ε, the number of shared nearest-neighbor documents; and MinT, the minimum number of similar documents that can form a cluster. The SNN algorithm is characterized by shared-'neighbor' properties: each cluster is formed by keywords that are shared by its documents, and a cluster can be built from more than one keyword if those keywords appear frequently in the documents. The choice of parameter values in the SNN algorithm affects the document clustering results. A higher value of k increases the number of neighbor documents for each document, which lowers the similarity among neighboring documents and also lowers the accuracy of each cluster. A higher value of ε causes each document to capture only neighbor documents with high similarity when building a cluster, which also produces more unclassified documents (noise). A higher value of MinT decreases the number of clusters, since groups of similar documents smaller than MinT cannot form a cluster. The SNN parameters therefore determine both the clustering performance and the amount of noise (unclustered documents).
The Silhouette coefficient shows almost the same result across many experiments, above 0.9, indicating that the SNN algorithm works well with different parameter values.
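The roles of k, ε and MinT can be illustrated with a simplified SNN sketch in the Jarvis-Patrick style: documents are linked when they appear in each other's k-nearest-neighbor lists and share at least ε neighbors, and connected groups smaller than MinT are labeled noise. Cosine similarity on concept vectors, the mutual-neighbor rule and the connected-components step are assumptions for illustration; the paper's exact SNN variant may differ, and `snn_clusters` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def snn_clusters(vectors, k, eps, min_t):
    """Simplified shared-nearest-neighbour clustering; returns labels, -1 = noise."""
    n = len(vectors)
    # k nearest neighbours of each document by cosine similarity (excluding itself).
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(vectors[i], vectors[j]), reverse=True)
        knn.append(set(others[:k]))
    # Link two documents that are mutual neighbours sharing at least eps neighbours.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in knn[i] and i in knn[j] and len(knn[i] & knn[j]) >= eps:
                adj[i].append(j)
                adj[j].append(i)
    # Connected components of the SNN graph; components smaller than MinT are noise.
    labels = [-1] * n
    seen = [False] * n
    next_id = 0
    for s in range(n):
        if seen[s]:
            continue
        stack, comp = [s], []
        seen[s] = True
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        if len(comp) >= min_t:
            for v in comp:
                labels[v] = next_id
            next_id += 1
    return labels
```

Raising `eps` removes links and raising `min_t` discards small groups, which matches the qualitative effects described in the abstract: more noise and fewer clusters, respectively.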
Algorithms of maximum likelihood data clustering with applications
NASA Astrophysics Data System (ADS)
Giada, Lorenzo; Marsili, Matteo
2002-12-01
We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on the maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson correlation coefficients of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter-free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. To test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures, whereas the outcomes of standard algorithms have a much wider variability.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Fast parallel algorithms that compute transitive closure of a fuzzy relation
NASA Technical Reports Server (NTRS)
Kreinovich, Vladik YA.
1993-01-01
The notion of the transitive closure of a fuzzy relation is very useful for clustering in pattern recognition, for fuzzy databases, etc. The original algorithm proposed by L. Zadeh (1971) requires computation time O(n^4), where n is the number of elements in the relation. In 1974, J. C. Dunn proposed an O(n^2) algorithm. Since we must compute n(n-1)/2 different values s(a, b) (a ≠ b) that represent the fuzzy relation, and we need at least one computational step to compute each of these values, we cannot compute all of them in fewer than O(n^2) steps. So Dunn's algorithm is optimal in this sense. For small n, this is acceptable. However, for large n (e.g., for big databases), it is still a lot, so it would be desirable to decrease the computation time (this problem was formulated by J. Bezdek). Since this decrease cannot be achieved on a sequential computer, the only way to do it is to use a computer with several processors working in parallel. We show that on a parallel computer, the transitive closure can be computed in time O((log_2 n)^2).
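A sequential sketch of the max-min transitive closure helps show where the logarithmic factor comes from: repeatedly replacing R with max(R, R∘R) doubles the path length covered on each pass, so about ceil(log_2 n) passes suffice. On a parallel machine each max-min product can be split across processors; this sketch does not implement that, and it is not Dunn's O(n^2) algorithm (it does O(n^3) work per pass). The function names are hypothetical.

```python
import math


def max_min_compose(r, s):
    """Max-min composition: (r o s)[a][b] = max over c of min(r[a][c], s[c][b])."""
    n = len(r)
    return [[max(min(r[a][c], s[c][b]) for c in range(n)) for b in range(n)]
            for a in range(n)]


def transitive_closure(r):
    """Max-min transitive closure of a fuzzy relation by repeated squaring.

    Each pass replaces R with max(R, R o R), doubling the covered path
    length, so ceil(log2 n) passes are enough. Each pass is an independent
    max-min product: the part a parallel machine could split up.
    """
    n = len(r)
    closure = [row[:] for row in r]
    passes = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(passes):
        sq = max_min_compose(closure, closure)
        new = [[max(closure[a][b], sq[a][b]) for b in range(n)] for a in range(n)]
        if new == closure:
            break  # already transitive
        closure = new
    return closure
```

For a reflexive, symmetric similarity relation, thresholding the closure at any level yields an equivalence relation, which is why the closure is useful for clustering.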
Locality constrained joint dynamic sparse representation for local matching based face recognition.
Wang, Jianzhong; Yi, Yugen; Zhou, Wei; Shi, Yanjiao; Qi, Miao; Zhang, Ming; Zhang, Baoxue; Kong, Jun
2014-01-01
Recently, Sparse Representation-based Classification (SRC) has attracted a lot of attention for its applications to various tasks, especially in biometric techniques such as face recognition. However, factors such as lighting, expression, pose and disguise variations in face images degrade the performance of SRC and most other face recognition techniques. In order to overcome these limitations, we propose a robust face recognition method named Locality Constrained Joint Dynamic Sparse Representation-based Classification (LCJDSRC) in this paper. In our method, a face image is first partitioned into several smaller sub-images. Then, these sub-images are sparsely represented using the proposed locality constrained joint dynamic sparse representation algorithm. Finally, the representation results for all sub-images are aggregated to obtain the final recognition result. Compared with other algorithms that process each sub-image of a face image independently, the proposed algorithm treats local matching-based face recognition as a multi-task learning problem. Thus, the latent relationships among the sub-images from the same face image are taken into account. Meanwhile, the locality information of the data is also considered in our algorithm. We evaluate our algorithm by comparing it with other state-of-the-art approaches. Extensive experiments on four benchmark face databases (ORL, Extended YaleB, AR and LFW) demonstrate the effectiveness of LCJDSRC.
Spatial cluster detection using dynamic programming.
Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F
2012-03-25
The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in whether a cluster exists in the data and, if it does, in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used both for Bayesian maximum a posteriori (MAP) estimation of the most likely spatial distribution of clusters and for Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was more sensitive to smaller outbreaks, but as outbreaks grow in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging.
We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection, and we point to its low computational cost and extensibility as advantages in favor of further research and use of the algorithm.
Spatial cluster detection using dynamic programming
2012-01-01
PMID:22443103
Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.
Zhu, Lin; Chung, Fu-Lai; Wang, Shitong
2009-06-01
The fuzziness index m has an important influence on the clustering results of fuzzy clustering algorithms, and it should not be forced to the usual value m = 2. In view of its distinctive features in applications and its limitation to m = 2, a recent advance in fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM is proposed for more effective clustering. By introducing a novel membership constraint function, a new objective function is constructed, from which GIFP-FCM clustering is derived. Meanwhile, the robustness and convergence of the proposed algorithm are analyzed from the viewpoints of the L_p-norm distance measure and competitive learning. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results, including an application to noisy image texture segmentation, are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.
Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance
Liu, Yongli; Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao
2018-01-01
Clustering time series data is of great significance since it can extract meaningful statistics and other characteristics. Especially in biomedical engineering, effective clustering algorithms for time series may help improve people's health. Considering the data scale and time shifts of time series, in this paper we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. By adopting Single-Pass and Online patterns, our algorithms can handle large-scale time series data by splitting it into a set of chunks that are processed sequentially. In addition, our algorithms use DTW to measure the distance between pairs of time series, which encourages higher clustering accuracy because DTW can determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared with several prominent existing incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches yield high-quality clusters and outperform all the competitors in terms of clustering accuracy. PMID:29795600
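The DTW distance that both algorithms build on can be sketched with the classic dynamic-programming recurrence below, using |x - y| as the local cost (an assumption; other local costs, such as squared differences, are also common). The incremental fuzzy c-medoids machinery, chunking and single-pass/online updates are not shown; `dtw_distance` is a hypothetical name.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance with |x - y| cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: diagonal match, stretch b, stretch a.
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]
```

A time-shifted copy of a series can align at zero cost, which is the property that makes DTW more tolerant of time shifts than Euclidean distance.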
Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance.
Liu, Yongli; Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao
2018-01-01
Multi-Optimisation Consensus Clustering
NASA Astrophysics Data System (ADS)
Li, Jian; Swift, Stephen; Liu, Xiaohui
Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
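The agreement idea underlying consensus clustering can be sketched as follows: count how often each pair of items is placed together across the base clusterings, then link pairs whose agreement exceeds a threshold. This threshold-and-connected-components rule is a simple stand-in chosen for illustration; it is not the Agreement Separation criterion or the MOCC optimisation framework, and the function names are hypothetical.

```python
def agreement_matrix(clusterings):
    """Fraction of base clusterings that put each pair of items in the same cluster."""
    n = len(clusterings[0])
    r = len(clusterings)
    return [[sum(c[i] == c[j] for c in clusterings) / r for j in range(n)]
            for i in range(n)]


def threshold_consensus(clusterings, threshold=0.5):
    """Simple consensus: link pairs whose agreement exceeds `threshold`, return components."""
    a = agreement_matrix(clusterings)
    n = len(a)
    labels = [-1] * n
    next_id = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        # Flood-fill a new consensus cluster from item s.
        labels[s] = next_id
        stack = [s]
        while stack:
            v = stack.pop()
            for w in range(n):
                if labels[w] == -1 and a[v][w] > threshold:
                    labels[w] = next_id
                    stack.append(w)
        next_id += 1
    return labels
```

Note that the base clusterings may use arbitrary, unrelated label ids; only co-membership matters, which is what lets an ensemble combine the output of different algorithms.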
Swarm Intelligence in Text Document Clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems such as document collection clustering. A major challenge of today's information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming amount of information. In this chapter, we introduce three nature-inspired swarm intelligence approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant foraging.
Stereo-Based Region-Growing using String Matching
NASA Technical Reports Server (NTRS)
Mandelbaum, Robert; Mintz, Max
1995-01-01
We present a novel stereo algorithm based on a coarse texture segmentation preprocessing phase. Matching is performed using a string comparison. Matching sub-strings correspond to matching sequences of textures. Inter-scanline clustering of matching sub-strings yields regions of matching texture. The shape of these regions yields information about an object's height, width and azimuthal position relative to the camera pair. Hence, rather than the standard dense depth map, the output of this algorithm is a segmentation of objects in the scene. Such a format is useful for integrating stereo with other sensor modalities on a mobile robotic platform. It is also useful for localization: the height and width of a detected object may be used for landmark recognition, while depth and relative azimuthal location determine pose. The algorithm does not rely on the monotonicity of order of image primitives. Occlusions, exposures, and foreshortening effects are not problematic. The algorithm can deal with certain types of transparencies. It is computationally efficient and very amenable to parallel implementation. Further, the epipolar constraints may be relaxed to some small but significant degree. A version of the algorithm has been implemented and tested on various types of images. It performs best on random dot stereograms, on images with easily filtered backgrounds (as in synthetic images), and on real scenes with uncontrived backgrounds.
Twelve automated thresholding methods for segmentation of PET images: a phantom study.
Prieto, Elena; Lecumberri, Pablo; Pagola, Miguel; Gómez, Marisol; Bilbao, Izaskun; Ecay, Margarita; Peñuelas, Iván; Martí-Climent, Josep M
2012-06-21
Tumor volume delineation over positron emission tomography (PET) images is of great interest for proper diagnosis and therapy planning. However, standard segmentation techniques (manual or semi-automated) are operator dependent and time consuming, while fully automated procedures are cumbersome or require complex mathematical development. The aim of this study was to segment PET images in a fully automated way by implementing a set of 12 automated thresholding algorithms, classical in the fields of optical character recognition, tissue engineering or non-destructive testing images in high-tech structures. Automated thresholding algorithms select a specific threshold for each image without any a priori spatial information of the segmented object or any special calibration of the tomograph, as opposed to usual thresholding methods for PET. Spherical ¹⁸F-filled objects of different volumes were acquired on clinical PET/CT and on a small animal PET scanner, with three different signal-to-background ratios. Images were segmented with 12 automatic thresholding algorithms and results were compared with the standard segmentation reference, a threshold at 42% of the maximum uptake. The Ridler and Ramesh thresholding algorithms, based on clustering and histogram-shape information respectively, provided better results than the classical 42%-based threshold (p < 0.05). We have herein demonstrated that fully automated thresholding algorithms can provide better results than classical PET segmentation tools.
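The Ridler method cited above is commonly described as the iterative intermeans (isodata) rule: start from the mean intensity and repeatedly set the threshold to the midpoint between the mean of values at or below it and the mean of values above it. A minimal sketch on a flat list of voxel values follows; the starting point, tolerance and iteration cap are assumptions, the study's implementation details are not given in the abstract, and `ridler_threshold` is a hypothetical name.

```python
def ridler_threshold(values, tol=1e-6, max_iter=100):
    """Ridler-Calvard (isodata) iterative intermeans threshold.

    Start at the global mean; repeatedly set T to the midpoint of the mean
    of values <= T and the mean of values > T until T stops changing.
    """
    t = sum(values) / len(values)
    for _ in range(max_iter):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break  # all values on one side: keep the current threshold
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t
```

Unlike a fixed 42%-of-maximum rule, the resulting threshold adapts to the background level of each image, since both class means enter the update.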
Twelve automated thresholding methods for segmentation of PET images: a phantom study
NASA Astrophysics Data System (ADS)
Prieto, Elena; Lecumberri, Pablo; Pagola, Miguel; Gómez, Marisol; Bilbao, Izaskun; Ecay, Margarita; Peñuelas, Iván; Martí-Climent, Josep M.
2012-06-01
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.
Zhang, Xianchao; Liu, Han; Zhang, Xiaotong
2017-09-01
Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues, such as losing uncertain information, high time complexity and nonadaptive thresholds, have not been addressed well by the previous density-based algorithm FDBSCAN and the hierarchical density-based algorithm FOPTICS. In this paper, we first propose a novel density-based algorithm, PDBSCAN, which improves on FDBSCAN in the following respects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability and direct reachability probability, thus reducing the complexity and solving the issue of the nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify PDBSCAN into an improved version (PDBSCANi) by using a better cluster assignment strategy that ensures every object is assigned to the most appropriate cluster, thus solving the issue of the nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulty clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm, POPTICS, by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS.
Experimental results demonstrate the superiority of our proposed algorithms over the existing algorithms in accuracy and efficiency. Copyright © 2017 Elsevier Ltd. All rights reserved.
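The abstract does not say how PDBSCAN computes the probability that two uncertain objects lie within a boundary distance. The sketch below is illustrative only and is not the authors' method. It assumes each uncertain object is a finite set of one-dimensional (value, probability) instances and that objects are independent. Under those assumptions the probability can be computed exactly by enumerating instance pairs, with no sampling. The name `prob_within` and the discrete model are hypothetical.

```python
def prob_within(obj_a, obj_b, eps):
    """Return P(|X - Y| <= eps) for two independent discrete uncertain objects.

    Each object is a list of (value, probability) instances whose
    probabilities sum to 1 (an assumed 1-D uncertainty model, not PDBSCAN's).
    """
    return sum(
        pa * pb
        for xa, pa in obj_a
        for xb, pb in obj_b
        if abs(xa - xb) <= eps  # this pair of instances lies within the boundary
    )


a = [(1.0, 0.5), (3.0, 0.5)]
b = [(1.5, 0.4), (5.0, 0.6)]
# Only the (1.0, 1.5) instance pair is within distance 1, so the result is 0.5 * 0.4.
print(prob_within(a, b, 1.0))
```

Because the computation is exact, the cost grows with the product of the two instance counts. That cost is the usual trade-off against sampling.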
Optimal wavelength band clustering for multispectral iris recognition.
Gong, Yazhuo; Zhang, David; Shi, Pengfei; Yan, Jingqi
2012-07-01
This work explores the possibility of clustering spectral wavelengths based on the maximum dissimilarity of iris textures. The eventual goal is to determine how many spectral bands are enough for iris multispectral fusion and to identify the bands that provide higher iris multispectral recognition performance. A multispectral acquisition system was first designed for imaging the iris at narrow spectral bands in the range of 420 to 940 nm. Next, a set of 60 human iris images corresponding to the right and left eyes of 30 different subjects was acquired for analysis. Finally, using agglomerative clustering based on two-dimensional principal component analysis, we determined that 3 clusters were enough to represent the 10 feature bands of spectral wavelengths. The experimental results suggest (1) the number, center, and composition of clusters of spectral wavelengths and (2) the higher performance of iris multispectral recognition based on a three-wavelength-band fusion.
Gaussian mixture models-based ship target recognition algorithm in remote sensing infrared images
NASA Astrophysics Data System (ADS)
Yao, Shoukui; Qin, Xiaojuan
2018-02-01
Since the resolution of remote sensing infrared images is low, the features of ship targets are unstable. How to recognize ships with such fuzzy features is an open problem. In this paper, we propose a novel ship target recognition algorithm based on Gaussian mixture models (GMMs). The proposed algorithm has two main steps. In the first step, the Hu moments of the ship target images are calculated, and the GMMs are trained on the moment features of ships. In the second step, the moment feature of each ship image is assigned to the trained GMMs for recognition. Because of the scale, rotation, and translation invariance of Hu moments and the powerful feature-space description ability of GMMs, the GMM-based ship target recognition algorithm can recognize ships reliably. Experimental results on a large set of simulated images show that our approach is effective in distinguishing different ship types and achieves satisfactory ship recognition performance.
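The recognition step above relies on the invariance of Hu moments. A minimal sketch, assuming a small intensity image given as a list of rows, computes only the first two of Hu's seven invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11². It leaves out the GMM training and the other five invariants. The function names are hypothetical.

```python
def _mass_and_centroid(img):
    """Return m00 and the centroid (x_bar, y_bar) of a 2-D intensity grid."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    return m00, m10 / m00, m01 / m00


def eta(img, p, q):
    """Scale-normalized central moment eta_pq (central moments give translation invariance)."""
    m00, xb, yb = _mass_and_centroid(img)
    mu = sum((x - xb) ** p * (y - yb) ** q * v
             for y, row in enumerate(img) for x, v in enumerate(row))
    return mu / m00 ** (1 + (p + q) / 2)


def hu_phi1_phi2(img):
    """First two Hu moment invariants of an image given as a list of rows."""
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2


shape = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0]]
shifted = [[0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 1, 1, 1, 0],
           [0, 0, 1, 0, 0, 0]]
rotated = [list(r) for r in zip(*shape[::-1])]  # 90-degree clockwise rotation
print(hu_phi1_phi2(shape), hu_phi1_phi2(shifted), hu_phi1_phi2(rotated))
```

On a pixel grid, shifts and 90-degree rotations leave φ1 and φ2 unchanged up to floating-point rounding. Arbitrary scalings and rotations are only approximately invariant because of discretization.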
Word recognition using a lexicon constrained by first/last character decisions
NASA Astrophysics Data System (ADS)
Zhao, Sheila X.; Srihari, Sargur N.
1995-03-01
In lexicon based recognition of machine-printed word images, the size of the lexicon can be quite extensive. The recognition performance is closely related to the size of the lexicon. Recognition performance drops quickly when lexicon size increases. Here, we present an algorithm to improve the word recognition performance by reducing the size of the given lexicon. The algorithm utilizes the information provided by the first and last characters of a word to reduce the size of the given lexicon. Given a word image and a lexicon that contains the word in the image, the first and last characters are segmented and then recognized by a character classifier. The possible candidates based on the results given by the classifier are selected, which give us the sub-lexicon. Then a word shape analysis algorithm is applied to produce the final ranking of the given lexicon. The algorithm was tested on a set of machine-printed gray-scale word images which includes a wide range of print types and qualities.
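A minimal sketch of the lexicon-reduction step described above follows. It assumes the character classifier returns a small candidate set for the first character and another for the last, and it matches case-insensitively. The word-shape ranking stage is omitted, and the function name and candidate sets are hypothetical.

```python
def reduce_lexicon(lexicon, first_candidates, last_candidates):
    """Keep only words whose first and last letters are classifier candidates."""
    firsts = {c.lower() for c in first_candidates}
    lasts = {c.lower() for c in last_candidates}
    return [w for w in lexicon
            if w and w[0].lower() in firsts and w[-1].lower() in lasts]


lexicon = ["cat", "cot", "car", "dog", "cut", "bat", "Cart"]
# Hypothetical top candidates from the first- and last-character classifier
sub_lexicon = reduce_lexicon(lexicon, first_candidates={"c", "e"}, last_candidates={"t"})
print(sub_lexicon)
```

Only the sub-lexicon (here "cat", "cot", "cut", "Cart") is then passed on to the more expensive word-shape ranking.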
Shishido, Ryunosuke; Kawai, Yuki; Fujii, Asuka
2014-09-04
The essence of the molecular recognition of the neurotransmitter acetylcholine has been attributed to the attractive interaction between a quaternary ammonium and aromatic rings. We employed protonated trimethylamine-(benzene)n clusters (n = 1-4) in the gas phase as a model to study the recognition mechanism of acetylcholine at the microscopic level. We applied size-selective infrared spectroscopy to the clusters and observed the NH and CH stretching vibrational regions. We also performed density functional theory calculations of stable structures, charge distributions, and infrared spectra of the clusters. It was shown that the methyl groups of protonated trimethylamine are solvated by benzene one at a time in the n > 1 clusters, and the validity of these clusters as a model system of the acetylcholine recognition was demonstrated. The nature of the interactions between a quaternary ammonium and aromatic rings is discussed on the basis of the observed infrared spectra and the theoretical calculations.
Vision-based posture recognition using an ensemble classifier and a vote filter
NASA Astrophysics Data System (ADS)
Ji, Peng; Wu, Changcheng; Xu, Xiaonong; Song, Aiguo; Li, Huijun
2016-10-01
Posture recognition is a very important means of Human-Robot Interaction (HRI). To segment the posture effectively from an image, we propose an improved region growing algorithm combined with the Single Gauss Color Model. Experiments show that the improved region growing algorithm obtains more complete and accurate postures than the traditional Single Gauss Model and region growing algorithm, while also eliminating similar regions in the background. In the posture recognition part, we propose a CNN ensemble classifier to improve the recognition rate. To reduce misjudgments during continuous gesture control, we also propose a vote filter that is applied to the sequence of recognition results. The proposed CNN ensemble classifier yields a 96.27% recognition rate, which is better than that of a single CNN classifier, and the proposed vote filter improves the recognition results and reduces misjudgments during consecutive gesture switches.
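The abstract does not give the vote filter's window size or tie rule. A minimal sketch, assuming a sliding majority vote over the last few recognition results with ties resolved in favour of the newest label, shows how isolated misjudgments in a label stream can be suppressed. The window size, tie rule and function name are assumptions.

```python
from collections import Counter, deque


def vote_filter(labels, window=5):
    """Smooth a stream of per-frame gesture labels by majority vote.

    Each output is the most common label among the last `window` raw
    results; ties go to the most recent of the tied labels.
    """
    recent = deque(maxlen=window)
    smoothed = []
    for label in labels:
        recent.append(label)
        counts = Counter(recent)
        top = max(counts.values())
        # Scan from newest to oldest so ties favour the latest label
        for candidate in reversed(recent):
            if counts[candidate] == top:
                smoothed.append(candidate)
                break
    return smoothed


raw = ["fist", "fist", "palm", "fist", "fist", "palm", "palm", "palm"]
print(vote_filter(raw, window=3))
```

The single spurious "palm" in the third frame is voted away, but a genuine switch to "palm" shows up only after a short delay. That lag is the cost of any such filter.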
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. In dealing with such datasets, standard practice is often to use subspace clustering based on vectorizing the multiarray data. However, vectorization of tensorial data does not exploit the complete structural information. In this paper, we propose a subspace clustering algorithm that does not adopt any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model that takes cluster membership information into account. We propose a new clustering algorithm that alternates between the different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms based on tensor factorization.
NASA Astrophysics Data System (ADS)
Yang, Gongping; Zhou, Guang-Tong; Yin, Yilong; Yang, Xiukun
2010-12-01
A critical step in an automatic fingerprint recognition system is the segmentation of fingerprint images. Existing methods are usually designed to segment fingerprint images originating from a certain sensor, so their performance is significantly affected when dealing with fingerprints collected by different sensors. This work studies the sensor interoperability of fingerprint segmentation algorithms, which refers to an algorithm's ability to adapt to raw fingerprints obtained from different sensors. We empirically analyze the sensor interoperability problem and effectively address the issue by proposing a k-means based segmentation method called SKI. SKI clusters the foreground and background blocks of a fingerprint image based on the k-means algorithm, where a fingerprint block is represented by a 3-dimensional feature vector consisting of block-wise coherence, mean, and variance (abbreviated as CMV). SKI also employs morphological postprocessing to achieve favorable segmentation results. We perform SKI on each fingerprint to ensure sensor interoperability. The interoperability and robustness of our method are validated by experiments performed on a number of fingerprint databases obtained from various sensors.
Anandakrishnan, Ramu; Onufriev, Alexey
2008-03-01
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first subdivide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have previously been used for biomolecular computations but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity of the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, the root mean square error, which can be significantly lower than the error bound, may be more important. We show that there is a strong empirical relationship between the error bound and the root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms in practical applications. An example of error analysis for such an application, computation of the average charge of ionizable amino acids in proteins, is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
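The basic clustering approximation described above can be illustrated on a toy model. The model is an assumption chosen for illustration: binary sites (for example, protonated or not) with energy Σ h_i s_i + Σ_{i<j} J_ij s_i s_j. The exact average occupancy sums over all 2^N microstates. The clustered version enumerates each cluster separately and drops every coupling between clusters. This is one simple scheme in the spirit of the abstract, not necessarily either of the two algorithms analysed there, and the function names are hypothetical.

```python
import itertools
import math


def site_averages(h, J, sites, beta=1.0):
    """Exact Boltzmann averages <s_i> over the listed binary sites.

    Energy of a state: sum_i h[i]*s_i + sum_{i<j} J[i][j]*s_i*s_j, taken
    only over `sites`; couplings to sites outside the list are ignored.
    """
    z = 0.0
    totals = {i: 0.0 for i in sites}
    for state in itertools.product((0, 1), repeat=len(sites)):
        s = dict(zip(sites, state))
        energy = sum(h[i] * s[i] for i in sites)
        energy += sum(J[i][j] * s[i] * s[j]
                      for a, i in enumerate(sites) for j in sites[a + 1:])
        w = math.exp(-beta * energy)  # Boltzmann weight of this microstate
        z += w
        for i in sites:
            totals[i] += w * s[i]
    return {i: totals[i] / z for i in sites}


def clustered_averages(h, J, clusters, beta=1.0):
    """Treat each cluster exactly and ignore all inter-cluster couplings."""
    result = {}
    for cluster in clusters:
        result.update(site_averages(h, J, list(cluster), beta))
    return result


h = [0.2, -0.4, 0.7, -0.1]
J = [[0.0, 0.5, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, -0.3],
     [0.0, 0.0, -0.3, 0.0]]
exact = site_averages(h, J, [0, 1, 2, 3])              # 2**4 microstates
approx = clustered_averages(h, J, [[0, 1], [2, 3]])    # 2 * 2**2 microstates
print(exact, approx)
```

Here no couplings cross the two clusters, so the clustered result matches the exact one. Adding a coupling between sites 1 and 2 would introduce exactly the kind of error that the paper bounds.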
NASA Astrophysics Data System (ADS)
Costache, G. N.; Gavat, I.
2004-09-01
With the rapid growth in the amount of available digital data (text, audio samples, digital photos, and digital movies, all joined in the multimedia domain), the need for classification, recognition, and retrieval of this kind of data has become very important. This paper presents a system structure that handles multimedia data from a recognition perspective. The main processing steps applied to the multimedia objects of interest are: first, parameterization by analysis, to obtain a feature-based description that forms the parameter vector; second, classification, generally with a hierarchical structure, to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the mel-frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker's mouth. The hierarchical classifier generally consists of a clustering stage based on Kohonen Self-Organizing Maps (SOM) and a final stage based on a powerful classification algorithm, Support Vector Machines (SVM). In specific variants, the system is applied with good results to two tasks: the first is bimodal speech recognition, which fuses features obtained from the speech signal with features obtained from the speaker's image, and the second is music retrieval from a large music database.
Ni, Yepeng; Liu, Jianbo; Liu, Shan; Bai, Yaxin
2016-01-01
With the rapid development of smartphones and wireless networks, indoor location-based services have become more and more prevalent. Due to the complex propagation of radio signals, the Received Signal Strength Indicator (RSSI) shows significant variation during pedestrian walking, which introduces critical errors in deterministic indoor positioning. To solve this problem, we present a novel method to improve indoor pedestrian positioning accuracy by embedding a fuzzy pattern recognition algorithm into a Hidden Markov Model. The fuzzy pattern recognition algorithm follows the rule that RSSI fading is positively correlated with the distance between the measuring point and the AP location, even during a dynamic positioning measurement. Through this algorithm, we use the RSSI variation trend instead of the specific RSSI value to achieve fuzzy positioning. The transition probabilities of the Hidden Markov Model are trained by the fuzzy pattern recognition algorithm with pedestrian trajectories. Using the Viterbi algorithm with the trained model, we obtain a set of hidden location states. In our experiments, we demonstrate that, compared with the deterministic pattern matching algorithm, our method greatly improves positioning accuracy and shows robust environmental adaptability. PMID:27618053
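The final step above decodes hidden location states with the Viterbi algorithm. The sketch below is a minimal, standard log-space Viterbi decoder. The two location states, the "rising"/"falling" RSSI-trend observations and all probabilities are made-up assumptions, not the trained model from the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path, computed in log space.

    All probabilities used must be > 0 (math.log(0) raises ValueError).
    """
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path score into s
            score, path = max(
                ((best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                 for prev in states),
                key=lambda item: item[0])
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical two-location model driven by the RSSI trend
states = ("near_AP", "far_AP")
start_p = {"near_AP": 0.5, "far_AP": 0.5}
trans_p = {"near_AP": {"near_AP": 0.8, "far_AP": 0.2},
           "far_AP": {"near_AP": 0.2, "far_AP": 0.8}}
emit_p = {"near_AP": {"rising": 0.7, "falling": 0.3},
          "far_AP": {"rising": 0.3, "falling": 0.7}}
path = viterbi(["rising", "rising", "falling", "falling"], states, start_p, trans_p, emit_p)
print(path)
```

Working with log-probabilities avoids numerical underflow on long trajectories. Multiplying raw probabilities would underflow quickly.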
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that accounts for nearly one million deaths each year. Because malaria requires prompt and accurate diagnosis, this study proposes an unsupervised pixel segmentation based on clustering algorithms. The goal is to obtain fully segmented red blood cells (RBCs) infected with malaria parasites from thin blood smear images of the P. vivax species. To obtain the segmented infected cells, the malaria images are first enhanced using a modified global contrast stretching technique. Then, an unsupervised clustering-based segmentation technique is applied to the intensity component of the malaria image to separate the infected cells from the blood cell background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms are proposed for malaria slide image segmentation. After that, a median filter is applied to smooth the image and to remove unwanted regions, such as small background pixel regions. Finally, a seeded region growing area extraction algorithm is applied to remove large unwanted regions that are too large to be removed by the median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms is analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with the MKM and FCM clustering algorithms. Overall, the results indicate that the proposed cascaded clustering algorithm produces the best segmentation performance. It achieves acceptable sensitivity as well as higher specificity and accuracy than the segmentation results of the MKM and FCM algorithms.
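A minimal sketch of plain fuzzy c-means on scalar intensities follows. It is the FCM half of the pipeline only and does not reproduce the cascaded MKM and FCM scheme, the contrast stretching or the post-processing. The initialization, fuzzifier m = 2 and the sample intensities are assumptions. Memberships follow the standard FCM update u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m−1)), and centers are membership^m-weighted means.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means on scalar values (e.g. pixel intensities).

    Returns (centers, memberships); memberships come from the last
    iteration, and memberships[k][i] is the degree to which values[k]
    belongs to cluster i.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centers evenly over the value range
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in values:
            d = [abs(x - ck) for ck in centers]
            if 0.0 in d:
                # x coincides with a center: split full membership among such centers
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
            memberships.append(row)
        # Center update: mean of the values weighted by membership ** m
        new_centers = []
        for i in range(c):
            w = [u[i] ** m for u in memberships]
            new_centers.append(sum(wk * x for wk, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


intensities = [10, 12, 11, 200, 205, 198]  # hypothetical dark/bright pixels
centers, memberships = fuzzy_c_means_1d(intensities)
labels = [row.index(max(row)) for row in memberships]  # hard assignment
print(centers, labels)
```

Taking the largest membership per pixel turns the fuzzy result into the hard foreground/background mask that a segmentation step needs.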
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of the initial centroids. Optimization algorithms have the advantage of guiding iterative computation to search for global optima while avoiding local optima. They help speed up the clustering process by converging to a global optimum early with multiple search agents in action. Some contemporary nature-inspired optimization algorithms, including the Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms, mimic swarming behavior, which allows the agents to cooperatively steer towards an optimal objective within a reasonable time. These so-called nature-inspired optimization algorithms are known to have their own characteristics, as well as pros and cons, in different applications. When these algorithms are combined with the K-means clustering mechanism to enhance its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics for clustering quality, the extended K-means algorithms empowered by nature-inspired optimization methods are applied to image segmentation as a case study of an application scenario.
Deb, Suash; Yang, Xin-She
2014-01-01
PMID:25202730
Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang
2017-01-01
Clustering algorithms, as a basis of data analysis, are widely used in analysis systems. However, for high-dimensional data, a clustering algorithm may overlook the business relationships between the dimensions, especially in the medical field. As a result, the clustering result may not meet the users' business goals. If the clustering process can incorporate the users' knowledge, that is, the doctor's knowledge or analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve users' satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize the result. A particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm, so that it reflects the user's business preferences as closely as possible. Based on this parameter optimization and adjustment, the clustering result can be brought closer to the user's requirements. Finally, we take breast cancer as an example to test our method. The experiments show the better performance of our algorithm.
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.
Wang, Haizhou; Song, Mingzhou
2011-12-01
The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
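The dynamic programming idea can be sketched in a few lines, because in one dimension an optimal cluster is always a contiguous run of the sorted data. This is a straightforward O(k·n²) Python sketch written for illustration, not the code of the R package. Prefix sums give the within-cluster sum of squares of any run in constant time, and `D[q][j]` holds the best cost for the first j + 1 points in q + 1 clusters. The function name is hypothetical.

```python
def optimal_kmeans_1d(xs, k):
    """Return (minimum within-cluster sum of squares, clusters) for 1-D data."""
    x = sorted(xs)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(xs)")
    # Prefix sums give the SSE of any contiguous run x[i..j] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):  # inclusive run x[i..j]
        total = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - total * total / (j - i + 1)

    inf = float("inf")
    # D[q][j]: best cost of x[0..j] in q + 1 clusters; B[q][j]: start of last cluster
    D = [[inf] * n for _ in range(k)]
    B = [[0] * n for _ in range(k)]
    for j in range(n):
        D[0][j] = sse(0, j)
    for q in range(1, k):
        for j in range(q, n):
            for i in range(q, j + 1):  # last cluster is x[i..j]
                cost = D[q - 1][i - 1] + sse(i, j)
                if cost < D[q][j]:
                    D[q][j], B[q][j] = cost, i
    # Backtrack the cluster boundaries
    clusters, j = [], n - 1
    for q in range(k - 1, -1, -1):
        i = B[q][j] if q > 0 else 0
        clusters.append(x[i:j + 1])
        j = i - 1
    clusters.reverse()
    return D[k - 1][n - 1], clusters


cost, clusters = optimal_kmeans_1d([30, 1, 11, 2, 12, 10], 3)
print(cost, clusters)
```

Unlike iterative k-means, the result does not depend on initial centroids. Every admissible split is implicitly considered.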
Inference from clustering with application to gene-expression microarrays.
Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M
2002-01-01
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to the process to which they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. 
A large amount of generated output is available over the web.
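Counting "points clustered incorrectly" requires matching arbitrary cluster IDs to the generating processes. The abstract does not state the toolbox's matching rule. The sketch below assumes the matching is the one-to-one assignment of cluster IDs to classes that minimizes the error, found by brute force. That search is only practical for a handful of clusters. The function name is hypothetical.

```python
from itertools import permutations


def clustering_error(true_labels, cluster_labels):
    """Misassigned points under the best one-to-one matching of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than generating processes")
    best = len(true_labels)
    # Try every injective mapping of cluster IDs onto class labels
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        errors = sum(mapping[c] != t for t, c in zip(true_labels, cluster_labels))
        best = min(best, errors)
    return best


truth = [0, 0, 0, 1, 1, 1]                    # generating process of each point
found = ["b", "b", "a", "a", "a", "a"]        # IDs returned by some clustering run
print(clustering_error(truth, found))
```

With more clusters, the Hungarian algorithm would replace the factorial-time permutation search.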
Ghane, Narjes; Vard, Alireza; Talebi, Ardeshir; Nematollahy, Pardis
2017-01-01
Recognition of white blood cells (WBCs) is the first step in diagnosing particular diseases such as acquired immune deficiency syndrome, leukemia, and other blood-related diseases, and it is usually done by pathologists using an optical microscope. This process is time-consuming, extremely tedious, and expensive, and it requires experienced experts in this field. Thus, a computer-aided diagnosis system that assists pathologists in the diagnostic process can be very effective. Segmentation of WBCs is usually the first step in developing a computer-aided diagnosis system. The main purpose of this paper is to segment WBCs from microscopic images. For this purpose, we present a novel combination of thresholding, k-means clustering, and modified watershed algorithms in three stages: (1) segmentation of WBCs from a microscopic image, (2) extraction of nuclei from the cell image, and (3) separation of overlapping cells and nuclei. The evaluation results of the proposed method show that the similarity measure, precision, and sensitivity were 92.07%, 96.07%, and 94.30%, respectively, for nucleus segmentation and 92.93%, 97.41%, and 93.78%, respectively, for cell segmentation. In addition, statistical analysis shows high similarity between manual segmentation and the results obtained by the proposed method.
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm
NASA Astrophysics Data System (ADS)
Frisca; Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Clustering is a data analysis method that aims to group data with similar characteristics together. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, the spectral clustering method emerged from concepts in spectral graph theory. The spectral clustering method needs a partitioning algorithm; options include PAM, SOM, fuzzy c-means, and k-means. Based on research by Capital and Choudhury in 2013, when Euclidean distance is used, the k-means algorithm provides better accuracy than the PAM algorithm, so in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering lies in reducing data dimensionality, in this case the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing DNA fragments of thousands or even tens of thousands of genes, derived from doubling cDNA. Microarray data are widely used to detect cancers such as carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to place data with high similarity in the same group and data with low similarity in different groups. In this research, the carcinoma microarray data contain 7457 genes. Partitioning with the k-means algorithm yields two clusters.
Using Grey Wolf Algorithm to Solve the Capacitated Vehicle Routing Problem
NASA Astrophysics Data System (ADS)
Korayem, L.; Khorsid, M.; Kassem, S. S.
2015-05-01
The capacitated vehicle routing problem (CVRP) is a class of vehicle routing problems (VRPs). In the CVRP, a set of identical vehicles with fixed capacities must fulfill customers' demands for a single commodity. The main objective is to minimize the total cost or distance traveled by the vehicles while satisfying a number of constraints, such as the capacity constraint of each vehicle and logical flow constraints. One of the methods employed in solving the CVRP is the cluster-first route-second method. This technique groups customers into a number of clusters, where each cluster is served by one vehicle. Once the clusters are formed, a route determining the best sequence in which to visit the customers is established within each cluster. The bio-inspired grey wolf optimizer (GWO), introduced in 2014, has proven to be efficient in solving unconstrained as well as constrained optimization problems. In the current research, our main contributions are: combining GWO with the traditional K-means clustering algorithm to generate the ‘K-GWO’ algorithm, deriving a capacitated version of the K-GWO algorithm by incorporating a capacity constraint into the aforementioned algorithm, and, finally, developing 2 new clustering heuristics. The resulting algorithm is used in the clustering phase of the cluster-first route-second method to solve the CVRP. The algorithm is tested on a number of benchmark problems with encouraging results.
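The abstract does not detail how the capacity constraint enters the clustering phase. The sketch below shows one simple way a capacity limit can be imposed when customers are assigned to given cluster centers. It is a hypothetical greedy heuristic, not K-GWO: customers are taken in order of decreasing demand, and each goes to the nearest center whose vehicle still has room. The center positions would come from a method such as K-means or GWO. Here they are simply assumed.

```python
import math


def capacitated_assign(customers, demands, centers, capacity):
    """Greedy capacity-aware assignment of customers to cluster centers.

    Returns {customer_index: center_index}; raises if a customer fits nowhere.
    """
    load = [0.0] * len(centers)
    assignment = {}
    # Place large demands first so they are less likely to be left without room
    order = sorted(range(len(customers)), key=lambda i: -demands[i])
    for i in order:
        ranked = sorted(range(len(centers)),
                        key=lambda c: math.dist(customers[i], centers[c]))
        for c in ranked:
            if load[c] + demands[i] <= capacity:
                assignment[i] = c
                load[c] += demands[i]
                break
        else:
            raise ValueError(f"customer {i} does not fit in any vehicle")
    return assignment


customers = [(0, 0), (1, 0), (2, 0)]
demands = [5, 5, 5]
centers = [(1, 0), (10, 0)]  # hypothetical centers from a clustering step
print(capacitated_assign(customers, demands, centers, capacity=10))
```

In this example the third customer is pushed to the farther center because the nearer vehicle is already full. A route would then be built within each resulting cluster.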
Optical character recognition of handwritten Arabic using hidden Markov models
NASA Astrophysics Data System (ADS)
Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.; Olama, Mohammed M.
2011-04-01
The problem of optical character recognition (OCR) of handwritten Arabic has not yet received a satisfactory solution. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in improved and more robust recognition of characters at the sub-word level. Integrating HMMs represents another step in the overall OCR trends currently being researched in the literature. The proposed approach exploits the structure of characters in the Arabic language, in addition to their extracted features, to achieve improved recognition rates. Useful statistical information about the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built from the collected large corpus and the emission matrix is built from the extracted character features. The recognition process uses the Viterbi algorithm, which finds the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text. As a result, the recognition rate depends on the overall structure of the Arabic language rather than on the worst recognition rate of any single character. Numerical results show that there is a potentially large recognition improvement from using the proposed algorithms.
Optical character recognition of handwritten Arabic using hidden Markov models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.
2011-01-01
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters
NASA Astrophysics Data System (ADS)
Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.
2018-06-01
We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Recognizing characters of ancient manuscripts
NASA Astrophysics Data System (ADS)
Diem, Markus; Sablatnig, Robert
2010-02-01
For printed Latin text, the main issues of Optical Character Recognition (OCR) systems have been solved. However, for degraded handwritten document images, basic preprocessing steps such as binarization yield poor results with state-of-the-art methods. In this paper, ancient Slavonic manuscripts from the 11th century are investigated. To minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally, local information allows the recognition of partially visible or washed-out characters. The proposed algorithm consists of two steps: character classification and character localization. Initially, Scale Invariant Feature Transform (SIFT) features are extracted, which are subsequently classified using Support Vector Machines (SVM). Afterwards, the interest points are clustered according to their spatial information. Thereby, characters are localized and finally recognized based on a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background clutter (e.g., stains, tears) and faded characters.
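The final recognition step combines the pre-classified descriptors of one spatial cluster by weighted voting. A minimal sketch follows. It assumes each descriptor contributes its SVM-predicted character together with a confidence weight, and that the character with the largest summed weight wins. The weighting scheme and function name are assumptions; the abstract does not define them.

```python
from collections import defaultdict


def weighted_vote(descriptor_votes):
    """Pick the character with the largest total weight in one spatial cluster.

    `descriptor_votes` is an iterable of (character_label, weight) pairs,
    one per pre-classified local descriptor (weights are assumed confidences).
    """
    totals = defaultdict(float)
    for label, weight in descriptor_votes:
        totals[label] += weight
    return max(totals, key=totals.get)


votes = [("a", 0.9), ("o", 0.6), ("o", 0.5), ("a", 0.3)]
print(weighted_vote(votes))
```

With unweighted voting the example would be a tie. The weights let a few confident descriptors outvote several uncertain ones.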
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike classification algorithms, clustering algorithms are unsupervised. Two representative clustering algorithms are K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the probability of occurrence of an event, a statistical approach is used. However, classifying all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to clusters obtained with EM and with K-means for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Hyperspectral face recognition with spatiospectral information fusion and PLS regression.
Uzair, Muhammad; Mahmood, Arif; Mian, Ajmal
2015-03-01
Hyperspectral imaging offers new opportunities for face recognition via improved discrimination along the spectral dimension. However, it poses new challenges, including low signal-to-noise ratio, interband misalignment, and high data dimensionality. Due to these challenges, the literature on hyperspectral face recognition is not only sparse but is limited to ad hoc dimensionality reduction techniques and lacks comprehensive evaluation. We propose a hyperspectral face recognition algorithm using a spatiospectral covariance for band fusion and partial least square regression for classification. Moreover, we extend 13 existing face recognition techniques, for the first time, to perform hyperspectral face recognition. We formulate hyperspectral face recognition as an image-set classification problem and evaluate the performance of seven state-of-the-art image-set classification techniques. We also test six state-of-the-art grayscale and RGB (color) face recognition algorithms after applying fusion techniques on hyperspectral images. Comparison with the 13 extended and five existing hyperspectral face recognition techniques on three standard data sets shows that the proposed algorithm outperforms all by a significant margin. Finally, we perform band selection experiments to find the most discriminative bands in the visible and near infrared response spectrum.
Lukashin, A V; Fuchs, R
2001-05-01
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In principle, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test has shown that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during the yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
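The simulated-annealing idea in the abstract above can be illustrated with a minimal Python sketch. It assumes a generic within-cluster sum-of-squares cost and a geometric cooling schedule; the function names, the cost, and all parameter values are illustrative assumptions, not the authors' procedure. The sketch also keeps the best state visited, because a finite run only approaches the global optimum; the theoretical guarantee holds only for sufficiently slow cooling.

```python
import math
import random


def sse(points, labels, k):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total


def anneal_clusters(points, k, t0=10.0, cooling=0.995, steps=5000, seed=0):
    """Simulated-annealing search over label assignments (hypothetical helper).

    Returns the best assignment seen and its cost.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = sse(points, labels, k)
    best_labels, best_cost = labels[:], cost
    for step in range(steps):
        temperature = t0 * cooling ** step  # assumed geometric cooling schedule
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(k)  # propose moving one item to a random cluster
        new_cost = sse(points, labels, k)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        else:
            labels[i] = old  # reject: undo the move
    return best_labels, best_cost
```

Tracking the best state separately means the returned cost can never be worse than the random starting assignment, even though the chain itself may wander uphill.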
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization (PSO). Even though these algorithms have been widely applied in many disciplines due to their simplicity, they tend to be trapped in a local minimum during the search for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that Basic FA generates more robust and compact clusters than those produced by K-means and PSO.
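For reference, the ADDC objective can be computed as the mean, over clusters, of the average distance between each document vector and its cluster centroid. The sketch below assumes documents are already numeric term vectors (for example TF-IDF weights) and uses Euclidean distance; both choices are assumptions, the function name is hypothetical, and the firefly search itself is not shown. Lower values indicate more compact clusters.

```python
import math


def addc(clusters):
    """Average Distance to Document Centroid over the non-empty clusters.

    `clusters` is a list of clusters; each cluster is a list of equal-length
    numeric vectors (assumed document representations).
    """
    per_cluster = []
    for docs in clusters:
        if not docs:
            continue  # empty clusters do not contribute
        dim = len(docs[0])
        centroid = [sum(d[j] for d in docs) / len(docs) for j in range(dim)]
        mean_dist = sum(math.dist(d, centroid) for d in docs) / len(docs)
        per_cluster.append(mean_dist)
    return sum(per_cluster) / len(per_cluster)
```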
Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.
Rieder, Vera; Schork, Karin U; Kerschke, Laura; Blank-Landeshammer, Bernhard; Sickmann, Albert; Rahnenführer, Jörg
2017-11-03
In proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
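As a rough illustration of two ingredients named above, cosine distance between spectra and connected components of a similarity graph, the sketch below assumes each spectrum has already been binned into a fixed-length, non-zero intensity vector and links two spectra when their cosine distance is at most a hypothetical threshold. It is not an implementation of CAST, MS-Cluster, PRIDE Cluster, or N-Cluster, and the function names are illustrative.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def connected_component_clusters(spectra, threshold):
    """Cluster labels = connected components of the 'distance <= threshold' graph."""
    parent = list(range(len(spectra)))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if cosine_distance(spectra[i], spectra[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two components

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(spectra))]
```

Connected components are fast but can chain dissimilar spectra together through intermediates, which is one reason dedicated spectrum-clustering tools use stricter linkage rules.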
Weighted graph cuts without eigenvectors: a multilevel approach.
Dhillon, Inderjit S; Guan, Yuqiang; Kulis, Brian
2007-11-01
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods--in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods, such as Metis, have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis and gene network analysis.
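The key computation in weighted kernel k-means is that the squared distance from a point a to the weighted mean of a cluster in feature space needs only kernel entries: K_aa - 2 Σ_b w_b K_ab / Σ_b w_b + Σ_{b,c} w_b w_c K_bc / (Σ_b w_b)^2, with b and c ranging over the cluster's members. The Python sketch below implements that batch assignment step on a precomputed kernel matrix. It is a plain single-level kernel k-means, not the paper's multilevel coarsening and refinement scheme, and the function names are hypothetical. With appropriate choices of weights and kernel, this objective corresponds to graph-cut criteria such as normalized cut.

```python
def kernel_kmeans_assign(K, weights, labels, k):
    """One batch assignment pass of weighted kernel k-means."""
    n = len(K)
    stats = []
    for c in range(k):
        members = [b for b in range(n) if labels[b] == c]
        sw = sum(weights[b] for b in members)
        # weighted within-cluster kernel sum: sum_{b,d} w_b w_d K_bd
        within = sum(weights[b] * weights[d] * K[b][d] for b in members for d in members)
        stats.append((members, sw, within))

    new_labels = []
    for a in range(n):
        best_c, best_d = None, float("inf")
        for c, (members, sw, within) in enumerate(stats):
            if sw == 0:
                continue  # skip empty clusters
            cross = sum(weights[b] * K[a][b] for b in members)
            d = K[a][a] - 2.0 * cross / sw + within / sw ** 2
            if d < best_d:
                best_c, best_d = c, d
        new_labels.append(best_c)
    return new_labels


def kernel_kmeans(K, weights, labels, k, max_iter=100):
    """Repeat the assignment pass until labels stop changing."""
    for _ in range(max_iter):
        new = kernel_kmeans_assign(K, weights, labels, k)
        if new == labels:
            break
        labels = new
    return labels
```

With a linear kernel K_ab = x_a · x_b and unit weights, this reduces to ordinary k-means, which makes it easy to sanity-check.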
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and two stochastic versions of the expectation-maximization algorithm are described. In addition, a likelihood-based initialization strategy is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods, such as the K-means algorithm and hierarchical clustering methods, that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
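To make the EM idea concrete, the sketch below fits a one-dimensional mixture of Poisson distributions to count data. This is a simplification assumed for illustration: the paper's models cluster whole gene expression profiles across treatments, and its stochastic EM variants and likelihood-based initialization are not shown. Function names, starting values, and the small floor on the rates are hypothetical.

```python
import math


def poisson_logpmf(x, lam):
    """log P(X = x) for a Poisson distribution with rate lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def poisson_mixture_em(counts, rates, weights, max_iter=500, tol=1e-10):
    """Fit a k-component Poisson mixture to integer counts by EM."""
    rates, weights = list(rates), list(weights)
    k = len(rates)
    for _ in range(max_iter):
        # E-step: responsibilities, computed in log space for stability
        resp = []
        for x in counts:
            logs = [math.log(weights[c]) + poisson_logpmf(x, rates[c]) for c in range(k)]
            top = max(logs)
            expd = [math.exp(v - top) for v in logs]
            total = sum(expd)
            resp.append([e / total for e in expd])
        # M-step: mixing weights and responsibility-weighted mean counts
        new_rates = []
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(counts)
            mean = sum(r[c] * x for r, x in zip(resp, counts)) / nc
            new_rates.append(max(mean, 1e-12))  # floor keeps log(rate) defined
        shift = max(abs(a - b) for a, b in zip(new_rates, rates))
        rates = new_rates
        if shift < tol:
            break
    return rates, weights
```

Each count is then assigned to the component with the largest responsibility; with well-separated groups the fitted rates approach the group means.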
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but which often lends ideas to clustering algorithms, is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
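The initialization problem described above follows from the winner-take-all update shared by SHCM and LVQ: only the prototype nearest to the input moves, by v <- v + alpha (x - v). The minimal Python sketch below (hypothetical function name, fixed learning rate alpha assumed) shows that a prototype placed far outside the data never wins and therefore never moves. GLVQ and FLVQ, which update every prototype, are not implemented here.

```python
import math


def winner_take_all_step(prototypes, x, lr):
    """Move only the prototype nearest to x toward it (SHCM/LVQ-style update).

    Returns the updated prototype list and the index of the winner.
    """
    winner = min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))
    updated = [list(p) for p in prototypes]  # losers are copied unchanged
    updated[winner] = [p + lr * (xi - p) for p, xi in zip(prototypes[winner], x)]
    return updated, winner


if __name__ == "__main__":
    # A prototype initialized far from the data never wins, so it never moves.
    protos = [[0.0, 0.0], [100.0, 100.0]]
    for x in [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]] * 10:
        protos, _ = winner_take_all_step(protos, x, 0.1)
    print(protos[1])  # [100.0, 100.0]
```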
Collegial Activity Learning between Heterogeneous Sensors.
Feuz, Kyle D; Cook, Diane J
2017-11-01
Activity recognition algorithms have matured and become more ubiquitous in recent years. However, these algorithms are typically customized for a particular sensor platform. In this paper we introduce PECO, a Personalized activity ECOsystem, that transfers learned activity information seamlessly between sensor platforms in real time so that any available sensor can continue to track activities without requiring its own extensive labeled training data. We introduce a multi-view transfer learning algorithm that facilitates this information handoff between sensor platforms and provide theoretical performance bounds for the algorithm. In addition, we empirically evaluate PECO using datasets that utilize heterogeneous sensor platforms to perform activity recognition. These results indicate that not only can activity recognition algorithms transfer important information to new sensor platforms, but any number of platforms can work together as colleagues to boost performance.
An improved initialization center k-means clustering algorithm based on distance and density
NASA Astrophysics Data System (ADS)
Duan, Yanling; Liu, Qun; Xia, Shuyin
2018-04-01
To address the problem that the randomly chosen initial cluster centers of the k-means algorithm make the clustering results sensitive to outlier samples and unstable across repeated runs, a center initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and the data samples with larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility and practicality of the algorithm. Experimental results on UCI data sets show that the algorithm is reasonably stable and practical.
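A minimal sketch of distance-and-density seeding is given below. The abstract does not specify the weighting, so two assumptions are made: density is the reciprocal of the mean distance to a point's m nearest neighbours, and after the densest point is chosen, each further center maximizes density multiplied by the distance to the nearest already-chosen center. Both rules and the function name are illustrative, not the authors' exact formulas.

```python
import math


def density_distance_init(points, k, m=2):
    """Pick k initial centers favouring dense points far from existing centers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density: reciprocal of mean distance to the m nearest neighbours.
    density = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:m]
        density.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    centers = [max(range(n), key=lambda i: density[i])]  # densest point first
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        # Assumed score: high density AND far from every chosen center.
        centers.append(max(candidates,
                           key=lambda i: density[i] * min(dist[i][c] for c in centers)))
    return [points[i] for i in centers]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(density_distance_init(pts, 2))  # [(0, 0), (10, 10)]
```

In the example, the isolated point at (50, 50) is the farthest from the first center but has low density, so it is not chosen as the second seed, whereas a pure farthest-point rule would pick it.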
Diametrical clustering for identifying anti-correlated gene clusters.
Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman
2003-09-01
Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
Pulmonary Nodule Recognition Based on Multiple Kernel Learning Support Vector Machine-PSO
Li, Yang; Zhu, Zhichuan; Hou, Alin; Zhao, Qingdong; Liu, Liwei; Zhang, Lijuan
2018-01-01
Pulmonary nodule recognition is the core module of lung CAD. The Support Vector Machine (SVM) algorithm has been widely used in pulmonary nodule recognition, and the Multiple Kernel Learning Support Vector Machine (MKL-SVM) algorithm has achieved good results in this task. However, because it relies on grid search, the MKL-SVM algorithm needs a long time for parameter optimization, and its identification accuracy depends on the fineness of the grid. In this paper, swarm intelligence is introduced: Particle Swarm Optimization (PSO) is combined with the MKL-SVM algorithm to form the MKL-SVM-PSO algorithm, which realizes rapid global optimization of the parameters. To obtain the global optimal solution, different inertia weights, namely a constant inertia weight, a linear inertia weight, and a nonlinear inertia weight, are applied to pulmonary nodule recognition. The experimental results show that the model training time of the proposed MKL-SVM-PSO algorithm is only 1/7 of that of the MKL-SVM grid search algorithm, while achieving better recognition. Moreover, the Euclidean norm of the normalized error vector is proposed to measure the proximity between the average fitness curve and the optimal fitness curve after convergence. Statistical analysis of results averaged over 20 runs with different inertia weights shows that dynamic inertia weights are superior to the constant inertia weight in the MKL-SVM-PSO algorithm. Among the dynamic inertia weights, the nonlinear inertia weight gives a shorter parameter optimization time and an average fitness value after convergence that is much closer to the optimal fitness value, making it better than the linear inertia weight. Besides, a better nonlinear inertia weight is verified. PMID:29853983
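The inertia-weight schedules compared above can be illustrated with a small particle swarm optimizer. The sketch below assumes a linearly decreasing weight from 0.9 to 0.4, a quadratic decay as one possible nonlinear schedule, a constant of 0.7, and c1 = c2 = 2; the paper's exact schedules are not given in the abstract, and the caller supplies the objective (a toy sphere function would stand in for the MKL-SVM parameter-tuning fitness). All names and values are hypothetical.

```python
import random


def inertia(schedule, t, t_max, w_max=0.9, w_min=0.4, w_const=0.7):
    """Inertia weight at iteration t under an assumed schedule."""
    frac = t / t_max
    if schedule == "constant":
        return w_const
    if schedule == "linear":
        return w_max - (w_max - w_min) * frac
    if schedule == "nonlinear":
        return w_min + (w_max - w_min) * (1.0 - frac) ** 2  # one possible nonlinear decay
    raise ValueError(f"unknown schedule: {schedule}")


def pso(f, dim, schedule, n_particles=20, t_max=100, lo=-5.0, hi=5.0,
        c1=2.0, c2=2.0, seed=0):
    """Minimize f over [lo, hi]^dim with a global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best value after each iteration
    for t in range(t_max):
        w = inertia(schedule, t, t_max)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

A large early weight favours exploration and a small late weight favours convergence; the returned history is the kind of fitness curve the abstract compares across schedules.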
NASA Astrophysics Data System (ADS)
Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian
2016-09-01
We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.
A novel harmony search-K means hybrid algorithm for clustering gene expression data
Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to simultaneously analyze a large number of genes across different samples. Clustering of microarray data can reveal hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis, and drug development. The k-means clustering algorithm is widely used in many practical applications, but the original k-means algorithm has several drawbacks: it is computationally expensive and, because of the random choice of initial centroids, generates locally optimal solutions. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-globally optimal solutions by searching the entire solution space. The low clustering accuracy of existing algorithms limits their use in many crucial life-science applications. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy than the existing algorithms. PMID:23390351
m-BIRCH: an online clustering approach for computer vision applications
NASA Astrophysics Data System (ADS)
Madan, Siddharth K.; Dana, Kristin J.
2015-03-01
We adapt a classic online clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering and updates the clustering decisions when new data arrive. Modifications made in m-BIRCH enable data-driven parameter selection and effectively handle regions of varying density in the feature space. Data-driven parameter selection automatically controls the coarseness of the data summarization. Effective handling of varying-density regions is necessary for the data summarization to represent each density region well. We use m-BIRCH to cluster 840K color SIFT descriptors and 60K outlier-corrupted grayscale patches. We also use the algorithm to cluster datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides a useful clustering tool and is publicly available.
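BIRCH, the base algorithm, summarizes each subcluster by a clustering feature CF = (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. These summaries are additive, which is what lets the algorithm absorb new data incrementally in limited memory. The sketch below shows only that standard bookkeeping; the class name is hypothetical and none of the m-BIRCH modifications (data-driven parameter selection, varying-density handling) are reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature: count, linear sum, and sum of squared norms."""
    n: int
    ls: list
    ss: float

    @classmethod
    def from_point(cls, x):
        return cls(1, [float(v) for v in x], float(sum(v * v for v in x)))

    def merge(self, other):
        # CF additivity: merging two subclusters just adds their summaries
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # RMS distance of member points to the centroid, from the summary alone
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))
```

For the points 1, 2, and 3 on a line, the merged feature is (3, [6], 14), giving centroid 2 and radius sqrt(2/3), without storing the points themselves.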
DrugQuest - a text mining workflow for drug association discovery.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis
2016-06-06
Text mining and data integration methods are gaining ground in the health sciences due to the exponential growth of biomedical literature and of information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories, such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways, and diseases, and we use the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity measures and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters
Wang, Zhihao; Yi, Jing
2016-01-01
To address the shortcoming that the fuzzy c-means (FCM) algorithm requires the number of clusters to be known in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. First, a density-based algorithm was put forward. According to the characteristics of the dataset, the algorithm automatically determined the possible maximum number of clusters, instead of using the empirical rule √n, and obtained optimal initial cluster centroids, mitigating the limitation of FCM that randomly selected centroids can lead convergence to a local minimum. Second, by introducing a penalty function, the paper proposed a new fuzzy clustering validity index based on fuzzy compactness and separation. The penalty ensured that, as the number of clusters approached the number of objects in the dataset, the index did not decrease monotonically toward zero, a behavior that would otherwise make the index lose its robustness and its usefulness for deciding the optimal number of clusters. Then, based on these results, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters through an iterative trial-and-error process. Finally, experiments on the UCI, KDD Cup 1999, and synthetic datasets showed that the method not only effectively determined the optimal number of clusters but also reduced the number of FCM iterations while giving stable clustering results. PMID:28042291
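For context, the standard FCM iteration that the self-adaptive method builds on alternates two updates: memberships u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)) and centers v_i = Σ_j u_ij^m x_j / Σ_j u_ij^m, where d_ij is the distance from point j to center i and m > 1 is the fuzzifier. The Python sketch below implements that baseline with caller-supplied initial centers and m = 2 by default; it does not include the paper's density-based initialization or its validity index, and the function name is hypothetical.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Baseline FCM: returns final centers and the last membership matrix.

    memberships[j][i] is the degree to which points[j] belongs to center i.
    """
    centers = [list(c) for c in centers]
    n_centers = len(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update (each row sums to 1)
        memberships = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:
                # point coincides with a center: give it full membership there
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d[i] / d[k]) ** exponent for k in range(n_centers))
                       for i in range(n_centers)]
            memberships.append(row)
        # Center update: weighted mean with weights u^m
        new_centers = []
        for i in range(n_centers):
            w = [memberships[j][i] ** m for j in range(len(points))]
            total = sum(w)
            new_centers.append([sum(wj * x[dim] for wj, x in zip(w, points)) / total
                                for dim in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Because FCM is sensitive to the starting centers, the paper's contribution is precisely in choosing those centers and the number of clusters automatically rather than supplying them by hand as done here.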
Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady
2017-02-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
A novel speech processing algorithm based on harmonicity cues in cochlear implant
NASA Astrophysics Data System (ADS)
Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng
2017-08-01
This paper proposed a novel speech processing algorithm for cochlear implants that uses harmonicity cues to enhance tonal information for Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank with pass bands of 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full-wave rectification and low-pass filtering. The TEPCs then modulated a sinusoidal carrier whose frequency was the fundamental frequency (F0) or the harmonic of F0 closest to the center frequency of the band. The signals from all bands were combined to obtain the output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the widely used continuous interleaved sampling (CIS) strategy and for the novel F0-harmonic algorithm. The F0-harmonic algorithm performed consistently better than the CIS strategy in Mandarin tone, word, and sentence recognition. In addition, the sentence recognition rate was higher than the word recognition rate, owing to contextual information in the sentence. Moreover, tones 3 and 4 were recognized better than tones 1 and 2, because of the more easily identified features of the former. In conclusion, the F0-harmonic algorithm could enhance tonal information in cochlear implant speech processing through the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further study will test the F0-harmonic algorithm in noisy listening conditions.
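The carrier-selection step can be sketched as below, using the four pass bands listed above. The abstract does not say how the band center is defined, so the geometric mean of the band edges is assumed; the carrier is then the multiple of F0 nearest that center (F0 itself at minimum). Function names are hypothetical, and the envelope extraction (rectification and low-pass filtering) is not shown.

```python
import math

# Pass bands (Hz) as listed in the abstract.
BANDS_HZ = [(300, 621), (621, 1285), (1285, 2657), (2657, 5499)]


def carrier_frequencies(f0, bands=BANDS_HZ):
    """Harmonic of f0 nearest each band's (assumed geometric) center frequency."""
    carriers = []
    for lo, hi in bands:
        center = math.sqrt(lo * hi)  # assumption: geometric band center
        k = max(1, round(center / f0))  # harmonic number, at least F0 itself
        carriers.append(k * f0)
    return carriers


def modulate(envelope, carrier_hz, fs):
    """Multiply a sampled envelope by a sinusoidal carrier at sample rate fs."""
    return [e * math.sin(2 * math.pi * carrier_hz * n / fs)
            for n, e in enumerate(envelope)]
```

For F0 = 200 Hz this gives carriers of 400, 800, 1800, and 3800 Hz; with an arithmetic band center the choices could differ in some bands.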
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
To address security problems in hierarchical P2P networks (HPN), this paper presents a security clustering algorithm based on reputation (CABR). The algorithm uses a reputation mechanism to secure transactions and uses clusters to manage that reputation mechanism. To improve security, reduce the network cost of managing reputation, and enhance cluster stability, a node's comprehensive performance is evaluated from three basic factors: reputation, historical average online time, and network bandwidth. Simulation results show that the proposed algorithm improves security, reduces network overhead, and enhances cluster stability.
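One way to combine the three factors into a comprehensive node score and elect a cluster head is sketched below. The min-max normalisation, the weights in `WEIGHTS`, and the rule "highest score becomes cluster head" are all hypothetical choices for illustration; the abstract names the factors but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    reputation: float        # reputation value (scale assumed)
    avg_online_hours: float  # historical average online time
    bandwidth_mbps: float    # network bandwidth


# Hypothetical weights; the abstract does not give them.
WEIGHTS = (0.5, 0.3, 0.2)


def normalise(values):
    """Min-max scale to [0, 1] so factors with different units are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def comprehensive_scores(nodes, weights=WEIGHTS):
    rep = normalise([n.reputation for n in nodes])
    online = normalise([n.avg_online_hours for n in nodes])
    bw = normalise([n.bandwidth_mbps for n in nodes])
    w_rep, w_on, w_bw = weights
    return {n.node_id: w_rep * r + w_on * o + w_bw * b
            for n, r, o, b in zip(nodes, rep, online, bw)}


def elect_cluster_head(nodes, weights=WEIGHTS):
    """Assumed rule: the node with the highest comprehensive score leads the cluster."""
    scores = comprehensive_scores(nodes, weights)
    return max(scores, key=scores.get)
```

Weighting reputation most heavily reflects the security goal. Shifting weight toward online time would instead favour stable, long-lived heads.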
Robust traffic sign detection using fuzzy shape recognizer
NASA Astrophysics Data System (ADS)
Li, Lunbo; Li, Jun; Sun, Jianhong
2009-10-01
A novel fuzzy approach for the detection of traffic signs in natural environments is presented. More than 3000 road images were collected under different weather conditions by a digital camera, and used for testing this approach. Every RGB image was converted into HSV colour space, and segmented by the hue and saturation thresholds. A symmetrical detector was used to extract the local features of the regions of interest (ROI), and the shape of ROI was determined by a fuzzy shape recognizer which invoked a set of fuzzy rules. The experimental results show that the proposed algorithm is translation, rotation and scaling invariant, and gives reliable shape recognition in complex traffic scenes where clustering and partial occlusion normally occur.
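The colour-segmentation stage (RGB to HSV, then hue and saturation thresholds) might look like the following minimal sketch. It uses the standard-library `colorsys.rgb_to_hsv`. The choice of red signs and the threshold values `hue_tol` and `min_sat` are hypothetical, and the symmetric detector and fuzzy shape recognizer are not shown.

```python
import colorsys


def segment_red(image, hue_tol=0.05, min_sat=0.5):
    """Return a boolean mask of pixels whose hue is near red and whose saturation is high.

    image: list of rows of (r, g, b) tuples with components in 0..255.
    hue_tol and min_sat are illustrative thresholds, not values from the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Red hue wraps around 0/1 in colorsys' [0, 1) hue scale.
            is_red_hue = h <= hue_tol or h >= 1.0 - hue_tol
            mask_row.append(is_red_hue and s >= min_sat)
        mask.append(mask_row)
    return mask
```

Thresholding in HSV rather than RGB separates chromatic information (hue) from brightness. That separation is the usual reason for this conversion when lighting and weather vary.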
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
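The abstract does not write out the objective. The sketch below evaluates an objective of the general shape used in robust continuous clustering, as an assumption. Each point $x_i$ gets a representative $u_i$, a quadratic data term keeps $u_i$ near $x_i$, and a bounded robust penalty (Geman-McClure here) pulls neighbouring representatives together over a given neighbour graph. The edge list, weights, `lam` and `mu` are hypothetical inputs, and the optimizer itself is not shown.

```python
import numpy as np


def geman_mcclure(y, mu):
    """Bounded robust penalty: behaves like y**2 for small y and saturates at mu."""
    y2 = np.asarray(y, dtype=float) ** 2
    return mu * y2 / (mu + y2)


def robust_objective(X, U, edges, weights, lam=1.0, mu=1.0):
    """Evaluate 0.5*sum||x_i - u_i||^2 + 0.5*lam*sum w_pq * rho(||u_p - u_q||).

    X, U: (n, D) arrays of data points and their representatives.
    edges: non-empty list of (p, q) index pairs from a neighbour graph (assumed given).
    """
    data_term = 0.5 * np.sum((X - U) ** 2)
    p, q = np.asarray(edges).T
    dists = np.linalg.norm(U[p] - U[q], axis=1)
    pair_term = 0.5 * lam * np.sum(np.asarray(weights) * geman_mcclure(dists, mu))
    return data_term + pair_term
```

Because the penalty saturates, an edge between representatives that are already far apart costs at most `mu`. Minimising the objective therefore tends not to drag genuinely distinct groups together. This is one reading of how a robust-statistics objective can help untangle mixed clusters.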
Mumtaz, Shahzad; Nabney, Ian T; Flower, Darren R
2017-10-01
Peptide-binding MHC proteins are thought to be the most variable proteins across the human population; the extreme MHC polymorphism observed is functionally important and results from constrained divergent evolution. MHCs have vital functions in immunology and homeostasis: cell surface MHC class I molecules report cell status to CD8+ T cells, NKT cells and NK cells, thus playing key roles in pathogen defence, as well as mediating smell recognition, mate choice, adverse drug reactions, and transplantation rejection. MHC peptide specificity falls into several supertypes whose members share common binding properties. It seems likely that other supertypes exist that are relevant to other functions. Since comprehensive experimental characterization is intractable, structure-based bioinformatics is the only viable solution. We modelled functional MHC proteins by homology and used calculated Poisson-Boltzmann electrostatics projected from the top surface of the MHC as multi-dimensional descriptors, analysing them with state-of-the-art dimensionality reduction techniques and clustering algorithms. We were able to recover the 3 MHC loci as separate clusters and identify clear sub-groups within them, unequivocally vindicating our choice of both data representation and clustering strategy. We expect this approach to make a profound contribution to the study of MHC polymorphism and its functional consequences, and, by extension, to other burgeoning structural systems, such as GPCRs. Copyright © 2017 Elsevier Inc. All rights reserved.
Recognition of flow in everyday life using sensor agent robot with laser range finder
NASA Astrophysics Data System (ADS)
Goshima, Misa; Mita, Akira
2011-04-01
In the present paper, we propose an algorithm that lets a sensor agent robot with a laser range finder recognize residents' flows in living spaces, count the number of people in a space, and classify the flows. Home renovation is or will be needed to prolong the lifetime of houses, and adaptation to individual residents is needed in our rapidly aging society. Autonomous mobile home robots will likely become common in the future to assist elderly people in various situations. We therefore need to collect various types of information about residents and living spaces, while avoiding any intrusion on personal privacy. Recognizing flows in everyday life is essential for supporting house renovation and aging societies through adaptation to individuals. Using background subtraction, additional noise removal, and k-means clustering, we achieved an average accuracy of more than 90% for the behavior of one to three people, and confirmed that the system is reliable regardless of the sensor's position. Our system can exploit the advantages of autonomous mobile robots while protecting personal privacy. It suggests a way to generalize flow recognition methods for living spaces.
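The processing chain named in the abstract (background subtraction, noise removal, k-means) can be sketched for a single 2-D laser scan. This is a minimal sketch under stated assumptions. A beam counts as foreground if it returns at least `min_diff` metres closer than an empty-room background scan. "Noise removal" is taken to mean dropping isolated foreground beams. The number of people `k` is assumed known, and k-means uses a deterministic farthest-point initialisation. None of these details come from the paper.

```python
import numpy as np


def scan_to_points(angles, ranges):
    """Polar laser readings to Cartesian (x, y) points in the sensor frame."""
    return np.column_stack((ranges * np.cos(angles), ranges * np.sin(angles)))


def foreground_points(angles, ranges, background, min_diff=0.3):
    # Background subtraction: keep beams that hit something closer than the empty room.
    mask = (background - ranges) > min_diff
    # Assumed noise-removal rule: drop foreground beams with no foreground neighbour.
    neighbour = np.zeros_like(mask)
    neighbour[1:] |= mask[:-1]
    neighbour[:-1] |= mask[1:]
    mask &= neighbour
    return scan_to_points(angles[mask], ranges[mask])


def kmeans(points, k, n_iter=50):
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [points[0]]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None, :] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each resulting cluster center approximates one person's position in the scan. Tracking those centers across successive scans would give the flows.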
Determining open cluster membership. A Bayesian framework for quantitative member classification
NASA Astrophysics Data System (ADS)
Stott, Jonathan J.
2018-01-01
Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples while a third, cluster-free, field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.
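A two-class Bayesian membership update can be sketched as follows, assuming the astrometric and photometric evidence are conditionally independent. Under that assumption, using the stage-1 (astrometric) posterior as the stage-2 (photometric) prior is equivalent to a joint update. The Gaussian likelihood and all numbers in the usage block are hypothetical. The paper's actual likelihood models are not given in the abstract.

```python
import math


def gaussian_likelihood(x, mean, sigma):
    """1-D normal density, e.g. for a proper-motion or colour offset."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_member(prior, like_member, like_field):
    """Bayes' rule for two classes: cluster member versus field star."""
    num = prior * like_member
    den = num + (1.0 - prior) * like_field
    return num / den if den > 0 else prior


def two_stage_membership(prior, astro_member, astro_field, photo_member, photo_field):
    """Stage 1 (astrometry) posterior becomes the stage 2 (photometry) prior."""
    p_astro = posterior_member(prior, astro_member, astro_field)
    return posterior_member(p_astro, photo_member, photo_field)


if __name__ == "__main__":
    # Hypothetical star: small proper-motion offset, modest colour offset.
    p = two_stage_membership(
        prior=0.1,
        astro_member=gaussian_likelihood(0.5, 0.0, 0.3),
        astro_field=gaussian_likelihood(0.5, 0.0, 5.0),
        photo_member=gaussian_likelihood(0.05, 0.0, 0.05),
        photo_field=gaussian_likelihood(0.05, 0.0, 0.5),
    )
    print(f"membership probability: {p:.3f}")
```

A star has to look like a member in both stages to keep a high posterior. This gives one intuition for why a combined assessment can have a lower false-positive rate than either method alone.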
Random Walk Quantum Clustering Algorithm Based on Space
NASA Astrophysics Data System (ADS)
Xiao, Shufen; Dong, Yumin; Ma, Hongyang
2018-01-01
In the random quantum walk, a quantum simulation of the classical walk, data points interact when selecting the appropriate walk strategy by exploiting quantum-entanglement features; thus, the results obtained with the quantum walk differ from those obtained with the classical walk. A new space-based quantum walk clustering algorithm is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by a space-combining rule. The proposed algorithm is validated by a simulation test and is shown to outperform existing clustering algorithms, namely k-means, PCA + k-means, and LDA-Km. The effects of several parameters on the algorithm's performance are also analyzed and discussed, and specific suggestions are provided.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies, and this demand is likely to keep increasing. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches to parallelizing algorithms rely on network communication between multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute tasks among the cores of a single computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray-type data and categorical SNP data. Our new shared-memory parallel algorithms prove highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Compared with single-core implementations and a recently published network-based parallelization, our Java-based algorithm was faster by a factor of 10 on large data sets while preserving computational accuracy. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, with modern algorithmic concepts, parallelization makes it possible to perform even laborious tasks such as cluster sensitivity analysis and cluster number estimation on a laboratory computer. PMID:20370922
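The core idea of shared-memory, multi-core k-means can be illustrated with a chunked map-reduce of one Lloyd iteration. This is a swapped-in technique: it is not the paper's transactional-memory design. Each worker assigns its chunk of points to the nearest center and returns partial sums and counts, which are then reduced into new centers. The sketch uses Python threads, which assumes that much of NumPy's array work runs without holding the GIL. `parallel_kmeans_step` and `_partial_stats` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _partial_stats(chunk, centers):
    """Assign one chunk of points; return labels, per-cluster sums and counts."""
    d2 = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    k, dim = centers.shape
    sums = np.zeros((k, dim))
    np.add.at(sums, labels, chunk)            # unbuffered scatter-add per cluster
    counts = np.bincount(labels, minlength=k)
    return labels, sums, counts


def parallel_kmeans_step(X, centers, n_workers=4):
    """One Lloyd iteration with the assignment step split across worker threads."""
    centers = np.asarray(centers, dtype=float)
    chunks = np.array_split(X, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: _partial_stats(c, centers), chunks))
    labels = np.concatenate([r[0] for r in results])
    sums = sum(r[1] for r in results)         # reduce partial results
    counts = sum(r[2] for r in results)
    new_centers = centers.copy()
    nonempty = counts > 0                     # keep old center if a cluster empties
    new_centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return labels, new_centers
```

Workers only read shared data and write to private partial results, so no locking is needed until the final reduction. This is the property that makes the assignment step easy to parallelise on one machine.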
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set $X$ composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let $D$ be the number of dimensions used to represent each item, $x_i \in \mathbb{R}^D$. The clustering goal is to produce an organization $P$ of the items in $X$ that optimizes an objective function $f : P \to \mathbb{R}$, which quantifies the quality of solution $P$. Often $f$ is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure $d : X \times X \to \mathbb{R}$ of the distance between two items. A partitioning algorithm produces a set of clusters $P = \{c_1, \ldots, c_k\}$ such that the clusters are nonoverlapping ($c_i \cap c_j = \emptyset$ for $i \neq j$) subsets of the data set ($\bigcup_i c_i = X$). Hierarchical algorithms produce a series of partitions $P = \{p_1, \ldots, p_{n'}\}$. For a complete hierarchy, the number of partitions $n' = n$, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains $n$ clusters, each containing a single item. For model-based clustering, each cluster $c_j$ is represented by a model $m_j$, such as the cluster center or a Gaussian distribution.
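The partition conditions ($c_i \cap c_j = \emptyset$ for $i \neq j$, $\bigcup_i c_i = X$) translate directly into a check. The sketch below pairs that check with one common choice of objective $f$: the within-cluster sum of squared distances to each cluster mean, which is to be minimized. The function names are illustrative, and items are represented by their indices into the data.

```python
from itertools import combinations


def is_partition(clusters, items):
    """True if the clusters are pairwise disjoint and their union equals the item set."""
    sets = [set(c) for c in clusters]
    disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    covers = set().union(*sets) == set(items)
    return disjoint and covers


def within_cluster_sse(clusters, X):
    """Sum of squared distances from each item to its cluster mean (lower is better)."""
    total = 0.0
    for c in clusters:
        if not c:
            continue
        pts = [X[i] for i in c]
        dim = len(pts[0])
        mean = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in pts)
    return total
```

Here the cluster mean plays the role of the model $m_j$ for each cluster.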
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carry with them a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j , even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments.
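Since the chapter treats k-means as the entry point, a compact version of the standard Lloyd iteration may help. Here each cluster's model $m_j$ is its center. The sketch uses k-means++ seeding, which is a common default but an assumption here, and assumes the data contain at least $k$ distinct points. It returns the labels, the centers, and the final sum of squared errors.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centers, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    # k-means++: sample each new center with probability proportional to D(x)^2.
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each item joins the cluster with the nearest center.
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        # Update step: each model m_j becomes the mean of its assigned items.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    sse = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, sse
```

Replacing the hard assignment with posterior probabilities and the centers with Gaussian models is, loosely, the step from this loop to EM that the chapter describes.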
Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets represented instead as affinity (similarity) matrices, that is, cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validation methods, and visualization or data reduction techniques such as principal components analysis (PCA), multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
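Agglomerative clustering builds the complete hierarchy bottom-up, from $n$ singleton clusters to one cluster containing all items. The minimal sketch below uses single linkage (cluster distance = closest pair of items) as the illustrative merge criterion. It is deliberately naive and only suitable for small $n$.

```python
import numpy as np


def single_linkage(X):
    """Naive agglomerative clustering with single linkage.

    Returns the list of partitions from n singletons up to a single cluster,
    i.e. a complete hierarchy with n partitions.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances d(x, y)
    hierarchy = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = D[np.ix_(clusters[a], clusters[b])].min()  # closest pair
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the two closest clusters
        del clusters[b]
        hierarchy.append([list(c) for c in clusters])
    return hierarchy
```

Swapping `.min()` for `.max()` or `.mean()` gives complete or average linkage. These alternatives can produce quite different hierarchies on the same data.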
NASA Astrophysics Data System (ADS)
Iqtait, M.; Mohamad, F. S.; Mamat, M.
2018-03-01
A biometric system is a pattern recognition system used to recognize persons automatically from the characteristics and features of an individual. Face recognition with a high recognition rate is still a challenging task and is usually accomplished in three phases: face detection, feature extraction, and expression classification. Precise and robust localization of trait points is a complicated and difficult issue in face recognition. Cootes proposed a multi-resolution Active Shape Model (ASM) algorithm, which can extract a specified shape accurately and efficiently. As an improvement on ASM, the Active Appearance Model (AAM) algorithm was later proposed to extract both the shape and the texture of a specified object simultaneously. In this paper we describe the two algorithms in more detail and report experiments testing their performance on one dataset of faces. We found that ASM is faster and locates trait points more accurately than AAM, but AAM achieves a better match to the texture.
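At the heart of ASM is a statistical shape model. In the standard formulation, aligned landmark vectors are modelled as $x \approx \bar{x} + P b$, with each shape parameter limited to $|b_i| \le 3\sqrt{\lambda_i}$. The sketch below shows that model-building and shape-constraining step only. It assumes the training shapes are already aligned (Procrustes alignment omitted), and the function names are illustrative. The iterative image search of multi-resolution ASM and AAM's texture model are not shown.

```python
import numpy as np


def build_shape_model(shapes, n_modes):
    """shapes: (N, 2L) array of aligned landmark vectors (x1, y1, ..., xL, yL)."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_modes]            # keep the largest modes
    return mean, eigvecs[:, order], eigvals[order]


def constrain_shape(x, mean, P, eigvals, limit=3.0):
    """Project a candidate shape onto the model and clamp |b_i| <= limit*sqrt(lambda_i)."""
    b = P.T @ (x - mean)
    bound = limit * np.sqrt(np.maximum(eigvals, 0.0))
    b = np.clip(b, -bound, bound)
    return mean + P @ b
```

Clamping keeps every fitted shape plausible with respect to the training set. This is why ASM can locate trait points robustly even when local image evidence is noisy.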
NASA Astrophysics Data System (ADS)
Levchuk, Georgiy; Shabarekh, Charlotte; Furjanic, Caitlin
2011-06-01
In this paper, we present results of adversarial activity recognition using data collected in the Empire Challenge (EC 09) exercise. The EC09 experiment provided an opportunity to evaluate our probabilistic spatiotemporal mission recognition algorithms using data from live airborne and ground sensors. Using ambiguous and noisy data about the locations of entities and motion events on the ground, the algorithms inferred the types and locations of OPFOR activities, including reconnaissance, cache runs, IED emplacements, logistics, and planning meetings. In this paper, we present a detailed summary of the validation study and recognition accuracy results. Our algorithms were able to detect the locations and types of over 75% of hostile activities in EC09 while producing 25% false alarms.
An adaptive deep Q-learning strategy for handwritten digit recognition.
Qiao, Junfei; Wang, Gongming; Li, Wenjing; Chen, Min
2018-02-22
Handwritten digit recognition has remained a challenging problem in recent years. Although many deep learning-based classification algorithms have been studied for handwritten digit recognition, their recognition accuracy and running time still need further improvement. In this paper, an adaptive deep Q-learning strategy is proposed to improve accuracy and shorten running time for handwritten digit recognition. The strategy combines the feature-extraction capability of deep learning with the decision-making of reinforcement learning to form an adaptive Q-learning deep belief network (Q-ADBN). First, Q-ADBN extracts features from the original images using an adaptive deep auto-encoder (ADAE), and the extracted features are treated as the current states of the Q-learning algorithm. Second, Q-ADBN receives the Q-function (reward signal) during recognition of the current states, and the final handwritten digit recognition is performed by maximizing the Q-function with the Q-learning algorithm. Finally, experimental results on the well-known MNIST dataset show that Q-ADBN outperforms similar methods in both accuracy and running time. Copyright © 2018 Elsevier Ltd. All rights reserved.
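The textbook tabular Q-learning update is shown below for readers unfamiliar with it. It is not Q-ADBN, whose states are continuous auto-encoder features rather than table keys. The framing of recognition as a one-step episode (state = feature, action = predicted label, reward 1 for a correct label) is a hypothetical simplification.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, terminal=False):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if terminal else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def greedy_action(Q, state, actions):
    """Recognition as decision: pick the action (label) with the largest Q-value."""
    return max(actions, key=lambda a: Q[(state, a)])


if __name__ == "__main__":
    Q = defaultdict(float)
    labels = range(10)
    # Hypothetical one-step episodes: state "s" whose true label is 3.
    for _ in range(50):
        q_update(Q, "s", 3, 1.0, None, labels, alpha=0.5, terminal=True)
        q_update(Q, "s", 5, 0.0, None, labels, alpha=0.5, terminal=True)
    print(greedy_action(Q, "s", labels))
```

Recognition then amounts to choosing the action that maximises the learned Q-function, which matches the abstract's description of the final step.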
Appearance-based face recognition and light-fields.
Gross, Ralph; Matthews, Iain; Baker, Simon
2004-04-01
Arguably the most important decision to be made when developing an object recognition algorithm is selecting the scene measurements or features on which to base the algorithm. In appearance-based object recognition, the features are chosen to be the pixel intensity values in an image of the object. These pixel intensities correspond directly to the radiance of light emitted from the object along certain rays in space. The set of all such radiance values over all possible rays is known as the plenoptic function or light-field. In this paper, we develop a theory of appearance-based object recognition from light-fields. This theory leads directly to an algorithm for face recognition across pose that uses as many images of the face as are available, from one upwards. All of the pixels, whichever image they come from, are treated equally and used to estimate the (eigen) light-field of the object. The eigen light-field is then used as the set of features on which to base recognition, analogously to how the pixel intensities are used in appearance-based face and object recognition.
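The key step, estimating eigen light-field coefficients from whatever pixels (rays) are observed, can be sketched as a least-squares fit. The basis is computed by PCA over training light-fields. Coefficients for a new face are then fitted using only the rows of the basis that correspond to observed rays. This is a minimal sketch: the vectorised light-field layout, `observed_idx`, and the function names are assumptions, and recognition by comparing coefficient vectors is not shown.

```python
import numpy as np


def fit_eigen_light_field(training, n_components):
    """training: (N, R) array, one vectorised light-field (R rays) per training face."""
    mean = training.mean(axis=0)
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:n_components].T          # basis: (R, n_components)


def estimate_coefficients(observed_values, observed_idx, mean, basis):
    """Least-squares coefficients using only the rays seen in the available image(s)."""
    A = basis[observed_idx]
    y = observed_values - mean[observed_idx]
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs


def reconstruct(coeffs, mean, basis):
    """Full light-field implied by the coefficients, including unseen rays."""
    return mean + basis @ coeffs
```

Because every observed pixel enters the same least-squares problem, one image or several can be used without changing the method. This matches the "from one upwards" property described above.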
A Fault Recognition System for Gearboxes of Wind Turbines
NASA Astrophysics Data System (ADS)
Yang, Zhiling; Huang, Haiyue; Yin, Zidong
2017-12-01
Maintenance costs and lost power generation caused by faults in wind turbine gearboxes are the main components of a wind farm's operating costs. Condition monitoring and fault recognition for wind turbine gearboxes has therefore become a hot topic. This paper presents a condition monitoring and fault recognition system (CMFRS) for condition-based maintenance (CBM) of wind turbine gearboxes. Vibration signals from acceleration sensors at different locations on the gearbox, together with data from the supervisory control and data acquisition (SCADA) system, are collected by the CMFRS. A feature extraction and optimization algorithm is then applied to these operational data. To recognize gearbox faults, the GSO-LSSVR algorithm is proposed, which combines least squares support vector regression (LSSVR) with the Glowworm Swarm Optimization (GSO) algorithm. The results show that the fault recognition system achieves a high identification rate for three states of wind turbine gears; moreover, the combination of data features affects the identification rate, and the proposed feature-selection optimization algorithm yields a good data feature subset for fault recognition.
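The LSSVR component can be sketched with its standard dual formulation. Training reduces to one linear system in the bias $b$ and coefficients $\alpha$, $\begin{bmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & K + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$, and prediction is $f(x) = \sum_i \alpha_i K(x, x_i) + b$. In the paper GSO tunes the hyperparameters. Here `gamma` and `sigma` are fixed illustrative values, and the RBF kernel is an assumption.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVR dual linear system; returns (b, alpha)."""
    n = len(X)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                    # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma  # ridge term from the squared-error loss
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvr_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

Replacing the SVM's quadratic program with a single linear solve makes each training run cheap. That is what makes wrapping it in a swarm optimiser such as GSO practical.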
St. Hilaire, Melissa A.; Sullivan, Jason P.; Anderson, Clare; Cohen, Daniel A.; Barger, Laura K.; Lockley, Steven W.; Klerman, Elizabeth B.
2012-01-01
There is currently no “gold standard” marker of cognitive performance impairment resulting from sleep loss. We utilized pattern recognition algorithms to determine which features of data collected under controlled laboratory conditions could most reliably identify cognitive performance impairment in response to sleep loss using data from only one testing session, such as would occur in the “real world” or field conditions. A training set for the pattern recognition algorithms was developed using objective Psychomotor Vigilance Task (PVT) and subjective Karolinska Sleepiness Scale (KSS) data collected from laboratory studies during which subjects were sleep deprived for 26 – 52 hours. The algorithm was then tested on data from both laboratory and field experiments. The pattern recognition algorithm was able to identify performance impairment with a single testing session in individuals studied under laboratory conditions using PVT, KSS, length of time awake and time of day information with sensitivity and specificity as high as 82%. When this algorithm was tested on data collected under real-world conditions from individuals whose data were not in the training set, accuracy of predictions for individuals categorized with low performance impairment were as high as 98%. Predictions for medium and severe performance impairment were less accurate. We conclude that pattern recognition algorithms may be a promising method for identifying performance impairment in individuals using only current information about the individual’s behavior. Single testing features (e.g., number of PVT lapses) with high correlation with performance impairment in the laboratory setting may not be the best indicators of performance impairment under real-world conditions.
Pattern recognition algorithms should be further tested for their ability to be used in conjunction with other assessments of sleepiness in real-world conditions to quantify performance impairment in response to sleep loss. PMID:22959616
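Sensitivity and specificity, the metrics quoted above, follow directly from the confusion counts. The sketch below treats "impaired" as the positive class. That labelling convention is an assumption for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """y_true, y_pred: sequences of truthy values, True = impaired (positive class).

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For example, if three impaired sessions yield two detections and two unimpaired sessions yield one false alarm, sensitivity is 2/3 and specificity is 1/2.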
Ruocco, Anthony C.; Reilly, James L.; Rubin, Leah H.; Daros, Alex R.; Gershon, Elliot S.; Tamminga, Carol A.; Pearlson, Godfrey D.; Hill, S. Kristian; Keshavan, Matcheri S.; Gur, Ruben C.; Sweeney, John A.
2014-01-01
Background: Difficulty recognizing facial emotions is an important social-cognitive deficit associated with psychotic disorders. It also may reflect a familial risk for psychosis in schizophrenia-spectrum disorders and bipolar disorder. Objective: The objectives of this study from the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP) consortium were to: 1) compare emotion recognition deficits in schizophrenia, schizoaffective disorder and bipolar disorder with psychosis, 2) determine the familiality of emotion recognition deficits across these disorders, and 3) evaluate emotion recognition deficits in nonpsychotic relatives with and without elevated Cluster A and Cluster B personality disorder traits. Method: Participants included probands with schizophrenia (n=297), schizoaffective disorder (depressed type, n=61; bipolar type, n=69), bipolar disorder with psychosis (n=248), their first-degree relatives (n=332, n=69, n=154, and n=286, respectively) and healthy controls (n=380). All participants completed the Penn Emotion Recognition Test, a standardized measure of facial emotion recognition assessing four basic emotions (happiness, sadness, anger and fear) and neutral expressions (no emotion). Results: Compared to controls, emotion recognition deficits among probands increased progressively from bipolar disorder to schizoaffective disorder to schizophrenia. Proband and relative groups showed similar deficits perceiving angry and neutral faces, whereas deficits on fearful, happy and sad faces were primarily isolated to schizophrenia probands. Even non-psychotic relatives without elevated Cluster A or Cluster B personality disorder traits showed deficits on neutral and angry faces. Emotion recognition ability was moderately familial only in schizophrenia families. Conclusions: Emotion recognition deficits are prominent but somewhat different across psychotic disorders.
These deficits are reflected to a lesser extent in relatives, particularly on angry and neutral faces. Deficits were evident in non-psychotic relatives even without elevated personality disorder traits. Deficits in facial emotion recognition may reflect an important social-cognitive deficit in patients with psychotic disorders. PMID:25052782
Ruocco, Anthony C; Reilly, James L; Rubin, Leah H; Daros, Alex R; Gershon, Elliot S; Tamminga, Carol A; Pearlson, Godfrey D; Hill, S Kristian; Keshavan, Matcheri S; Gur, Ruben C; Sweeney, John A
2014-09-01
Difficulty recognizing facial emotions is an important social-cognitive deficit associated with psychotic disorders. It also may reflect a familial risk for psychosis in schizophrenia-spectrum disorders and bipolar disorder. The objectives of this study from the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP) consortium were to: 1) compare emotion recognition deficits in schizophrenia, schizoaffective disorder and bipolar disorder with psychosis, 2) determine the familiality of emotion recognition deficits across these disorders, and 3) evaluate emotion recognition deficits in nonpsychotic relatives with and without elevated Cluster A and Cluster B personality disorder traits. Participants included probands with schizophrenia (n=297), schizoaffective disorder (depressed type, n=61; bipolar type, n=69), bipolar disorder with psychosis (n=248), their first-degree relatives (n=332, n=69, n=154, and n=286, respectively) and healthy controls (n=380). All participants completed the Penn Emotion Recognition Test, a standardized measure of facial emotion recognition assessing four basic emotions (happiness, sadness, anger and fear) and neutral expressions (no emotion). Compared to controls, emotion recognition deficits among probands increased progressively from bipolar disorder to schizoaffective disorder to schizophrenia. Proband and relative groups showed similar deficits perceiving angry and neutral faces, whereas deficits on fearful, happy and sad faces were primarily isolated to schizophrenia probands. Even non-psychotic relatives without elevated Cluster A or Cluster B personality disorder traits showed deficits on neutral and angry faces. Emotion recognition ability was moderately familial only in schizophrenia families. Emotion recognition deficits are prominent but somewhat different across psychotic disorders. These deficits are reflected to a lesser extent in relatives, particularly on angry and neutral faces. 
Deficits were evident in non-psychotic relatives even without elevated personality disorder traits. Deficits in facial emotion recognition may reflect an important social-cognitive deficit in patients with psychotic disorders. Copyright © 2014 Elsevier B.V. All rights reserved.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold: first, to improve (speed up) clustering algorithms, and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Multi scales based sparse matrix spectral clustering image segmentation
NASA Astrophysics Data System (ADS)
Liu, Zhongmin; Chen, Zhicai; Li, Zhanming; Hu, Wenjin
2018-04-01
In image segmentation, spectral clustering algorithms must adopt an appropriate scaling parameter to calculate the similarity matrix between pixels, and this choice can greatly affect the clustering result. Moreover, when the number of data instances is large, the computational complexity and memory use of the algorithm increase greatly. To solve these two problems, we propose a new spectral clustering image segmentation algorithm based on multiple scales and a sparse matrix. We first devise a new feature extraction method, then extract image features at different scales, and finally use this feature information to construct a sparse similarity matrix, which improves computational efficiency. Compared with the traditional spectral clustering algorithm, image segmentation experiments show that our algorithm achieves better accuracy and robustness.
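A sparse similarity matrix for spectral clustering can be sketched by keeping only each point's k nearest neighbours. To sidestep a single global scaling parameter, the sketch uses a per-point local scale: the distance to the k-th neighbour, in the style of self-tuning spectral clustering. That is a swapped-in technique, not necessarily the paper's multi-scale scheme, and the multi-scale feature extraction is not shown. For clarity the sketch computes dense distances and stores W densely. A real implementation for large images would use a spatial index and sparse storage.

```python
import numpy as np


def knn_affinity(features, k=5):
    """kNN-sparsified Gaussian affinity with a per-point scale sigma_i."""
    n = len(features)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of points")
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    order = np.argsort(d, axis=1)                           # order[i, 0] is normally i
    sigma = np.maximum(d[np.arange(n), order[:, k]], 1e-12)  # distance to k-th neighbour
    W = np.zeros((n, n))
    for i in range(n):
        for j in order[i, 1:k + 1]:                         # only k nearest neighbours
            w = np.exp(-d[i, j] ** 2 / (sigma[i] * sigma[j]))
            W[i, j] = W[j, i] = w                           # symmetrise
    return W


def spectral_embedding(W, n_clusters):
    """Rows of the smallest eigenvectors of the normalised Laplacian, unit-normalised."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L_sym = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                         # ascending eigenvalues
    emb = vecs[:, :n_clusters]
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)
```

Clustering the embedded rows (for example with k-means) gives the segmentation labels. With k neighbours per point, W has at most about 2kn nonzeros instead of n², which is where the memory saving comes from.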
An AK-LDMeans algorithm based on image clustering
NASA Astrophysics Data System (ADS)
Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan
2018-03-01
Clustering is an effective analytical technique for mining value from unlabeled data. Its ultimate goal is to label unclassified data quickly and correctly. We use roadmap images in current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm that determines the K value automatically by designing a Kcost fold line, and then uses a long-distance high-density method to select the clustering centers, replacing the traditional initial clustering center selection method. This further improves the efficiency and accuracy of the traditional K-Means algorithm. The experimental results are compared with those of current clustering algorithms. The algorithm can provide a useful reference for image processing, machine vision and data mining.
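The abstract does not give the exact "long-distance high-density" rule. A plausible reading is the density-peaks idea of Rodriguez and Laio: a good initial center has high local density and lies far from any denser point. The sketch below assumes that interpretation, with a hypothetical cutoff distance `dc`; it is not the authors' code, and the Kcost fold line for choosing K is not shown.

```python
import math


def density_peak_centers(points, k, dc):
    """Pick k initial centres that are both dense and far from denser points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: local density = number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    delta = []
    for i in range(n):
        # Points counted as "denser" than i; equal densities are broken by index.
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        # The globally densest point gets its largest distance instead.
        delta.append(min(denser) if denser else max(dist[i]))
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in ranked[:k]]
```

The returned points can then seed ordinary K-Means iterations in place of random initialization.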
Online Feature Transformation Learning for Cross-Domain Object Category Recognition.
Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold
2017-06-09
In this paper, we introduce a new research problem, termed online feature transformation learning, in the context of multiclass object category recognition. Learning a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider online learning of a feature transformation matrix expressed in the original feature space and propose an online passive-aggressive feature transformation algorithm. The original features are then mapped to a kernel space, and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale applications. A k-nearest-neighbor classifier is used together with the learned similarity metric function. Finally, we experimentally examine the effect of different parameter values in the proposed algorithms and evaluate model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods in cross-domain and multiclass object recognition applications.
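To show what a passive-aggressive update of a similarity function looks like, here is a minimal sketch in the style of OASIS (Chechik et al.), a PA-based online similarity learner. It is a swapped-in, well-known technique rather than the paper's exact algorithm. The similarity is bilinear, s(a, b) = aᵀWb, and each triplet (x, x_pos, x_neg) asks for s(x, x_pos) ≥ s(x, x_neg) + 1; `pa_similarity_update` and `C` are hypothetical names.

```python
def bilinear(W, a, b):
    """s(a, b) = a^T W b for a matrix W given as nested lists."""
    return sum(a[r] * W[r][c] * b[c] for r in range(len(a)) for c in range(len(b)))


def pa_similarity_update(W, x, x_pos, x_neg, C=1.0):
    """One PA-I step on a triplet; modifies W in place and returns it."""
    loss = max(0.0, 1.0 - bilinear(W, x, x_pos) + bilinear(W, x, x_neg))
    if loss == 0.0:
        return W  # passive: the margin constraint is already satisfied
    diff = [p - q for p, q in zip(x_pos, x_neg)]
    # Update direction V = x (x_pos - x_neg)^T, with ||V||_F^2 = ||x||^2 * ||diff||^2.
    norm_sq = sum(v * v for v in x) * sum(d * d for d in diff)
    if norm_sq == 0.0:
        return W
    tau = min(C, loss / norm_sq)  # aggressive step, capped by C
    for r in range(len(x)):
        for c in range(len(diff)):
            W[r][c] += tau * x[r] * diff[c]
    return W
```

With an uncapped step, the update moves W just far enough to satisfy the violated triplet exactly. `C` limits how far a single noisy example can move the metric.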
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.
Bi, Xia-An; Zhao, Junxia
2017-01-01
With the growth of computer network bandwidth, packet classification algorithms able to handle large-scale rule sets are urgently needed. Among existing algorithms, research on hierarchical-trie-based packet classification has become an important branch because of its wide practical use. Although hierarchical tries save considerable storage space, they have several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, the Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). First, it formalizes the packet classification problem by mapping rules and data packets into a two-dimensional space. Second, it uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, forming diversified clusters. Third, it builds a hierarchical trie from the results of expectation-maximization clustering. Finally, it conducts simulation experiments and real-environment experiments to compare the performance of our algorithm with other typical algorithms and analyzes the results. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking but also solves the problem of inefficient trie updates, which greatly improves the performance of the algorithm.
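The abstract does not specify the mixture model used to cluster rules. A minimal sketch, assuming a one-dimensional Gaussian mixture (the paper maps rules into two dimensions), shows the E- and M-steps that produce soft cluster assignments. Building the hierarchical trie from the clusters is not shown, and `em_gmm_1d` is a hypothetical name.

```python
import math


def em_gmm_1d(xs, means, variances, weights, iters=50):
    """Fit a 1-D Gaussian mixture by EM; returns (means, variances, weights, resp)."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = []
    for _ in range(iters):
        # E-step: resp[i][c] = posterior probability that xs[i] came from component c.
        resp = []
        for x in xs:
            dens = [
                weights[c] / math.sqrt(2 * math.pi * variances[c])
                * math.exp(-((x - means[c]) ** 2) / (2 * variances[c]))
                for c in range(k)
            ]
            total = sum(dens) or 1e-300  # guard against underflow of every density
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its soft assignments.
        for c in range(k):
            nc = sum(r[c] for r in resp) or 1e-300
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing
            weights[c] = nc / len(xs)
    return means, variances, weights, resp
```

A hard clustering, for example to decide which rules go under which trie branch, can be read off as the component with the largest responsibility for each point.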
Fast and accurate face recognition based on image compression
NASA Astrophysics Data System (ADS)
Zheng, Yufeng; Blasch, Erik
2017-05-01
Image compression is desirable for many image-related applications, especially network-based applications with bandwidth and storage constraints. Typical reports from the face recognition community concentrate on the maximal compression rate that does not decrease recognition accuracy. In general, wavelet-based face recognition methods such as EBGM (elastic bunch graph matching) and FPB (face pattern byte) perform well but run slowly because of their high computational demands. The PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) algorithms run fast but perform poorly in face recognition. In this paper, we propose a novel face recognition method based on a standard image compression algorithm, termed compression-based (CPB) face recognition. First, all gallery images are compressed by the selected compression algorithm. Second, a mixed image is formed from the probe and gallery images and then compressed. Third, a composite compression ratio (CCR) is computed from three compression ratios calculated for the probe, gallery and mixed images. Finally, the CCR values are compared, and the largest CCR corresponds to the matched face. The time cost of each face match is roughly the time needed to compress the mixed face image. We tested the proposed CPB method on the "ASUMSS face database" (visible and thermal images) from 105 subjects. The face recognition accuracy with visible images is 94.76% when using JPEG compression. On the same face dataset, the accuracy of the FPB algorithm was reported as 91.43%. JPEG-compression-based (JPEG-CPB) face recognition is standard and fast and may be integrated into a real-time imaging device.
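The exact CCR formula is not given in the abstract. To illustrate the general idea of matching by compression, here is a minimal sketch that uses the normalized compression distance (NCD) of Cilibrasi and Vitányi with `zlib` on raw bytes. NCD and zlib are swapped in for the paper's CCR and JPEG. NCD is a distance, so the best match is the smallest value, whereas the paper picks the largest CCR. `ncd` and `match` are hypothetical names.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes (zlib, maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def match(probe: bytes, gallery: dict) -> str:
    """Return the gallery key whose entry compresses best together with the probe."""
    return min(gallery, key=lambda name: ncd(probe, gallery[name]))
```

As in the paper, each comparison costs roughly one compression of the concatenated (mixed) input, once the gallery sizes are cached.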
Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.
Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin
2018-05-03
Flying ad-hoc networks (FANETs) are a very active research area with many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) cause their two main problems: short flight time and inefficient routing. In this paper, we address both problems through efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. An optimal transmission range yields a minimal packet loss ratio (PLR) and better link quality, which ultimately saves the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm to select cluster heads. Optimal cluster heads extend the cluster lifetime and reduce the routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as the Ant Colony Optimization-based and Grey Wolf Optimization-based clustering algorithms. The performance of the proposed algorithm is evaluated in terms of the number of clusters, cluster building time, cluster lifetime and energy consumption.
A Palmprint Recognition Algorithm Using Phase-Only Correlation
NASA Astrophysics Data System (ADS)
Ito, Koichi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo
This paper presents a palmprint recognition algorithm using Phase-Only Correlation (POC). The use of phase components in 2D (two-dimensional) discrete Fourier transforms of palmprint images makes it possible to achieve highly robust image registration and matching. In the proposed algorithm, POC is used to align scaling, rotation and translation between two palmprint images, and evaluate similarity between them. Experimental evaluation using a palmprint image database clearly demonstrates efficient matching performance of the proposed algorithm.
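The core of phase-only correlation is that normalizing the cross-power spectrum to unit magnitude leaves only phase, so its inverse transform is a sharp peak at the displacement. A minimal sketch, simplified to one-dimensional circular translation with a naive DFT. The paper works on 2D images and also handles scaling and rotation, which are not shown. `dft` and `poc_shift` are hypothetical names.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def poc_shift(f, g, eps=1e-12):
    """Estimate the circular shift d with g[t] = f[t - d] via phase-only correlation."""
    F, G = dft(f), dft(g)
    cross = [gk * fk.conjugate() for fk, gk in zip(F, G)]
    # Phase only: normalise each cross-spectrum term to unit magnitude.
    R = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    r = dft(R, inverse=True)
    # The POC function peaks at the displacement; its height serves as a similarity score.
    return max(range(len(r)), key=lambda t: r[t].real)
```

For matching rather than registration, the peak height itself (close to 1 for identical signals) can be used as the similarity score between two images.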
Simulation and performance of an artificial retina for 40 MHz track reconstruction
Abba, A.; Bedeschi, F.; Citterio, M.; ...
2015-03-05
We present the results of a detailed simulation of the artificial retina pattern-recognition algorithm, designed to reconstruct events with hundreds of charged-particle tracks in the pixel and silicon detectors at LHCb at the LHC crossing frequency of 40 MHz. The performance of the artificial retina algorithm is assessed using the official Monte Carlo samples of the LHCb experiment. We found the performance of the retina pattern-recognition algorithm to be comparable with that of the full LHCb reconstruction algorithm.
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time by Internet of Things (IoT) devices. The faster this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering methods: they can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Density-based clustering is therefore a suitable choice for clustering IoT streams. Several density-based algorithms have recently been proposed for clustering data streams; however, density-based clustering within limited time remains challenging. In this paper, we propose a density-based clustering algorithm for IoT streams whose fast processing time makes it applicable to real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
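The abstract does not detail the proposed streaming algorithm. To illustrate the properties it attributes to density-based methods (arbitrarily shaped clusters, explicit outliers, no preset number of clusters), here is a minimal batch sketch of classic DBSCAN (Ester et al.), not the authors' stream method. `eps` and `min_pts` are DBSCAN's usual parameters, and the O(n²) neighbour search is for clarity only.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # Includes i itself, as in the original DBSCAN definition.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue  # border points and already-labelled points are not expanded
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep growing the cluster
                queue.extend(nbrs)
    return labels
```

Stream variants usually replace the raw points with decaying summaries (micro-clusters or grid cells) and run a step like this offline on the summaries.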
Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition.
Bianne-Bernard, Anne-Laure; Menasri, Farès; Al-Hajj Mohamad, Rami; Mokbel, Chafic; Kermorvant, Christopher; Likforman-Sulem, Laurence
2011-10-01
This study aims to build an efficient word recognition system by combining three handwriting recognizers. The main component of the combined system is an HMM-based recognizer that considers dynamic and contextual information for better modeling of writing units. To model the contextual units, a state-tying process based on decision tree clustering is introduced. Decision trees are built from a set of expert-based questions on how characters are written. Questions are divided into global questions, which yield larger clusters, and precise questions, which yield smaller ones. This clustering reduces the total number of models and Gaussian densities by a factor of 10. We then apply this modeling to the recognition of handwritten words. Experiments are conducted on three publicly available databases of Latin-script or Arabic handwriting: Rimes, IAM, and OpenHart. The results show that contextual information embedded with dynamic modeling significantly improves recognition.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions with different means but the same covariance. However, for datasets with diverse distribution shapes or high dimensionality, these assumptions may no longer hold. To overcome this weakness, we propose a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, which uses a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, the proposed centroid distance criterion addresses the problems caused by high dimensionality and varying density. Experiments on a designed two-dimensional benchmark dataset show that LASS not only inherits the ability of the original dissimilarity increments clustering method to separate naturally isolated clusters but can also identify clusters that are adjacent, overlapping, or under background noise. Finally, we compare LASS with the dissimilarity increments clustering method on a massive computer user dataset of over two million records containing demographic and behavioral information. The results show that LASS works extremely well on this dataset and can extract more knowledge from it. PMID:26221133
A clustering algorithm for determining community structure in complex networks
NASA Astrophysics Data System (ADS)
Jin, Hong; Yu, Wei; Li, ShiJun
2018-02-01
Clustering algorithms are attractive for community detection in complex networks. DENCLUE is a representative density-based clustering algorithm with a firm mathematical basis and good clustering properties, allowing arbitrarily shaped clusters in high-dimensional datasets. However, it cannot be applied directly to community discovery because it cannot handle network data. Moreover, it requires careful selection of the density parameter and the noise threshold. To solve these issues, this paper proposes a new community detection method. First, we use a spectral analysis technique to map the network data into a low-dimensional Euclidean space that preserves node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. The Sheather-Jones plug-in method is chosen to select a density parameter that accurately describes the intrinsic clustering structure. Moreover, every node in the network is meaningful, so there are no noise nodes and the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate its effectiveness compared with other popular density-based clustering algorithms adapted to community detection.
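DENCLUE's clustering step climbs the kernel density estimate from each point to a local maximum (a density attractor) and groups points that share an attractor. A minimal sketch, assuming the nodes are already embedded in a low-dimensional Euclidean space (the spectral mapping is not shown) and using a fixed Gaussian bandwidth `h` in place of the Sheather-Jones plug-in selector. The update is the Gaussian-kernel fixed-point step, the same as mean shift. Because every node is treated as meaningful, no noise threshold is applied. `density_attractor`, `denclue_labels` and `merge_tol` are hypothetical names.

```python
import math


def density_attractor(x, data, h, iters=100, tol=1e-6):
    """Hill-climb the Gaussian kernel density estimate from x to a local maximum."""
    for _ in range(iters):
        weights = [math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data]
        total = sum(weights)
        # Fixed-point step: kernel-weighted mean of the data around x.
        new_x = tuple(sum(w * p[d] for w, p in zip(weights, data)) / total
                      for d in range(len(x)))
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def denclue_labels(data, h, merge_tol=1e-2):
    """Label each point by the density attractor it climbs to (no noise threshold)."""
    attractors, labels = [], []
    for p in data:
        a = density_attractor(p, data, h)
        for idx, b in enumerate(attractors):
            if math.dist(a, b) < merge_tol:
                labels.append(idx)
                break
        else:
            attractors.append(a)
            labels.append(len(attractors) - 1)
    return labels, attractors
```

The bandwidth `h` plays the role of DENCLUE's density parameter: too small and every node becomes its own community, too large and communities merge, which is why the paper selects it with a plug-in rule.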
Collaborative filtering recommendation model based on fuzzy clustering algorithm
NASA Astrophysics Data System (ADS)
Yang, Ye; Zhang, Yunhua
2018-05-01
As one of the most widely used algorithms in recommender systems, collaborative filtering faces two serious problems: data sparsity and poor recommendation performance in big data environments. In traditional clustering analysis, each object is strictly assigned to one class, and class boundaries are very clear. However, most real-life objects have no strict definition of their form and class attributes. To address these problems, this paper improves the traditional collaborative filtering model through a hybrid optimization of an implicit semantic algorithm and a fuzzy clustering algorithm, combined with collaborative filtering. The fuzzy clustering algorithm is applied to item attribute information, so that each item belongs to different item categories with different membership degrees. This increases data density, effectively reduces data sparsity, and solves the problem of low accuracy caused by inaccurate similarity calculation. Finally, this paper carries out an empirical analysis on the MovieLens dataset and compares the approach with the traditional user-based collaborative filtering algorithm. The proposed algorithm greatly improves recommendation accuracy.
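Soft membership is the key difference from traditional hard clustering described above. A minimal sketch of standard fuzzy c-means (FCM), assuming items are represented by numeric attribute vectors and using the common fuzzifier m = 2. The abstract does not state which fuzzy clustering variant or parameters were used, and `fuzzy_c_means` is a hypothetical name.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=100):
    """Standard FCM; returns (centers, u) where u[i][c] is point i's membership in cluster c."""
    dim = len(points[0])
    u = []
    for _ in range(iters):
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]  # avoid division by zero
            # u_ic = 1 / sum_k (d_ic / d_ik)^(2 / (m - 1)); each row sums to 1.
            u.append([1.0 / sum((d[c] / d[k]) ** (2 / (m - 1)) for k in range(len(centers)))
                      for c in range(len(centers))])
        # Centre update: membership^m weighted mean of all points.
        new_centers = []
        for c in range(len(centers)):
            w = [row[c] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(sum(wi * p[j] for wi, p in zip(w, points)) / total
                                     for j in range(dim)))
        centers = new_centers
    return centers, u
```

An item lying between two groups, such as (5, 5) in the test data, receives substantial membership in both. In a recommender, this lets its ratings contribute to both categories and so densifies the data.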
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
Based on an analysis of the clustering algorithms previously proposed for MANETs, this paper proposes a novel clustering strategy. Trust is defined by statistical hypothesis testing from probability theory, and cluster heads are selected by node trust and node mobility. The strategy can therefore detect malicious nodes, which other clustering algorithms neglect. It also overcomes a deficiency of the MOBIC algorithm, which cannot compute the relative mobility metric of nodes when the receiving power of two consecutive HELLO packets cannot be measured. It is an effective solution for clustering MANETs securely.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
Background: The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, much more memory and longer runtimes are required for cluster identification. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck for large-scale data sets. Moreover, the similarity matrix, whose construction takes a long time, must be built before running affinity propagation, since the algorithm clusters data based on the similarities between data pairs. Methods: Two types of parallel architectures are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for affinity propagation because of its large memory size and great computing capacity. An appropriate data partition and reduction scheme is designed to minimize the global communication cost among processes. Results: A speedup of 100 is achieved with 128 cores. The runtime is reduced from several hours to a few seconds, indicating that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene (microarray) data and detecting families in large protein superfamilies. PMID:24705246
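The shared-memory step above can be sketched as independent rows of the similarity matrix handed to a worker pool. This minimal sketch assumes negative squared Euclidean distance as the similarity (the usual choice in Frey and Dueck's affinity propagation) and a round-robin row split. `parallel_similarity` is a hypothetical name. Because CPython threads do not run CPU-bound Python code in parallel, a real speedup would need `ProcessPoolExecutor` (with a picklable top-level worker instead of the lambda) or a native library. The distributed affinity propagation step is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def neg_sq_dist(a, b):
    """Similarity s(a, b) = -||a - b||^2."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_rows(data, rows):
    """Compute full similarity rows for the given row indices."""
    return [(i, [neg_sq_dist(data[i], data[j]) for j in range(len(data))]) for i in rows]


def parallel_similarity(data, workers=4):
    n = len(data)
    # Round-robin split balances rows across workers; rows are independent.
    chunks = [range(w, n, workers) for w in range(workers)]
    S = [None] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda rows: similarity_rows(data, rows), chunks):
            for i, row in part:
                S[i] = row
    return S
```

Since s(a, b) is symmetric, a further saving (not shown) is to compute only the upper triangle and mirror it.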
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
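As we understand the measure, informativeness is the fraction of a constraint set that the algorithm's unconstrained output already violates, so constraints the algorithm would satisfy anyway carry little new information. A minimal sketch under that assumption, with must-link and cannot-link constraints given as pairs of point indices. Coherence is not shown, and `informativeness` is used here as a hypothetical function name.

```python
def informativeness(labels, must_link, cannot_link):
    """Fraction of constraints violated by an unconstrained clustering `labels`."""
    violated = sum(labels[a] != labels[b] for a, b in must_link)
    violated += sum(labels[a] == labels[b] for a, b in cannot_link)
    total = len(must_link) + len(cannot_link)
    return violated / total if total else 0.0
```

A constraint set scoring 0 tells the algorithm nothing it would not have found on its own, which is one way a randomly drawn set can fail to help.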
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing, and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the mass data produced by tunnels. To solve this problem, an improved parallel clustering algorithm based on k-means is proposed. It uses MapReduce within cloud computing to process the data. It not only can handle mass data but is also more efficient. Moreover, it computes the average dissimilarity degree of each cluster in order to clean abnormal data. PMID:24982971
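One k-means iteration maps naturally onto MapReduce: mappers assign each record to its nearest center, and reducers average the records per center. A minimal in-process sketch, in which a generator stands in for the framework's shuffle by key. The abnormal-data rule, which flags points farther than `factor` times their cluster's average dissimilarity, is a hypothetical reading of the abstract, and all function names are illustrative.

```python
import math
from collections import defaultdict


def nearest(p, centers):
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def map_phase(points, centers):
    """Mapper: emit (nearest-centre id, (point, 1)) for every record."""
    for p in points:
        yield nearest(p, centers), (p, 1)


def reduce_phase(pairs):
    """Reducer: sum coordinates and counts per centre id, then average."""
    sums, counts = {}, defaultdict(int)
    for cid, (p, cnt) in pairs:
        prev = sums.get(cid, [0.0] * len(p))
        sums[cid] = [a + b for a, b in zip(prev, p)]
        counts[cid] += cnt
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def flag_abnormal(points, centers, factor=2.0):
    """Hypothetical cleaning rule: indices of points farther from their centre
    than `factor` times that cluster's average dissimilarity (mean distance)."""
    assign = [nearest(p, centers) for p in points]
    dists = [math.dist(p, centers[c]) for p, c in zip(points, assign)]
    members = defaultdict(list)
    for c, d in zip(assign, dists):
        members[c].append(d)
    avg = {c: sum(ds) / len(ds) for c, ds in members.items()}
    return [i for i, (c, d) in enumerate(zip(assign, dists)) if d > factor * avg[c]]
```

In a real job, a combiner would pre-sum (point, count) pairs on each mapper node so that only one partial sum per center crosses the network.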
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Datasets held by different agencies often contain records on the same individuals. Linking these datasets to identify all records belonging to the same individual is a crucial and challenging problem, especially given the large volumes of data. Many available record linkage algorithms suffer from either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper, we propose efficient and reliable sequential and parallel algorithms for the record linkage problem based on hierarchical clustering; specifically, we employ complete linkage hierarchical clustering. In addition to hierarchical clustering, we use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a subroutine to identify identical copies of records. We tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy, and the parallel implementations achieve almost linear speedups. The time complexities of these algorithms do not exceed those of the previous best-known algorithms, and our algorithms outperform those algorithms in accuracy while consuming reasonable run times. PMID:27124604
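The three ingredients named above (sort-based duplicate elimination, blocking, and complete-linkage clustering) can be combined in a small sketch. The assumptions are string records, Levenshtein distance, a merge threshold of 1 edit, and a block key equal to the first character. The paper's actual distance, blocking key and threshold are not stated here, and `link_records` and `block_key` are hypothetical names.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def complete_linkage(records, threshold):
    """Merge clusters while the farthest cross-cluster pair is within `threshold`."""
    clusters = [[r] for r in records]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(edit_distance(x, y) for x in clusters[a] for y in clusters[b])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # a < b, so index a is unaffected


def link_records(records, threshold=1, block_key=lambda r: r[0]):
    ordered = sorted(records)
    # Sorting places identical copies side by side, so one pass drops them.
    unique = [r for i, r in enumerate(ordered) if i == 0 or r != ordered[i - 1]]
    blocks = {}
    for r in unique:
        blocks.setdefault(block_key(r), []).append(r)  # compare only within a block
    linked = []
    for key in sorted(blocks):
        linked.extend(complete_linkage(blocks[key], threshold))
    return linked
```

Complete linkage requires every pair in a cluster to be close, which guards against chaining two different people together through a series of near-misses. Blocking keeps the quadratic comparison cost confined to small groups.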
Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin
2017-04-26
In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, balancing data delivery latency against energy consumption is a key issue in WSN-MS. In this paper, we study a clustering approach that jointly considers Route planning for the mobile sink and the Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem with a minimum travel route clustering approach, which uses the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem that shortens the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that a solution with a small hop count is more feasible than one with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming a cluster head into a cluster member, and transforming a cluster member into a cluster head. Extensive experimental results show that the IIA algorithm can automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm yields a shorter route, fewer cluster heads and faster convergence.
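The abstract initializes route planning with "a TSP algorithm" without naming it. A minimal sketch, assuming the simple nearest-neighbour heuristic over cluster-head positions with the sink's start point as depot. This is a common choice for an initial tour but not necessarily the paper's. `nearest_neighbour_route` and `route_length` are hypothetical names.

```python
import math


def nearest_neighbour_route(depot, stops):
    """Greedy TSP tour: repeatedly visit the closest unvisited stop, then return to the depot."""
    route, remaining, here = [depot], list(stops), depot
    while remaining:
        nxt = min(remaining, key=lambda s: math.dist(here, s))
        remaining.remove(nxt)
        route.append(nxt)
        here = nxt
    route.append(depot)
    return route


def route_length(route):
    """Total Euclidean length of a route given as a list of points."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))
```

Turning a cluster head into a member (or the reverse) removes or adds a stop. Re-running or locally repairing this tour then shows whether the sink's route got shorter, which is the quantity the INLP formulation minimizes.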
2012-09-30
recognition. Algorithm design and statistical analysis and feature analysis. Post-Doctoral Associate, Cornell University, Bioacoustics Research...short. The HPC-ADA was designed based on fielded systems [1-4, 6] that offer a variety of desirable attributes, specifically dynamic resource...The software package was designed to utilize parallel and distributed processing for running recognition and other advanced algorithms. DeLMA
False match elimination for face recognition based on SIFT algorithm
NASA Astrophysics Data System (ADS)
Gu, Xuyuan; Shi, Ping; Shao, Meide
2011-06-01
SIFT (Scale-Invariant Feature Transform) is a well-known algorithm for detecting and describing local features in images. It is invariant to image scale and rotation and robust to noise and illumination changes. In this paper, a novel SIFT-based face recognition method is proposed that combines an optimized SIFT, mutual matching and Progressive Sample Consensus (PROSAC) and can effectively eliminate false matches in face recognition. Experiments on the ORL face database show that many false matches are eliminated and a better recognition rate is achieved.
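One common form of mutual matching is a cross-check: a descriptor pair is kept only if each is the other's nearest neighbour, which rejects many one-sided false matches before a robust estimator such as PROSAC is run. A minimal sketch, assuming that interpretation and using toy 2-D vectors in place of 128-D SIFT descriptors. PROSAC itself is not shown, and `mutual_matches` is a hypothetical name.

```python
import math


def mutual_matches(desc_a, desc_b):
    """Keep (i, j) only if desc_b[j] is desc_a[i]'s nearest neighbour and vice versa."""
    def nearest(v, pool):
        return min(range(len(pool)), key=lambda k: math.dist(v, pool[k]))

    a_to_b = [nearest(v, desc_b) for v in desc_a]
    b_to_a = [nearest(v, desc_a) for v in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

The brute-force search here is quadratic. Practical SIFT pipelines usually pair the cross-check with approximate nearest-neighbour search.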
Efficient implementation of parallel three-dimensional FFT on clusters of PCs
NASA Astrophysics Data System (ADS)
Takahashi, Daisuke
2003-05-01
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm for clusters of PCs. The three-dimensional FFT algorithm can be restructured as a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by using the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We obtained a performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
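The property that both block and parallel 3-D FFTs rely on is separability: a 3-D transform equals 1-D transforms applied along each axis in turn. This lets the work be split into independent 1-D transforms and reordered for cache or distributed across nodes. A minimal sketch using a naive 1-D DFT on nested lists. It shows only the axis decomposition, not the paper's cache blocking or its parallel data distribution, and `dft1` and `dft3_by_axes` are hypothetical names.

```python
import cmath


def dft1(v):
    """Naive 1-D DFT."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def dft3_by_axes(a):
    """3-D DFT of a nested list a[x][y][z], computed as 1-D DFTs along z, then y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    b = [[dft1(a[x][y]) for y in range(ny)] for x in range(nx)]  # transform along z
    for x in range(nx):
        for z in range(nz):
            col = dft1([b[x][y][z] for y in range(ny)])  # transform along y
            for y in range(ny):
                b[x][y][z] = col[y]
    for y in range(ny):
        for z in range(nz):
            col = dft1([b[x][y][z] for x in range(nx)])  # transform along x
            for x in range(nx):
                b[x][y][z] = col[x]
    return b
```

The strided gathers along y and x are where cache misses come from. Blocking processes several such columns together so that each fetched cache line is reused.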
Blessy, S A Praylin Selva; Sulochana, C Helen
2015-01-01
Segmenting brain tumors from Magnetic Resonance Imaging (MRI) is complicated by the structural complexity of the human brain and the presence of intensity inhomogeneities. The aims are to propose a method that effectively segments brain tumors from MR images and to evaluate the performance of the unsupervised optimal fuzzy clustering (UOFC) algorithm for this task. Segmentation is performed by preprocessing the MR image to standardize intensity inhomogeneities, followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method with different clustering algorithms. The proposed method using the UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared with other clustering methods. Validation results clearly show that the proposed method with the UOFC algorithm effectively segments brain tumors from MR images.
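For readers checking the validation figures above, here are the standard definitions of sensitivity (true positive rate) and specificity (true negative rate), computed per pixel against a ground-truth tumor mask. `sensitivity_specificity` is a hypothetical name, and the masks are flattened to 1-D lists for simplicity.

```python
def sensitivity_specificity(pred, truth):
    """pred/truth: per-pixel values (truthy = tumour). Returns (sensitivity, specificity)."""
    pairs = [(bool(p), bool(t)) for p, t in zip(pred, truth)]
    tp = sum(p and t for p, t in pairs)
    fn = sum((not p) and t for p, t in pairs)
    tn = sum((not p) and (not t) for p, t in pairs)
    fp = sum(p and (not t) for p, t in pairs)
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)
```

Under these definitions, a specificity of 4% would mean that almost every non-tumor pixel was labeled as tumor. Readers may want to check the original paper for how the reported value was defined.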
Adaptive density trajectory cluster based on time and space distance
NASA Astrophysics Data System (ADS)
Liu, Fagui; Zhang, Zhijie
2017-10-01
Several open problems remain in trajectory clustering for discovering regularities in mobile behavior, such as computing the distance between sub-trajectories, setting parameter values in the clustering algorithm, and the uncertainty/boundary problem of the data set. This paper therefore defines a method for calculating the distance between sub-trajectories based on time and space. This distance calculation clearly reveals differences between moving trajectories and improves the accuracy of the clustering algorithm. In addition, a novel adaptive density trajectory clustering algorithm is proposed, in which the cluster radius is computed from the density of the data distribution. The cluster centers and the number of clusters are selected automatically by a specific strategy, and the uncertainty/boundary problem of the data set is solved by a designed weighted rough c-means. Experimental results demonstrate that the proposed algorithm performs fuzzy trajectory clustering effectively on the basis of the time and space distance and adaptively obtains optimal cluster centers and rich clustering results for extracting the features of mobile behavior in mobile and social networks.
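Rough c-means handles boundary points by placing points that are nearly equidistant from several centers in the upper approximations (boundary regions) of all of them, instead of forcing a hard choice. A minimal Lingras-style sketch follows. The paper's weighting scheme is not given, so the ratio threshold `eps` and the lower-approximation weight `w_low` are hypothetical parameters, and the function names are illustrative.

```python
import math


def rough_assign(points, centers, eps=1.2):
    """Split points into lower approximations and boundary regions per cluster."""
    k = len(centers)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for p in points:
        d = [math.dist(p, c) for c in centers]
        best = min(range(k), key=d.__getitem__)
        # Other clusters almost as close as the best one make p a boundary point.
        close = [c for c in range(k) if c != best and d[c] <= eps * d[best]]
        if close:
            for c in [best] + close:
                boundary[c].append(p)
        else:
            lower[best].append(p)
    return lower, boundary


def rough_centers(lower, boundary, centers, w_low=0.7):
    """Centre = weighted mix of the lower-approximation mean and the boundary mean."""
    def mean(pts):
        return tuple(sum(coord) / len(pts) for coord in zip(*pts))

    new = []
    for c, (lo, bd) in enumerate(zip(lower, boundary)):
        if lo and bd:
            new.append(tuple(w_low * a + (1 - w_low) * b for a, b in zip(mean(lo), mean(bd))))
        elif lo:
            new.append(mean(lo))
        elif bd:
            new.append(mean(bd))
        else:
            new.append(centers[c])  # empty cluster: keep the previous centre
    return new
```

Setting `w_low` close to 1 lets certain members dominate the center, so ambiguous boundary trajectories pull it only slightly.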
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has addressed outlier detection in circular data. In this study, we propose detecting multiple outliers in circular regression models using a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. Cluster groups that exceed the stopping rule are classified as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithms with these similarity distances in detecting outliers. The proposed methods are found to perform well and to be applicable to circular regression models.
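A minimal sketch of the two building blocks named above: circular summary statistics (mean direction and circular standard deviation √(−2 ln R̄), where R̄ is the mean resultant length) and single-linkage grouping of angles. The angular distance d(a, b) = 1 − cos(a − b) and a fixed cut height are assumptions, since the abstract does not give the similarity distance or the exact form of the stopping rule. Groups other than the largest would be the outlier candidates. Function names are hypothetical.

```python
import math


def circular_mean_sd(angles):
    """Mean direction and circular standard deviation sqrt(-2 ln R) of angles in radians."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    r = math.hypot(c, s)  # mean resultant length, 0 < r <= 1 assumed
    return math.atan2(s, c), math.sqrt(-2.0 * math.log(r))


def single_linkage_groups(angles, cut):
    """Cut a single-linkage tree at height `cut` under d(a, b) = 1 - cos(a - b);
    groups are returned largest first."""
    n = len(angles)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single-linkage clusters at height `cut` are the connected components of the
    # graph that joins every pair of points whose distance is at most `cut`.
    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - math.cos(angles[i] - angles[j]) <= cut:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Because the distance uses the cosine of the difference, 6.2 rad and 0.05 rad count as neighbours across the 0/2π wrap-around. A linear distance would wrongly separate them.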
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method. In k-means, the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant difference between ISOCLUS and k-means is that in ISOCLUS clusters may be merged or split, so the final number of clusters may differ from the number k supplied as input. This algorithm is described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration of the k-means algorithm, the filtering algorithm of Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
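The average distortion mentioned above can be sketched in a few lines of Python, together with the plain Lloyd iteration that ISOCLUS resembles. This is a minimal sketch only. It omits the ISOCLUS split/merge steps and the kd-tree filtering acceleration, and the function names are illustrative.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def average_distortion(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

def lloyd(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd) iteration; each step cannot increase the average distortion."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers
```

For two tight, well-separated pairs of points, Lloyd's iteration settles on the pair midpoints, giving an average distortion of 0.25.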
Ping, Lichuan; Wang, Ningyuan; Tang, Guofang; Lu, Thomas; Yin, Li; Tu, Wenhe; Fu, Qian-Jie
2017-09-01
Because of limited spectral resolution, Mandarin-speaking cochlear implant (CI) users have difficulty perceiving fundamental frequency (F0) cues that are important to lexical tone recognition. To improve Mandarin tone recognition in CI users, we implemented and evaluated a novel real-time algorithm (C-tone) to enhance the amplitude contour, which is strongly correlated with the F0 contour. The C-tone algorithm was implemented in clinical processors and evaluated in eight users of the Nurotron NSP-60 CI system. Subjects were given 2 weeks of experience with C-tone. Recognition of Chinese tones, monosyllables, and disyllables in quiet was measured with and without the C-tone algorithm. Subjective quality ratings were also obtained for C-tone. After 2 weeks of experience with C-tone, there were small but significant improvements in recognition of lexical tones, monosyllables, and disyllables (P < 0.05 in all cases). Among lexical tones, the largest improvements were observed for Tone 3 (falling-rising) and the smallest for Tone 4 (falling). Improvements with C-tone were greater for disyllables than for monosyllables. Subjective quality ratings showed no strong preference for or against C-tone, except for perception of own voice, where C-tone was preferred. The real-time C-tone algorithm provided small but significant improvements for speech performance in quiet with no change in sound quality. Pre-processing algorithms to reduce noise and better real-time F0 extraction would improve the benefits of C-tone in complex listening environments. Chinese CI users' speech recognition in quiet can be significantly improved by modifying the amplitude contour to better resemble the F0 contour.
Bagley, Amy D.; Abramowitz, Carolyn S.; Kosson, David S.
2010-01-01
Deficits in emotion processing have been widely reported to be central to psychopathy. However, few prior studies have examined vocal affect recognition in psychopaths, and these studies suffer from significant methodological limitations. Moreover, prior studies have yielded conflicting findings regarding the specificity of psychopaths’ affect recognition deficits. This study examined vocal affect recognition in 107 male inmates under conditions requiring isolated prosodic vs. semantic analysis of affective cues and compared subgroups of offenders identified via cluster analysis on vocal affect recognition. Psychopaths demonstrated deficits in vocal affect recognition under conditions requiring use of semantic cues and conditions requiring use of prosodic cues. Moreover, both primary and secondary psychopaths exhibited relatively similar emotional deficits in the semantic analysis condition compared to nonpsychopathic control participants. This study demonstrates that psychopaths’ vocal affect recognition deficits are not due to methodological limitations of previous studies and provides preliminary evidence that primary and secondary psychopaths exhibit generally similar deficits in vocal affect recognition. PMID:19413412
Neural system for heartbeats recognition using genetically integrated ensemble of classifiers.
Osowski, Stanislaw; Siwek, Krzysztof; Siroic, Robert
2011-03-01
This paper presents the application of a genetic algorithm to integrate neural classifiers combined in an ensemble for the accurate recognition of heartbeat types from ECG recordings. The idea is that arranging many classifiers in an ensemble increases recognition accuracy. In such an ensemble, the key problem is integrating all classifiers into one effective classification system, and this paper proposes using a genetic algorithm for this purpose. The genetic algorithm is shown to be very efficient and to significantly reduce the total error of heartbeat recognition. This was confirmed by numerical experiments on the MIT-BIH Arrhythmia Database. Copyright © 2011 Elsevier Ltd. All rights reserved.
Wang, Guanglei; Wang, Pengyu; Han, Yechen; Liu, Xiuling; Li, Yan; Lu, Qian
2017-06-01
In recent years, optical coherence tomography (OCT) has become a popular coronary imaging technology worldwide. Segmenting plaque regions in coronary OCT images is of great significance for recognizing and studying vulnerable plaque. In this paper, a new algorithm based on K-means clustering and an improved random walk is proposed, achieving semi-automated segmentation of calcified plaque, fibrotic plaque, and lipid pools. The weight function of the random walk is improved by adding the distance between the edges of pixels in the image and the seed points to its definition. This increases the weights of weak edges and prevents over-segmentation. Using these methods, plaque segmentation was performed on OCT images from 9 patients with coronary atherosclerosis. Comparison with doctors' manual segmentation showed that the method has good robustness and accuracy. It is hoped that this method can be helpful for the clinical diagnosis of coronary heart disease.
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering
NASA Technical Reports Server (NTRS)
Rizvi, Farheen
2013-01-01
The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. Calculating the corresponding Earth elevation for these data pairs is computationally expensive, so the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster, called the cluster centroid. The Earth elevation corresponding to each cluster centroid is assigned to the entire group. Results show that the 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, a sensitivity analysis is also performed to show the trade-off between algorithm performance and computational efficiency as the number of cluster centroids and algorithm iterations are increased.
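A minimal Python sketch of the reduction described above follows. It assumes the footprints are (latitude, longitude) pairs clustered as planar coordinates, and that `elevation_of` is a hypothetical, expensive elevation lookup. Plain k-means groups the pairs, the lookup runs once per centroid, and each footprint inherits its centroid's elevation. Treating latitude and longitude as planar coordinates ignores longitude wraparound at ±180°, which is acceptable only for regionally confined footprints.

```python
import random

def _d2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20, seed=1):
    """Plain Lloyd k-means on (lat, lon) pairs treated as planar coordinates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _d2(p, centers[j])) for p in points]
        acc = [[0.0, 0.0, 0] for _ in range(k)]
        for (lat, lon), j in zip(points, labels):
            acc[j][0] += lat
            acc[j][1] += lon
            acc[j][2] += 1
        # Empty clusters keep their previous centroid.
        centers = [(a[0] / a[2], a[1] / a[2]) if a[2] else centers[j]
                   for j, a in enumerate(acc)]
    labels = [min(range(k), key=lambda j: _d2(p, centers[j])) for p in points]
    return centers, labels

def reduce_topography(footprints, k, elevation_of):
    """Look up elevation only at the k centroids and broadcast it to each member footprint."""
    centers, labels = kmeans(footprints, k)
    centroid_elev = [elevation_of(lat, lon) for lat, lon in centers]  # k expensive calls
    return [centroid_elev[j] for j in labels]
```

With 400 footprints and k = 60, the expensive lookup is called 60 times instead of 400.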
Accurate Grid-based Clustering Algorithm with Diagonal Grid Searching and Merging
NASA Astrophysics Data System (ADS)
Liu, Feng; Ye, Chengcheng; Zhu, Erzhou
2017-09-01
With the advent of big data, data mining technology has attracted increasing attention. Grid-based clustering is an important data analysis method that is fast but relatively inaccurate. This paper presents an improved clustering algorithm that combines grid and density parameters. The algorithm first uses grid parameters to divide the data space into valid and invalid meshes. Second, starting from the first cell on the diagonal of the grid, the algorithm merges the valid meshes in the direction “horizontal right, vertical down”. Furthermore, during boundary grid processing, an invalid grid is searched and merged when its adjacent left, upper, and diagonal grids are all valid. This improves clustering accuracy. The experimental results show that the proposed algorithm is accurate and relatively fast compared with some popular algorithms.
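The basic grid-plus-density idea can be sketched as follows. This is a generic grid clustering in Python, not the authors' diagonal scan order or their boundary-grid processing. Cells holding at least `min_pts` points (a hypothetical density parameter) are valid. Valid cells that touch horizontally, vertically, or diagonally are merged by breadth-first search.

```python
from collections import defaultdict, deque

def grid_cluster(points, cell, min_pts):
    """Label 2-D points by connected groups of dense grid cells; -1 marks points in invalid cells."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // cell), int(y // cell))].append(i)
    valid = {c for c, idx in cells.items() if len(idx) >= min_pts}
    labels = [-1] * len(points)
    seen, cid = set(), 0
    for start in sorted(valid):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # BFS over the 8-neighbourhood of valid cells
            cx, cy = queue.popleft()
            for i in cells[(cx, cy)]:
                labels[i] = cid
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in valid and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cid += 1
    return labels
```

In this sketch, two dense cells that touch only at a corner still end up in the same cluster.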
NASA Astrophysics Data System (ADS)
Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan
2018-04-01
Because historical precipitation records are limited, agglomerative hierarchical clustering algorithms are widely used to extrapolate information from gauged to ungauged precipitation catchments. This yields more reliable projections of extreme hydro-meteorological events such as extreme precipitation. However, accurately identifying the optimum number of homogeneous precipitation catchments from the dendrogram produced by agglomerative hierarchical algorithms is highly subjective. The main objective of this study is to propose an efficient regionalization algorithm to identify homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using the average linkage hierarchical clustering algorithm combined with multi-scale bootstrap resampling, with the uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using the K-sample Anderson-Darling non-parametric test. The results show that the proposed regionalization algorithm performed better than the agglomerative hierarchical clustering algorithms proposed in previous studies.
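For reference, the uncentered correlation coefficient is r(x, y) = Σ x_i y_i / sqrt(Σ x_i² · Σ y_i²), that is, Pearson's correlation without mean subtraction. The Python sketch below uses 1 - r as the dissimilarity in a naive average-linkage clustering. The fixed target number of clusters is an assumption for illustration, and the multi-scale bootstrap and Anderson-Darling steps are not shown.

```python
import math

def uncentered_corr(x, y):
    """Pearson-like correlation without subtracting the means."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def average_linkage(series, n_clusters):
    """Naive agglomerative clustering with average linkage on 1 - r dissimilarities."""
    clusters = [[i] for i in range(len(series))]

    def dist(i, j):
        return 1.0 - uncentered_corr(series[i], series[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected by the pop
    return clusters
```

Two rising and two falling series should end up in separate groups.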
Robust MST-Based Clustering Algorithm.
Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing
2018-06-01
Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers. The grouping principle has two main problems. First, a single object that is far away from all other objects defines a separate cluster. Second, two connected clusters are regarded as two parts of one cluster. To solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase. This results in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. A greedy method then partitions those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on minimax similarity. Finally, every data point is assigned through its corresponding supernode. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
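The minimax distance between two points is the smallest achievable value of the largest hop along any path joining them. It can be read off the minimum spanning tree, since the MST path between two points minimizes the largest edge. The Python sketch below computes these distances for a small point set. It illustrates only the grouping principle, not the robust algorithm's coarsening and supernode partitioning.

```python
import math

def mst_adjacency(points):
    """Prim's algorithm; returns adjacency lists (neighbor, weight) of the Euclidean MST."""
    n = len(points)
    adj = [[] for _ in range(n)]
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj

def minimax_distances(points):
    """D[i][j] = largest edge on the MST path from i to j (the minimax distance)."""
    adj = mst_adjacency(points)
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]
        while stack:
            u, prev, worst = stack.pop()
            D[s][u] = worst
            for v, w in adj[u]:
                if v != prev:
                    stack.append((v, u, max(worst, w)))
    return D
```

Points along a chain are "close" under this measure even when their Euclidean separation is large.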
NASA Astrophysics Data System (ADS)
Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.
2005-05-01
Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
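A simplified, hypothetical version of spatially constrained, CAST-style merging can be sketched in Python. A cluster grows only through adjacent reaches. A reach joins if its average similarity to the current members is at least the affinity threshold `t`. The similarity function and the greedy growth order are assumptions. Full CAST also removes members whose affinity falls below the threshold, a step this sketch omits.

```python
def spatial_cast(n, neighbors, sim, t):
    """Greedy, neighbor-constrained CAST-like clustering of n reaches.

    neighbors: dict reach -> set of adjacent reaches in the river network
    sim(i, j): similarity of two reaches' ecological attributes, assumed in [0, 1]
    t: affinity threshold for admitting a reach into the growing cluster
    """
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        seed = min(unassigned)
        cluster = [seed]
        unassigned.remove(seed)
        grew = True
        while grew:
            grew = False
            # Only reaches adjacent to the current cluster are candidates.
            frontier = {j for i in cluster for j in neighbors[i] if j in unassigned}
            scored = [(sum(sim(j, i) for i in cluster) / len(cluster), j) for j in frontier]
            if scored:
                aff, j = max(scored)
                if aff >= t:
                    cluster.append(j)
                    unassigned.remove(j)
                    grew = True
        clusters.append(sorted(cluster))
    return clusters
```

On a five-reach chain whose attributes jump between the third and fourth reaches, the sketch produces two management units.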
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.
Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei
2017-01-01
Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering place high demands on the computing power of the hardware platform. Parallel computing is a common solution to meet this demand, and the General Purpose Graphics Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, incremental clustering algorithms powered by GPGPUs face a dilemma between clustering accuracy and parallelism. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering, such as evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. It also analyzes the upper and lower bounds of different-to-same mis-affiliation, where fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity, where a smaller work-depth means superior parallelism. From these proofs, we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity, whereas parallelism is positively related to it. These contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm, and the experimental results verified the theoretical conclusions.
Real-time polarization imaging algorithm for camera-based polarization navigation sensors.
Lu, Hao; Zhao, Kaichun; You, Zheng; Huang, Kaoli
2017-04-10
Biologically inspired polarization navigation is a promising approach due to its autonomy, high precision, and robustness. Many researchers have built point-source-based and camera-based polarization navigation prototypes in recent years. Camera-based prototypes benefit from high spatial resolution but incur a heavy computational load. The pattern recognition step in most polarization imaging algorithms involves several nonlinear calculations that impose a significant computational burden. In this paper, the polarization imaging and pattern recognition algorithms are optimized by reducing them to several linear calculations. The reduction exploits the orthogonality of the Stokes parameters, together with the features of the solar meridian and the patterns of polarized skylight, without affecting precision. The algorithm comprises a pattern recognition algorithm based on a Hough transform as well as orientation measurement algorithms. The algorithm was loaded and run on a digital signal processing system to test its computational complexity. The test showed that the running time decreased from several thousand milliseconds to several tens of milliseconds. Simulations and experiments showed that the algorithm can measure orientation without reducing precision. It can therefore satisfy the practical demands of low computational load and high precision for embedded systems.
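For context, the standard linear relations between polarizer-filtered intensities and the linear Stokes parameters are sketched below. The sketch assumes four analyzer orientations of 0°, 45°, 90°, and 135°. The degree of polarization is sqrt(S1² + S2²)/S0 and the angle of polarization is ½·atan2(S2, S1). The paper's specific optimization is not reproduced here, and the function names are illustrative.

```python
import math

def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from four analyzer orientations (0, 45, 90, 135 degrees)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity, averaging the two available estimates
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def polarization(i0, i45, i90, i135):
    """Return (degree of polarization, angle of polarization in radians)."""
    s0, s1, s2 = stokes_from_intensities(i0, i45, i90, i135)
    dop = math.hypot(s1, s2) / s0
    aop = 0.5 * math.atan2(s2, s1)  # in (-pi/2, pi/2]
    return dop, aop
```

For fully polarized light at 30° (intensities following Malus's law), the sketch recovers a degree of polarization of 1 and an angle of 30°.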
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the growth of computer network bandwidth, packet classification algorithms able to handle large-scale rule sets are urgently needed. Among existing algorithms, research on packet classification based on hierarchical tries has become an important branch of packet classification research because of its wide practical use. Although hierarchical tries save considerable storage space, they have several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, the Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). First, the packet classification problem is formalized by mapping the rules and data packets into a two-dimensional space. Second, the expectation-maximization algorithm is used to cluster the rules based on their aggregate characteristics, forming diversified clusters. Third, a hierarchical trie is built from the results of the expectation-maximization clustering. Finally, simulation and real-environment experiments compare the performance of our algorithm with that of other typical algorithms, and the results are analyzed. The hierarchical trie structure in our algorithm adopts trie path compression to eliminate backtracking. It also solves the problem of inefficient trie updates, which greatly improves the performance of the algorithm. PMID:28704476
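To illustrate the clustering step, here is a minimal EM sketch in Python for a two-dimensional Gaussian mixture with diagonal covariances. The mapping of rules to 2-D points is not specified here, so the input points are hypothetical. The hierarchical trie built on the clusters is not shown. Responsibilities are computed in log space to avoid underflow for distant points.

```python
import math
import random

def em_gmm_2d(points, k, iters=100, seed=0):
    """EM for a k-component 2-D Gaussian mixture with diagonal covariances."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]
    var = [[1.0, 1.0] for _ in range(k)]
    weights = [1.0 / k] * k
    n = len(points)
    for _ in range(iters):
        # E-step: responsibilities via log-sum-exp.
        resp = []
        for x, y in points:
            logs = []
            for j in range(k):
                (mx, my), (vx, vy) = means[j], var[j]
                logs.append(math.log(weights[j])
                            - 0.5 * ((x - mx) ** 2 / vx + (y - my) ** 2 / vy)
                            - math.log(2 * math.pi) - 0.5 * math.log(vx * vy))
            m = max(logs)
            e = [math.exp(v - m) for v in logs]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: re-estimate weights, means and per-axis variances (with a small floor).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            vx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj + 1e-6
            vy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj + 1e-6
            means[j], var[j], weights[j] = [mx, my], [vx, vy], nj / n
    labels = [max(range(k), key=r.__getitem__) for r in resp]  # hard assignment
    return means, labels
```

Two well-separated groups of four points should each receive a single, distinct label.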
Image-algebraic design of multispectral target recognition algorithms
NASA Astrophysics Data System (ADS)
Schmalz, Mark S.; Ritter, Gerhard X.
1994-06-01
In this paper, we discuss methods for multispectral ATR (Automated Target Recognition) of small targets that are sensed under suboptimal conditions, such as haze, smoke, and low light levels. In particular, we discuss our ongoing development of algorithms and software that effect intelligent object recognition by selecting ATR filter parameters according to ambient conditions. Our algorithms are expressed in terms of IA (image algebra), a concise, rigorous notation that unifies linear and nonlinear mathematics in the image processing domain. IA has been implemented on a variety of parallel computers, with preprocessors available for the Ada and FORTRAN languages. An image algebra C++ class library has recently been made available. Thus, our algorithms are both practical to implement and portable to numerous machines. Analyses emphasize the aspects of image algebra that aid the design of multispectral vision algorithms, such as parameterized templates that facilitate the flexible specification of ATR filters.
Automatic recognition of fundamental tissues on histology images of the human cardiovascular system.
Mazo, Claudia; Trujillo, Maria; Alegre, Enrique; Salazar, Liliana
2016-10-01
Cardiovascular disease is the leading cause of death worldwide. Therefore, techniques for improving diagnosis and treatment in this field have become key areas of research. In particular, tissue image processing approaches may support education and medical practice. In this paper, an approach to the automatic recognition and classification of fundamental tissues using morphological information is presented. Taking a 40× or 10× histological image as input, three clusters are created with the k-means algorithm using a structural tensor and the red and green channels. Loose connective tissue, light regions, and cell nuclei are recognised on 40× images. Then, the cell nuclei's features (shape and spatial projection) and light regions are used to recognise epithelial cells and tissue and to classify them as flat, cubic, or cylindrical. Similarly, light regions, loose connective tissue, and muscle tissue are recognised on 10× images. Finally, the tissue's function and composition are used to refine muscle tissue recognition. Experimental validation is then carried out by histologists following expert criteria, together with manually annotated images used as ground truth. The results revealed that the proposed approach classified the fundamental tissues similarly to the conventional method employed by histologists. For epithelial tissues, the proposed automatic recognition approach achieves a sensitivity of 0.79 for cubic, 0.85 for cylindrical, and 0.91 for flat tissue. Furthermore, the experts gave our method an average score of 4.85 out of 5 for the recognition of loose connective tissue and 4.82 out of 5 for muscle tissue recognition. Copyright © 2016 Elsevier Ltd. All rights reserved.
A fast parallel clustering algorithm for molecular simulation trajectories.
Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui
2013-01-15
We implemented a GPU-powered parallel k-centers algorithm to cluster the conformations from molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets, ranging from the small Alanine Dipeptide to the 370-residue Maltose Binding Protein (MBP). It can group 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and exploited the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between the cluster populations resulting from the k-centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc.
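One common form of triangle-inequality pruning can be illustrated with the farthest-point (Gonzalez) k-centers heuristic. Suppose a new center c' is added and a point p currently belongs to center c. If d(c', c) ≥ 2·d(p, c), then d(p, c') ≥ d(c', c) - d(p, c) ≥ d(p, c), so p cannot move and its distance to c' need not be computed. This is a serial Python sketch of that bound, assuming a metric `dist`. It is not the authors' GPU code, and their exact pruning rule may differ.

```python
import math

def k_centers(points, k, dist=math.dist):
    """Farthest-point k-centers with triangle-inequality pruning.

    Returns (center indices, per-point index into the centers list, number of skipped distance computations).
    """
    centers = [0]
    assign = [0] * len(points)
    d_near = [dist(p, points[0]) for p in points]
    skipped = 0
    while len(centers) < k:
        new = max(range(len(points)), key=d_near.__getitem__)  # farthest point becomes a center
        cd = [dist(points[new], points[c]) for c in centers]   # new center vs. existing centers
        centers.append(new)
        j_new = len(centers) - 1
        for i, p in enumerate(points):
            if cd[assign[i]] >= 2 * d_near[i]:
                skipped += 1  # the bound guarantees p stays with its current center
                continue
            d = dist(p, points[new])
            if d < d_near[i]:
                d_near[i], assign[i] = d, j_new
    return centers, assign, skipped
```

On a one-dimensional example with two groups, half of the point-to-center distance computations are skipped.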
Long-term surface EMG monitoring using K-means clustering and compressive sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory, combined with the K-Singular Value Decomposition (K-SVD) method, for clustering long-term recordings of surface electromyography (sEMG) signals. Long-term monitoring of sEMG signals records the electrical activity produced by muscles. Such monitoring is very useful for treatment and diagnosis as well as for detecting various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals: a healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathy). The proposed algorithm can easily scan large datasets of long-term sEMG recordings. We test the proposed algorithm with the Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. The output of the proposed algorithm is then fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers to calculate the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. It also reduces the Average Classification Error (ACE) by 17%, the Training Error (TE) by 9%, and the Root Mean Square Error (RMSE) by 18%. Compared with the existing K-means clustering algorithm, the proposed algorithm also reduces clustering energy consumption by 14%.
NASA Astrophysics Data System (ADS)
Fan, Tian-E.; Shao, Gui-Fang; Ji, Qing-Shuang; Zheng, Ji-Wen; Liu, Tun-dong; Wen, Yu-Hua
2016-11-01
Theoretically, determining the structure of a cluster amounts to searching for the global minimum on its potential energy surface. The global minimization problem is often NP-hard (nondeterministic polynomial time), and the number of local minima grows exponentially with cluster size. In this article, a multi-population, multi-strategy differential evolution algorithm is proposed to search for the globally stable structures of Fe and Cr nanoclusters. The algorithm combines multi-population differential evolution with an elite pool scheme to maintain the diversity of solutions and avoid premature trapping in local optima. Moreover, multiple strategies are introduced to improve convergence speed and lower the computational cost, such as a growing method for initialization and three differential strategies for mutation. The accuracy and effectiveness of our algorithm have been verified by comparing the results for Fe clusters with the Cambridge Cluster Database. Meanwhile, the performance of our algorithm has been analyzed by comparing its convergence rate and number of energy evaluations with those of the classical DE algorithm. The effects of the multi-population scheme, the multi-strategy mutation, and the growing initialization method have each been examined. Furthermore, the structural growth pattern of Cr clusters has been predicted by this algorithm. The results show that the lowest-energy structures of Cr clusters contain many icosahedra, and the number of icosahedral rings rises with increasing size.
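For comparison, a classical DE variant, DE/rand/1/bin, can be sketched as follows. The objective `f`, the box bounds, and the parameters `F = 0.5` and `CR = 0.9` are assumptions. For a nanocluster search, `f` would be the potential energy of the atomic coordinates, which is not implemented here. The sketch replaces individuals in place as soon as a trial wins, a common variant. It does not include the multi-population, elite-pool, or growing-initialization extensions.

```python
import random

def de_rand_1_bin(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Minimize f over a box with DE/rand/1/bin; returns (best vector, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    lo, hi = bounds[d]
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the box
                else:
                    trial.append(pop[i][d])
            ft = f(trial)
            if ft <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

On a two-dimensional sphere function, the sketch converges close to the origin within the default budget.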
Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".
Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio
2018-05-04
In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here we report some shortcomings detected in the original analyses. First, for most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This highlights the difficulty of applying many of the tested algorithms to today's average proteomics data sets. Second, the authors processed only identified spectra when merging MS runs. As a result, all unidentified spectra, which are of lower quality, had already been removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, 3% of the spectra in the data sets used were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, in our opinion, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms only with a precursor tolerance of 5 Da for high-resolution Orbitrap data was not adequate to assess the performance of MS/MS clustering approaches.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
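The one-step k-modes procedure can be sketched as a single assignment-and-update pass. Records are assigned to the mode with the fewest mismatching attributes (simple matching dissimilarity). Each mode is then replaced by the most frequent category of each attribute among its members. This is a minimal Python sketch with illustrative names, and it omits the artificial bee colony search built around it.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes_step(records, modes):
    """One k-modes pass: assign records to the nearest mode, then recompute each mode."""
    labels = [min(range(len(modes)), key=lambda j: mismatch(r, modes[j])) for r in records]
    new_modes = []
    for j, mode in enumerate(modes):
        members = [r for r, lab in zip(records, labels) if lab == j]
        if not members:
            new_modes.append(mode)  # keep an empty cluster's previous mode
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return labels, new_modes
```

In the small example used to check the sketch, the modes are already stable, so one pass leaves them unchanged.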
A dynamic scheduling algorithm for single-arm two-cluster tools with flexible processing times
NASA Astrophysics Data System (ADS)
Li, Xin; Fung, Richard Y. K.
2018-02-01
This article presents a dynamic algorithm for job scheduling in two-cluster tools producing multi-type wafers with flexible processing times. Flexible processing times mean that the actual time for processing each wafer must lie within a given time interval. The objective is to minimize the completion time of a newly inserted wafer. To this end, a two-cluster tool is decomposed into three reduced single-cluster tools (RCTs) in series, using a decomposition approach proposed in this article. For each single-cluster tool, a dynamic scheduling algorithm based on temporal constraints is developed to schedule the newly inserted wafer. Three experiments were carried out to test the proposed dynamic scheduling algorithm. They compared its results with those of the 'earliest starting time' (EST) heuristic adopted in previous literature. The results show that the proposed dynamic algorithm is effective and practical.
A novel artificial bee colony based clustering algorithm for categorical data.
Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.
Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong
2013-01-01
This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. The diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected; then, features are extracted from the ECG signal so that different types of arrhythmias can be clustered by the IEMMC algorithm. Three performance indicators, namely sensitivity, specificity, and accuracy, are used to assess the IEMMC method for ECG arrhythmias. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm shows better performance not only in clustering results but also in global search ability and convergence ability, which demonstrates its effectiveness for the detection of ECG arrhythmias. PMID:23690875
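The three indicators named above follow from the counts of a binary confusion matrix. The sketch below assumes one arrhythmia class is treated as "positive" and everything else as "negative"; the abstract does not say how the authors aggregate over several arrhythmia types, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    sensitivity = tp / (tp + fn)  # fraction of positive beats detected
    specificity = tn / (tn + fp)  # fraction of negative beats correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy
```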
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which the set of vertices (except the starting vertex) of the network is divided into prespecified clusters. The objective is to find the least-cost Hamiltonian tour in which the vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search to obtain a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms on some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions for some additional symmetric TSPLIB instances. PMID:24701148
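One way to see why 2-opt fits the ordering constraint: if a 2-opt move only reverses a sub-path lying entirely inside one cluster's contiguous block, that cluster stays contiguous and the cluster order is untouched. The sketch below is an illustrative assumption, not the paper's hybrid GA. It assumes symmetric distances (the gain formula is only valid then) and keeps the starting vertex at position 0; the function names are hypothetical.

```python
import math


def euclidean_matrix(points):
    """Symmetric distance matrix for 2D points."""
    return [[math.dist(p, q) for q in points] for p in points]


def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def two_opt_segment(tour, dist, lo, hi):
    """2-opt restricted to reversing sub-paths inside positions lo..hi of the tour."""
    assert lo >= 1, "position 0 holds the fixed starting vertex"
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(lo, hi):
            for j in range(i + 1, hi + 1):
                a, b = tour[i - 1], tour[i]
                c, d = tour[j], tour[(j + 1) % n]
                # Replacing edges (a,b),(c,d) by (a,c),(b,d) changes the length by delta.
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
    return tour
```

For example, with the start at (0, 0) and one cluster at (1, 0), (2, 0), (3, 0), the tour `[0, 2, 1, 3]` (length 8) is improved to `[0, 1, 2, 3]` (length 6).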
A genetic graph-based approach for partitional clustering.
Menéndez, Héctor D; Barrero, David F; Camacho, David
2014-05-01
Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem of remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: it first generates a similarity graph using a distance measure and then studies the graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the clustering. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach, introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases the robustness of SC and has competitive performance in comparison with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
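The SC pipeline described above (similarity graph from a distance measure, then the graph spectrum) can be sketched for the two-cluster case. This is a minimal illustration assuming a Gaussian similarity with width `sigma` (the metric parameter SC is sensitive to), an unnormalized Laplacian, and a sign split of the Fiedler vector; it is classical SC, not GGC, whose genetic algorithm is not shown.

```python
import numpy as np


def spectral_bipartition(X, sigma):
    """Split points into two groups using the Fiedler vector of a Gaussian similarity graph."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.exp(-sq / (2.0 * sigma ** 2))  # similarity graph; sigma controls edge weights
    np.fill_diagonal(W, 0.0)              # no self-loops
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    fiedler = vecs[:, 1]                  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)      # sign pattern approximates the best 2-way cut
```

Changing `sigma` by an order of magnitude can change which cut is found on less separated data, which is the parameter dependency GGC aims to reduce.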
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items to scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as hierarchical clustering, the K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level 3 Basic Linear Algebra Subprograms (BLAS) are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
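The multiplicative-update idea for the Euclidean SNMF objective, minimizing ||A - HH^T||_F^2 over H ≥ 0, can be sketched with a damped rule that is common in the SNMF literature (β = 0.5 here). This rule is an assumption for illustration, not necessarily the exact update, BLAS-level parallelization, or α/β variants proposed in the paper; `snmf` is a hypothetical name. Every step is a matrix product, which is why level 3 BLAS is a natural fit.

```python
import numpy as np


def snmf(A, k, iters=500, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative A by H @ H.T with H >= 0 (damped multiplicative rule)."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))  # nonnegative random start
    for _ in range(iters):
        AH = A @ H                   # matrix-matrix products only
        HHtH = H @ (H.T @ H)
        # Elementwise multiplicative step keeps H nonnegative.
        H *= (1.0 - beta) + beta * AH / (HHtH + eps)
    return H
```

For probabilistic clustering, each row of `H` (after row normalization) can be read as soft cluster memberships of the corresponding sample.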
Approximated mutual information training for speech recognition using myoelectric signals.
Guo, Hua J; Chan, A D C
2006-01-01
A new training algorithm called approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from the articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs trained with the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared with ML training, increasing the accuracy by approximately 3% on average.
A Review on State-of-the-Art Face Recognition Approaches
NASA Astrophysics Data System (ADS)
Mahmood, Zahid; Muhammad, Nazeer; Bibi, Nargis; Ali, Tauseef
Automatic Face Recognition (FR) is a challenging task in the field of pattern recognition and, despite extensive research over the past several decades, it remains an open research problem. This is primarily due to variability in facial images, such as non-uniform illumination, low resolution, occlusion, and/or variation in pose. Owing to its non-intrusive nature, FR is an attractive biometric modality and has gained a lot of attention in the biometric research community. Driven by the enormous number of potential application domains, many algorithms have been proposed for FR. This paper presents an overview of state-of-the-art FR algorithms, focusing on their performance on publicly available databases. We highlight the conditions of the image databases with regard to the recognition rate of each approach. This is useful as a quick research overview, and it also helps practitioners choose an algorithm for their specific FR application. To provide a comprehensive survey, the paper divides FR algorithms into three categories: (1) intensity-based, (2) video-based, and (3) 3D-based FR algorithms. In each category, the most commonly used algorithms and their performance on standard face databases are reported, and a brief critical discussion is carried out.
Machine-learned cluster identification in high-dimensional data.
Ultsch, Alfred; Lötsch, Jörn
2017-02-01
High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the clustering algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by common classical cluster algorithms, including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real-world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. Moreover, ESOM/U-matrix correctly identified clusters in biomedical data truly containing subgroups, and it always identified the cluster structure correctly in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high-dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results.
By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Scheirer, Walter J; de Rezende Rocha, Anderson; Sapkota, Archana; Boult, Terrance E
2013-07-01
To date, almost all experimental evaluations of machine learning-based recognition algorithms in computer vision have taken the form of "closed set" recognition, whereby all testing classes are known at training time. A more realistic scenario for vision applications is "open set" recognition, where incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem. The open set recognition problem is not well addressed by existing algorithms because it requires strong generalization. As a step toward a solution, we introduce a novel "1-vs-set machine," which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel. This methodology applies to several different applications in computer vision where open set recognition is a challenging problem, including object recognition and face verification. We consider both in this work, with large-scale cross-dataset experiments performed over the Caltech 256 and ImageNet sets, as well as face matching experiments performed over the Labeled Faces in the Wild set. The experiments highlight the effectiveness of machines adapted for open set evaluation compared with existing 1-class and binary SVMs for the same tasks.
Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin
2017-01-01
In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, balancing data delivery latency against energy consumption has become a key issue in WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which uses the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that a solution with a small hop count is more feasible than one with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming cluster heads into cluster members, and transforming cluster members into cluster heads. Extensive experimental results show that the IIA algorithm can automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm yields a shorter route length, a smaller cluster head count and a faster convergence rate. PMID:28445434
Nidheesh, N; Abdul Nazeer, K A; Ameer, P M
2017-12-01
Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data, and makes it hard to sensibly compare their results with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density-based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select as initial centroids data points that belong to dense regions and are adequately separated in feature space. We compared the proposed algorithm with a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm used for cancer data classification, based on their performance on ten cancer gene expression datasets. The proposed algorithm showed better overall performance than the others. There is a pressing need in the biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy to use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
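The key idea above (initial centroids that lie in dense regions and are well separated) can be sketched deterministically. The density measure (neighbour count within `radius`) and the separation rule (more than `2 * radius` from every chosen centroid) are illustrative assumptions, not the paper's definitions; `density_init` is a hypothetical name, and it may return fewer than `k` centroids if the rule is too strict.

```python
import numpy as np


def density_init(X, k, radius):
    """Pick up to k initial centroids: densest points first, skipping ones too close to chosen centroids."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    density = (d < radius).sum(axis=1) - 1                      # neighbours within radius (self excluded)
    order = np.argsort(-density, kind="stable")                 # densest first, ties broken by index
    centers = [order[0]]
    for idx in order[1:]:
        if len(centers) == k:
            break
        # Accept a dense point only if it is well separated from every chosen centroid.
        if np.all(d[idx, centers] > 2 * radius):
            centers.append(idx)
    return X[centers]
```

Because no random numbers are involved, repeated runs on the same dataset yield the same initial centroids, and therefore the same K-Means result.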
Determining the Number of Clusters in a Data Set Without Graphical Interpretation
NASA Technical Reports Server (NTRS)
Aguirre, Nathan S.; Davies, Misty D.
2011-01-01
Cluster analysis is a data mining technique meant to simplify the process of classifying data points. The basic clustering process requires as input the data points and the number of clusters wanted. The clustering algorithm then picks C starting points for the clusters, which can be either random spatial points or random data points. It then assigns each data point to the nearest C point, where "nearest" usually means Euclidean distance, although some algorithms use another criterion. The next step is to determine whether the clustering arrangement thus found is within a certain tolerance. If it is, the process ends. Otherwise, the C points are adjusted based on the data points assigned to each cluster, and the steps repeat until the algorithm converges.
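The loop described above (pick starting C points, assign to the nearest one, check a tolerance, adjust, repeat) is the standard k-means iteration. A minimal sketch follows, assuming random data points as starting C points, Euclidean distance, cluster means as the adjusted C points, and centre movement as the tolerance test; `kmeans` is a hypothetical name.

```python
import numpy as np


def kmeans(X, c, tol=1e-6, max_iter=100, seed=0):
    """Basic k-means with a tolerance-based stopping rule."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)  # random data points
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)  # assign each point to its nearest C point
        # Move each C point to the mean of its assigned points (keep it if the cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(c)])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # arrangement is within tolerance: stop
            break
    return labels, centers
```

Note that `c` must be supplied up front; choosing it without visual inspection is exactly the problem the record's title raises.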
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions, but they are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously, we developed an algorithm called the divisive correlation clustering algorithm (DCCA) to tackle this situation, based on the concept of correlation clustering. However, this algorithm may also fail in certain cases. To overcome these situations, we propose a new clustering algorithm, called the average correlation clustering algorithm (ACCA), which produces better clustering solutions than some other methods. ACCA is able to find groups of genes having more common transcription factors and similar patterns of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to execution time. Like DCCA, ACCA uses the correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some other methods in determining groups of genes having more common transcription factors and similar patterns of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. It then needs to be installed.
Two Word files (included in the zip file) need to be consulted before installation and execution of the software. Copyright 2010 Elsevier Inc. All rights reserved.
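The assignment criterion described above (each gene belongs to the cluster with which it has the highest average correlation) can be sketched as a single reassignment pass. This is a simplified illustration inspired by the description, not ACCA itself: its initialization, iteration scheme, and stopping rule are not given in the abstract, and `average_correlation_reassign` is a hypothetical name. The example data also shows why correlation differs from distance: a gene whose profile is a scaled copy of another correlates perfectly with it even though their expression values are far apart.

```python
import numpy as np


def average_correlation_reassign(X, labels, k):
    """One pass: move each gene (row of X) to the cluster with the highest mean Pearson correlation."""
    R = np.corrcoef(X)  # gene-by-gene correlation matrix
    new = labels.copy()
    for g in range(len(X)):
        scores = []
        for j in range(k):
            members = np.flatnonzero(labels == j)
            members = members[members != g]  # exclude the gene itself
            scores.append(R[g, members].mean() if len(members) else -np.inf)
        new[g] = int(np.argmax(scores))
    return new
```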
Object Recognition and Localization: The Role of Tactile Sensors
Aggarwal, Achint; Kirchner, Frank
2014-01-01
Tactile sensors, because of their intrinsic insensitivity to lighting conditions and water turbidity, provide promising opportunities for augmenting the capabilities of vision sensors in applications involving object recognition and localization. This paper presents two approaches for haptic object recognition and localization in ground and underwater environments. The first approach, called Batch RANSAC and Iterative Closest Point augmented Particle Filter (BRICPPF), is based on an innovative combination of particle filters, the Iterative Closest Point algorithm, and a feature-based Random Sample Consensus (RANSAC) algorithm for database matching. It can handle a large database of complex-shaped 3D objects and performs a complete six-degree-of-freedom localization of static objects. The algorithms are validated by experimentation in ground and underwater environments using real hardware. To our knowledge, this is the first instance of haptic object recognition and localization in underwater environments. The second approach is biologically inspired and provides a close integration between exploration and recognition. An edge-following exploration strategy is developed that receives feedback from the current state of recognition. A recognition-by-parts approach is developed that uses BRICPPF for object sub-part recognition. Object exploration is either directed to explore a part until it is successfully recognized, or directed towards new parts to endorse the current recognition belief. This approach is validated by simulation experiments. PMID:24553087
Reducing the Time Requirement of k-Means Algorithm
Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou
2012-01-01
Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have a large dimension d. In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k. The problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm that is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and k-means clustering. We provide a correctness proof for this algorithm. Results obtained from testing the algorithm on three biological datasets and six non-biological datasets (three of the latter are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm's clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI_HA). We found that when k is close to d, the quality is good (ARI_HA > 0.8), and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI_HA > 0.9). The emphasis of this paper is on reducing the time requirement of the k-means algorithm and on its application to microarray data, motivated by the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological datasets. PMID:23239974
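The Hubert-Arabie Adjusted Rand index used above compares a clustering with a reference partition, correcting the pair-counting Rand index for chance agreement: ARI = (Index - Expected) / (MaxIndex - Expected). A minimal sketch from the contingency counts follows; `adjusted_rand_index` is a hypothetical name, and the convention of returning 1.0 in the degenerate case where both partitions are trivial is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Hubert-Arabie ARI between two labelings of the same items."""
    n = len(a)
    index = sum(comb(v, 2) for v in Counter(zip(a, b)).values())  # pairs together in both
    sum_a = sum(comb(v, 2) for v in Counter(a).values())          # pairs together in a
    sum_b = sum(comb(v, 2) for v in Counter(b).values())          # pairs together in b
    expected = sum_a * sum_b / comb(n, 2)                         # chance agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)
```

Identical partitions score 1 regardless of label names, while values near 0 (or negative) indicate chance-level agreement, so thresholds such as 0.8 and 0.9 are meaningful on this scale.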
An Autonomous Star Identification Algorithm Based on One-Dimensional Vector Pattern for Star Sensors
Luo, Liyan; Xu, Luping; Zhang, Hua
2015-01-01
In order to enhance the robustness and accelerate the recognition speed of star identification, an autonomous star identification algorithm for star sensors is proposed based on the one-dimensional vector pattern (one_DVP). In the proposed algorithm, the spatial geometry information of the observed stars is used to form the one-dimensional vector pattern of each observed star. The one-dimensional vector pattern of the same observed star remains unchanged when the stellar image rotates, so the problem of star identification is reduced to comparing two feature vectors. The one-dimensional vector pattern is adopted to build the feature vector of the star pattern, which makes it possible to identify the observed stars robustly. The characteristics of the feature vector and the proposed search strategy for the matching pattern allow the recognition result to be obtained as quickly as possible. The simulation results demonstrate that the proposed algorithm can effectively accelerate star identification. Moreover, the recognition accuracy and robustness of the proposed algorithm are better than those of the pyramid algorithm, the modified grid algorithm, and the LPT algorithm. The theoretical analysis and experimental results show that the proposed algorithm outperforms the other three star identification algorithms. PMID:26198233
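The abstract does not define how the one_DVP is built, but the rotation-invariance argument can be illustrated with a stand-in pattern: the sorted angular distances from a reference star to its nearest neighbours. Angles between star direction vectors are preserved by any rotation of the image, so the pattern is unchanged. Both the construction and the name `radial_pattern` are assumptions for illustration only.

```python
import numpy as np


def radial_pattern(star_vectors, ref, m=4):
    """Sorted angular distances (radians) from star `ref` to its m nearest neighbours.

    star_vectors: (N, 3) direction vectors of observed stars in the sensor frame.
    """
    v = star_vectors / np.linalg.norm(star_vectors, axis=1, keepdims=True)  # unit vectors
    cosines = np.clip(v @ v[ref], -1.0, 1.0)  # guard arccos against rounding
    angles = np.arccos(cosines)
    angles = np.delete(angles, ref)           # drop the zero distance to itself
    return np.sort(angles)[:m]                # 1D, rotation-invariant feature vector
```

Matching then reduces to comparing such vectors against patterns precomputed from a star catalogue.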
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and the configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remotely sensed multispectral scanner data.
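The abstract does not state the authors' heuristic for choosing the number of histogram intervals. As a swapped-in illustration of that step, the sketch below uses Sturges' rule, k = ceil(log2 n) + 1, to build a per-feature frequency histogram; the rule and the function names are assumptions, not the paper's method.

```python
import math

import numpy as np


def sturges_bins(n):
    """Sturges' rule for the number of histogram intervals for n samples."""
    return math.ceil(math.log2(n)) + 1


def feature_histogram(values):
    """Frequency distribution of one feature (dimension) using Sturges' bin count."""
    counts, edges = np.histogram(values, bins=sturges_bins(len(values)))
    return counts, edges
```

For 1000 samples this gives 11 intervals; peaks and valleys in such per-dimension histograms are the kind of dimensional information a histogram-based clustering can exploit.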
Block clustering based on difference of convex functions (DC) programming and DC algorithms.
Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai
2013-10-01
We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM.
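The DCA scheme mentioned above minimizes f = g - h (g, h convex) by repeatedly linearizing h: take y_k = h'(x_k), then x_{k+1} = argmin_x g(x) - y_k x. The block clustering DC program itself is not reproduced here; the sketch below applies the same scheme to a one-dimensional toy problem chosen for illustration, f(x) = x^4/4 - x^2 with g(x) = x^4/4 and h(x) = x^2, whose minimizers are ±sqrt(2). `dca_quartic` is a hypothetical name.

```python
import numpy as np


def dca_quartic(x0, iters=60):
    """DCA for f(x) = g(x) - h(x) with g(x) = x**4 / 4 and h(x) = x**2."""
    x = x0
    for _ in range(iters):
        y = 2.0 * x      # y_k = h'(x_k): linearize the subtracted convex part at x_k
        x = np.cbrt(y)   # x_{k+1} = argmin_x g(x) - y_k * x, i.e. solve x**3 = y_k
    return x
```

Each subproblem is convex with a closed-form solution, which is the property that makes DCA schemes explicit and cheap. Starting at `x0 = 1.0` the iterates approach sqrt(2); starting at exactly 0 they stay at the critical point 0, since DCA only guarantees convergence to critical points.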
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
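A competitive-learning style online clusterer processes pulses one at a time: the nearest centre "wins" and moves toward the sample, or a new cluster is opened when no centre is close enough. This is a generic leader-follower sketch for illustration, not the paper's algorithm; the distance `threshold`, learning rate `lr`, and function name are hypothetical parameters.

```python
import numpy as np


def online_competitive_clustering(samples, threshold, lr=0.1):
    """Cluster a stream of samples; returns per-sample labels and final centres."""
    centers, labels = [], []
    for x in samples:
        x = np.asarray(x, dtype=float)
        if centers:
            dists = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(dists))
            if dists[j] <= threshold:
                centers[j] += lr * (x - centers[j])  # winner-take-all update
                labels.append(j)
                continue
        centers.append(x.copy())  # no centre close enough: start a new emitter cluster
        labels.append(len(centers) - 1)
    return labels, centers
```

The single `threshold` makes the number of clusters data-driven, but it also makes results order- and parameter-sensitive, which is consistent with the abstract favouring the model-based (MDL) variant for stability.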
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.
Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B
2011-08-15
Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intensive and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over the respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library, and its layout allows it to be extended easily to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Research on the precise positioning of customers in large data environment
NASA Astrophysics Data System (ADS)
Zhou, Xu; He, Lili
2018-04-01
Customer positioning has always been a problem that enterprises focus on. In this paper, the FCM (fuzzy c-means) clustering algorithm is used to cluster customer groups. However, the traditional FCM algorithm is sensitive to the initial cluster centers and easily falls into local optima. This shortcoming is addressed with the grey wolf optimizer (GWO), achieving efficient and accurate handling of large volumes of retailer data.
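The standard FCM iteration alternates between weighted cluster centres and fuzzy memberships. The sketch below shows plain FCM with a random membership start, which is exactly the initialization sensitivity described above; the GWO layer that the paper uses to choose initial centres is not shown. The fuzzifier `m = 2` and the function name are assumptions.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns memberships U (n x c) and centres V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]    # membership-weighted cluster centres
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=-1)
        D = np.fmax(D, 1e-12)                     # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        U_new = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

Hard customer segments can be read off with `U.argmax(axis=1)`, while the membership values themselves indicate how ambiguous each customer's segment is.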
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-11-12
The growth and complexity of data caused by uncertain environments are today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm adopts a varying-weights K-means clustering algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied to water quality analysis of Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013), with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.
An effective approach for iris recognition using phase-based image matching.
Miyazawa, Kazuyuki; Ito, Koichi; Aoki, Takafumi; Kobayashi, Koji; Nakajima, Hiroshi
2008-10-01
This paper presents an efficient algorithm for iris recognition using phase-based image matching, an image matching technique that uses the phase components of the 2D Discrete Fourier Transforms (DFTs) of given images. Experimental evaluation using the CASIA iris image databases (versions 1.0 and 2.0) and the Iris Challenge Evaluation (ICE) 2005 database clearly demonstrates that using the phase components of iris images makes it possible to achieve highly accurate iris recognition with a simple matching algorithm. This paper also discusses major implementation issues of our algorithm. In order to reduce the size of iris data and to prevent the visibility of iris images, we introduce the idea of the 2D Fourier Phase Code (FPC) for representing iris information. The 2D FPC is particularly useful for implementing compact iris recognition devices using state-of-the-art Digital Signal Processing (DSP) technology.
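The core idea of phase-based matching is phase-only correlation (POC): normalise the cross-power spectrum of two signals to unit magnitude, so that only phase differences remain, and then invert it. A sharp peak indicates a match, and its position gives the displacement. The paper works with 2D DFTs of iris images. The sketch below is a deliberately simplified 1-D version with a naive O(n^2) DFT, so the mechanics are visible without external libraries. The function names are illustrative and do not come from the paper.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for k in range(n))
           for t in range(n)]
    return [v / n for v in out] if inverse else out


def poc(f, g, eps=1e-12):
    """1-D phase-only correlation of two equal-length real signals.

    The peak index is the circular shift s such that g[i] == f[i - s].
    """
    F, G = dft(f), dft(g)
    R = []
    for a, b in zip(F, G):
        cross = b * a.conjugate()
        # Keep only the phase difference; zero out bins with no energy.
        R.append(cross / abs(cross) if abs(cross) > eps else 0)
    return [v.real for v in dft(R, inverse=True)]


def estimate_shift(f, g):
    """Return (peak index, peak height) of the POC function."""
    r = poc(f, g)
    peak = max(range(len(r)), key=r.__getitem__)
    return peak, r[peak]
```

For identical signals the peak sits at index 0 with height close to 1. For unrelated signals the POC surface stays flat, which is why the peak height can serve as a matching score.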
NASA Astrophysics Data System (ADS)
Obozov, A. A.; Serpik, I. N.; Mihalchenko, G. S.; Fedyaeva, G. A.
2017-01-01
In the article, the application of pattern recognition (a relatively young area of engineering cybernetics) to the analysis of complicated technical systems is examined. It is shown that a statistical approach could be the most effective for situations that are hard to distinguish. The different recognition algorithms are based on the Bayesian approach, which estimates the a posteriori probabilities of a certain event and an assumed error. The statistical approach to pattern recognition can be applied to the technical diagnosis of complicated systems, particularly high-powered marine diesel engines.
Iris recognition based on key image feature extraction.
Ren, X; Tian, Q; Zhang, J; Wu, S; Zeng, Y
2008-01-01
In iris recognition, feature extraction can be influenced by factors such as illumination and contrast, and thus the features extracted may be unreliable, which can cause a high rate of false results in iris pattern recognition. In order to obtain stable features, an algorithm was proposed in this paper to extract key features of a pattern from multiple images. The proposed algorithm built an iris feature template by extracting key features and performed iris identity enrolment. Simulation results showed that the selected key features have high recognition accuracy on the CASIA Iris Set, where both contrast and illumination variance exist.
Cooperative network clustering and task allocation for heterogeneous small satellite network
NASA Astrophysics Data System (ADS)
Qin, Jing
Research on small satellites has emerged as a hot topic in recent years because of their economic prospects and their convenience in launch and design. Due to the size and energy constraints of small satellites, forming a small satellite network (SSN) in which all the satellites cooperate with each other to finish tasks is an efficient and effective way to utilize them. In this dissertation, I designed and evaluated a weight-based dominating set clustering algorithm, which efficiently organizes the satellites into stable clusters. Traditional clustering algorithms for large monolithic satellite networks, such as formation flying and satellite swarms, are often limited in their ability to form clusters automatically. Therefore, a novel Distributed Weight based Dominating Set (DWDS) clustering algorithm is designed to address the clustering problems in stochastically deployed SSNs. Considering the unique features of small satellites, this algorithm is able to form clusters efficiently and stably. In this algorithm, satellites are separated into different groups according to their spatial characteristics. A minimum dominating set is chosen as the candidate cluster head set based on the nodes' weights, each of which is a weighted combination of residual energy and connection degree. The cluster heads then admit neighbors that accept their invitations into the cluster, until the maximum cluster size is reached. Simulation results show that, in an SSN with 200 to 800 nodes, the algorithm is able to cluster more than 90% of nodes within 3 seconds. The Deadline Based Resource Balancing (DBRB) task allocation algorithm is designed for efficient task allocation in heterogeneous LEO small satellite networks. In the task allocation process, the dispatcher needs to consider the deadlines of the tasks as well as the residual energy of the different resources for the best energy utilization.
We assume the tasks adopt a Map-Reduce framework, in which a task can consist of multiple subtasks. The DBRB algorithm is deployed on the head node of a cluster. It gathers the status of each cluster member and calculates their Node Importance Factors (NIFs) from the carried resources, residual power, and compute capacity. The algorithm calculates the number of concurrent subtasks based on the deadlines and allocates the subtasks to the nodes according to their NIF values. The simulation results show that when cluster members carry multiple resources, resources are more balanced and rare resources serve longer under DBRB than under the Earliest Deadline First algorithm. We also show that the algorithm performs well in service isolation by serving multiple tasks with different deadlines. Moreover, the average task response time under various cluster size settings is kept well within the deadlines. Besides non-realtime tasks, small satellites may also execute realtime tasks. Location-dependent tasks, such as image capturing, data transmission, and remote sensing, are realtime tasks that must start or finish at specific times. The resource energy balancing algorithm for mixed realtime and non-realtime workloads is developed to schedule the tasks efficiently for the best system performance. It calculates the residual energy for each resource type and tries to preserve resources and node availability when distributing tasks. Non-realtime tasks can be preempted by realtime tasks to provide better QoS to realtime tasks. I compared the performance of the proposed algorithm with a random-priority scheduling algorithm, using only realtime tasks, only non-realtime tasks, and mixed tasks. The comparison shows that the resource energy reservation algorithm outperforms the random-priority algorithm with both balanced and imbalanced workloads.
Although the resource energy balancing task allocation algorithm for mixed workloads provides a preemption mechanism for realtime tasks, realtime tasks can still fail due to resource exhaustion. Because LEO small satellites fly around the Earth on stable orbits, location-dependent realtime tasks can be treated as periodic tasks, which makes it possible to reserve energy for them. The resource energy reservation algorithm reserves energy for realtime tasks when the execution schedule of the periodic realtime tasks is known. To reserve energy for tasks that start so early in a period that the node has not yet charged enough energy, an energy wrapping mechanism is also designed to calculate the residual energy carried over from the previous period. The simulation results show that without energy reservation, the realtime task failure rate can exceed 60% when the workload is highly imbalanced. In contrast, resource energy reservation produces zero RT task failures and leads to equal or better aggregate system throughput than the non-reservation algorithm. The proposed algorithm also preserves more energy because it avoids task preemption. (Abstract shortened by ProQuest.)
Automatic speech recognition research at NASA-Ames Research Center
NASA Technical Reports Server (NTRS)
Coler, Clayton R.; Plummer, Robert P.; Huff, Edward M.; Hitchcock, Myron H.
1977-01-01
A trainable acoustic pattern recognizer manufactured by Scope Electronics is presented. The voice command system VCS encodes speech by sampling 16 bandpass filters with center frequencies in the range from 200 to 5000 Hz. Variations in speaking rate are compensated for by a compression algorithm that subdivides each utterance into eight subintervals in such a way that the amount of spectral change within each subinterval is the same. The recorded filter values within each subinterval are then reduced to a 15-bit representation, giving a 120-bit encoding for each utterance. The VCS incorporates a simple recognition algorithm that utilizes five training samples of each word in a vocabulary of up to 24 words. The recognition rate of approximately 85 percent correct for untrained speakers and 94 percent correct for trained speakers was not considered adequate for flight systems use. Therefore, the built-in recognition algorithm was disabled, and the VCS was modified to transmit 120-bit encodings to an external computer for recognition.
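The rate-normalising compression step can be sketched as follows. Measure the spectral change between consecutive frames, then place the seven interior cut points wherever the cumulative change crosses another eighth of the total. The abstract does not say how spectral change is measured, so the L1 distance between consecutive 16-filter frames used here is an assumption, and `equal_change_boundaries` is a hypothetical name. The bit budget from the abstract also checks out: eight subintervals at 15 bits each give 120 bits.

```python
SEGMENTS = 8
BITS_PER_SEGMENT = 15
# Eight subintervals x 15 bits each = the 120-bit encoding stated in the abstract.
ENCODING_BITS = SEGMENTS * BITS_PER_SEGMENT


def equal_change_boundaries(frames, segments=SEGMENTS):
    """Split spectral frames into `segments` runs with roughly equal spectral change.

    frames: list of equal-length lists (e.g. 16 bandpass-filter values per frame).
    Returns (start, end) frame-index pairs, end exclusive.
    """
    # Spectral change between consecutive frames (L1 distance: an assumption).
    deltas = [sum(abs(a - b) for a, b in zip(f1, f2))
              for f1, f2 in zip(frames, frames[1:])]
    total = sum(deltas)
    cuts, acc, k = [0], 0.0, 1
    for i, d in enumerate(deltas):
        acc += d
        # Cut after frame i + 1 each time another 1/segments of the total is reached.
        while k < segments and acc >= total * k / segments:
            cuts.append(i + 1)
            k += 1
    # Degenerate input (a single frame): pad so there are still `segments` runs.
    while len(cuts) < segments:
        cuts.append(len(frames))
    cuts.append(len(frames))
    return list(zip(cuts[:-1], cuts[1:]))
```

Because the cuts follow spectral change rather than clock time, a slowly spoken word and a quickly spoken one map onto the same eight-slot template.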
Mandarin Chinese Tone Identification in Cochlear Implants: Predictions from Acoustic Models
Morton, Kenneth D.; Torrione, Peter A.; Throckmorton, Chandra S.; Collins, Leslie M.
2015-01-01
It has been established that current cochlear implants do not supply adequate spectral information for perception of tonal languages. Comprehension of a tonal language, such as Mandarin Chinese, requires recognition of lexical tones. New strategies of cochlear stimulation such as variable stimulation rate and current steering may provide the means of delivering more spectral information and thus may provide the auditory fine structure required for tone recognition. Several cochlear implant signal processing strategies are examined in this study: the continuous interleaved sampling (CIS) algorithm, the frequency amplitude modulation encoding (FAME) algorithm, and the multiple carrier frequency algorithm (MCFA). These strategies provide different types and amounts of spectral information. Pattern recognition techniques can be applied to data from Mandarin Chinese tone recognition tasks using acoustic models as a means of testing the abilities of these algorithms to transmit the changes in fundamental frequency indicative of the four lexical tones. The ability of processed Mandarin Chinese tones to be correctly classified may predict trends in the effectiveness of different signal processing algorithms in cochlear implants. The proposed techniques can predict trends in performance of the signal processing techniques in quiet conditions but fail to do so in noise. PMID:18706497
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview: Notions of community quality underlie the clustering of networks. While studies of network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes.
Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information.
Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters. PMID:27391786
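Modularity, the first stand-alone metric listed, compares the fraction of edges inside each community with the fraction expected if edges were placed at random while degrees are preserved. For an undirected, unweighted graph with m edges this is Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where L_c counts edges inside c and d_c is the total degree of c's nodes. The sketch below is a plain implementation of that textbook formula. It is not the code used in the study, and it assumes an edge list without duplicate edges or self-loops.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs (no duplicates, no self-loops assumed).
    community: dict mapping node -> community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: summed degree of the nodes in c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by one bridge edge, splitting at the bridge gives Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0. A high Q therefore rewards dense, well-separated groups, whether or not they match any ground truth. That gap is what the information recovery metrics expose.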
Luo, Ze; Baoping, Yan; Takekawa, John Y.; Prosser, Diann J.
2012-01-01
We propose a new method to help ornithologists and ecologists discover shared segments on the migratory pathway of bar-headed geese by time-based plane-sweeping trajectory clustering. We present a density-based, time-parameterized line segment clustering algorithm, which extends comparable traditional clustering algorithms in the temporal and spatial dimensions. We present a time-based plane-sweeping trajectory clustering algorithm to reveal the dynamic evolution of spatial-temporal object clusters and discover common motion patterns of bar-headed geese in the process of migration. Experiments are performed on GPS-based satellite telemetry data from bar-headed geese, and the results demonstrate that our algorithms can correctly discover shared segments of the bar-headed geese migratory pathway. We also present findings on the migratory behavior of bar-headed geese determined from this new analytical approach.
Computational gene expression profiling under salt stress reveals patterns of co-expression
Sanchita; Sharma, Ashok
2016-01-01
Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition in which excess salt in soil inhibits plant growth. To understand the response of plants to stress conditions, the responsible genes must be identified. Clustering is a data mining technique used to group genes with similar expression; the genes of a cluster show similar expression and function. We applied clustering algorithms to gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. Clusters that were common to multiple algorithms were taken forward for further analysis. Principal component analysis (PCA) further validated the findings of the other clustering algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress-related responses. Our findings suggest that these algorithms may be helpful in predicting the function of co-expressed genes. PMID:26981411
HWDA: A coherence recognition and resolution algorithm for hybrid web data aggregation
NASA Astrophysics Data System (ADS)
Guo, Shuhang; Wang, Jian; Wang, Tong
2017-09-01
Aiming at the problem of recognizing and resolving object conflicts in hybrid distributed data stream aggregation, a distributed data stream object coherence technology is proposed. First, a framework for object coherence conflict recognition and resolution, named HWDA, was defined. Second, an object coherence recognition technology was proposed based on the formal language of description logic and the hierarchical dependency relationships between logic rules. Third, a conflict traversal recognition algorithm was proposed based on the defined dependency graph. Next, a conflict resolution technology was proposed based on resolution pattern matching, including definitions of three types of conflict, conflict resolution matching patterns, and an arbitration resolution method. Finally, experiments on two kinds of web test data sets validated the effectiveness of HWDA's conflict recognition and resolution technology.
Online graphic symbol recognition using neural network and ARG matching
NASA Astrophysics Data System (ADS)
Yang, Bing; Li, Changhua; Xie, Weixing
2001-09-01
This paper proposes a novel method for online recognition of line-based graphic symbols. Because of varied drawing styles, input strokes are often warped into a cursive form, which makes them very difficult to classify. To deal with this, an ART-2 neural network is used to classify the input strokes. It has the advantages of a high recognition rate, short recognition time, and forming classes in a self-organized manner. Symbol recognition is achieved by an Attribute Relational Graph (ARG) matching algorithm. The ARG is very efficient for representing complex objects, but its computational cost is very high. To overcome this, we suggest a fast graph matching algorithm that uses symbol structure information. The experimental results show that the proposed method is effective for recognizing symbols with hierarchical structure.
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
K-means clustering-based recommendation algorithms have been proposed with the claim that they increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. Following this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-means in terms of recommendation and training time while maintaining a good level of accuracy.
Li, Xiaofang; Xu, Lizhong; Wang, Huibin; Song, Jie; Yang, Simon X.
2010-01-01
The traditional Low Energy Adaptive Cluster Hierarchy (LEACH) routing protocol is a clustering-based protocol. The uneven selection of cluster heads results in premature death of cluster heads and premature blind nodes inside the clusters, thus reducing the overall lifetime of the network. With a full consideration of information on energy and distance distribution of neighboring nodes inside the clusters, this paper proposes a new routing algorithm based on differential evolution (DE) to improve the LEACH routing protocol. To meet the requirements of monitoring applications in outdoor environments such as the meteorological, hydrological and wetland ecological environments, the proposed algorithm uses the simple and fast search features of DE to optimize the multi-objective selection of cluster heads and prevent blind nodes for improved energy efficiency and system stability. Simulation results show that the proposed new LEACH routing algorithm has better performance, effectively extends the working lifetime of the system, and improves the quality of the wireless sensor networks. PMID:22219670
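For context, the baseline LEACH cluster-head election that the DE-based method improves on works like this. In round r, each node that has not served as cluster head in the last 1/p rounds draws a uniform random number and becomes a head if the draw is below the threshold T(n) = p / (1 - p (r mod 1/p)), where p is the desired fraction of heads. The sketch covers only this classic rule. The paper's DE-based multi-objective selection using energy and distance is not reproduced, and the function names are illustrative.

```python
import random


def leach_threshold(p, r):
    """Classic LEACH threshold T(n) for a node still eligible in round r."""
    period = round(1 / p)  # each node should serve as head once per 1/p rounds
    return p / (1 - p * (r % period))


def elect_cluster_heads(eligible, p, r, rng):
    """Each eligible node becomes a cluster head if its uniform draw falls below T(n)."""
    t = leach_threshold(p, r)
    return [n for n in eligible if rng.random() < t]
```

With p = 0.1 the threshold rises from 0.1 in round 0 to about 1 in round 9, so every node still eligible at the end of an epoch is forced to serve. Because this rule ignores residual energy and position, cluster heads can be unevenly placed, which is the weakness the abstract addresses.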
Altered Actin Centripetal Retrograde Flow in Physically Restricted Immunological Synapses
Yu, Cheng-han; Wu, Hung-Jen; Kaizuka, Yoshihisa; Vale, Ronald D.; Groves, Jay T.
2010-01-01
Antigen recognition by T cells involves large-scale spatial reorganization of numerous receptor, adhesion, and costimulatory proteins within the T cell-antigen presenting cell (APC) junction. The resulting patterns can be distinctive and are collectively known as the immunological synapse. Dynamic assembly of the cytoskeletal network is believed to play an important role in driving these assembly processes. In one experimental strategy, the APC is replaced with a synthetic supported membrane. An advantage of this configuration is that solid structures patterned onto the underlying substrate can guide immunological synapse assembly into altered patterns. Here, we use mobile anti-CD3ε on the spatially partitioned supported bilayer to ligate and trigger the T cell receptor (TCR) in live Jurkat T cells. Simultaneous tracking of both TCR clusters and GFP-actin speckles reveals their dynamic association and individual flow patterns. Actin retrograde flow directs the inward transport of TCR clusters. Flow-based particle tracking algorithms allow us to investigate the velocity distribution of the actin flow field across the whole synapse, and the centripetal velocity of actin flow decreases as it moves toward the center of the synapse. Localized actin flow analysis reveals that, while substrate patterns have no direct influence on actin motion, velocity differences of actin are observed over physically trapped TCR clusters. Actin flow regains its velocity immediately after passing through confined TCR clusters. These observations are consistent with a dynamic and dissipative coupling between TCR clusters and the viscoelastic actin network. PMID:20686692
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Ju, Chunhua; Xu, Chonghuan
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
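The abstract does not define its "modified cosine similarity". One common reading in user-based collaborative filtering is a mean-centred cosine: each user's ratings are centred on that user's own mean, so that a harsh rater and a generous rater with the same preferences still look alike, and the cosine is taken over co-rated items. The sketch below implements that reading as an assumption. It is not confirmed to match the authors' formula, and the ABC-optimised K-means step is not shown.

```python
from math import sqrt


def modified_cosine(ra, rb):
    """Mean-centred cosine similarity between two users (assumed definition).

    ra, rb: dicts mapping item -> rating. Each user's ratings are centred on
    that user's mean over all of their ratings; the cosine is computed over
    co-rated items only. Returns 0.0 when there is nothing to compare.
    """
    common = ra.keys() & rb.keys()
    if not common:
        return 0.0
    ma = sum(ra.values()) / len(ra)
    mb = sum(rb.values()) / len(rb)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    da = sqrt(sum((ra[i] - ma) ** 2 for i in common))
    db = sqrt(sum((rb[i] - mb) ** 2 for i in common))
    return num / (da * db) if da and db else 0.0
```

In the pipeline the abstract describes, a similarity like this would be computed only between users who fall in the same K-means cluster. That is where the clustering step saves work on large rating matrices.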
A New Method of Facial Expression Recognition Based on SPE Plus SVM
NASA Astrophysics Data System (ADS)
Ying, Zilu; Huang, Mingwei; Wang, Zhen; Wang, Zhewei
A novel method of facial expression recognition (FER) is presented, which uses stochastic proximity embedding (SPE) for data dimension reduction and a support vector machine (SVM) for expression classification. The proposed algorithm is applied to the Japanese Female Facial Expression (JAFFE) database for FER, and better performance is obtained compared with some traditional algorithms, such as PCA and LDA. The results further demonstrate the effectiveness of the proposed algorithm.
Container-code recognition system based on computer vision and deep neural networks
NASA Astrophysics Data System (ADS)
Liu, Yi; Li, Tianjian; Jiang, Li; Liang, Xiaoyao
2018-04-01
An automatic container-code recognition system has become a crucial requirement for the shipping industry in recent years. In this paper, an automatic container-code recognition system based on computer vision and deep neural networks is proposed. The system consists of two modules: a detection module and a recognition module. The detection module applies algorithms based on both computer vision and neural networks, and combines their outputs into a better detection result that avoids the drawbacks of either method. The combined detection results are also collected for online training of the neural networks. The recognition module exploits both character segmentation and end-to-end recognition, and outputs the recognition result that passes verification. When the recognition module produces a false recognition, the result is corrected and collected for online training of the end-to-end recognition sub-module. By combining several algorithms, the system is able to deal with more situations, and the online training mechanism can improve the performance of the neural networks at runtime. The proposed system achieves 93% overall recognition accuracy.
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital for discovering knowledge from the databases built as a result of the enormous growth of data. Various data mining techniques are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique, which partitions data objects into disjoint segments. The K-means algorithm is a versatile algorithm among the various approaches used in data clustering. However, the algorithm and its diverse adaptations suffer from certain performance problems. To overcome these issues, a superlative algorithm is proposed in this paper to perform data clustering. The distinguishing features of the proposed algorithm are that it discretizes the dataset, thereby improving the accuracy of clustering, and that it adopts a binary search initialization method to generate cluster centroids. The generated centroids are fed as input to the K-means approach, which iteratively segments the data objects into their respective clusters. The clustered results are measured for accuracy and validity. Experiments on datasets from the UC Irvine Machine Learning Repository show that the accuracy and validity measures are higher than those of the other two approaches, namely simple K-means and the Binary Search method. Thus, the results indicate that the discretization process improves the efficacy of descriptive data mining tasks. PMID:26543895
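The final stage described above, in which the generated centroids are fed into K-means, corresponds to standard Lloyd iterations from a supplied starting point. The sketch below takes the initial centroids as an argument. The paper's discretization and binary-search initialization are not described in enough detail here to reproduce, so they are left out, and the function name is illustrative.

```python
def kmeans(points, centroids, iters=100):
    """Lloyd's K-means starting from caller-supplied centroids.

    In the described pipeline the starting centroids would come from the
    binary-search initialisation (not reproduced here).
    Returns (centroids, labels).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
        if new == centroids:
            break  # converged: assignments can no longer change
        centroids = new
    return centroids, labels
```

Lloyd's algorithm converges only to a local optimum that depends on where it starts, which is why a better initialization, such as the binary-search scheme in the abstract, can change the final clustering quality.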
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms
He, Li; Zheng, Hao; Wang, Lei
2017-01-01
Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering place high demands on the computing power of the hardware platform. Parallel computing is a common solution to meet this demand, and the General Purpose Graphics Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, incremental clustering algorithms face a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering, such as evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity; it also analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity, while parallelism is positively related to the granularity. These contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm, and the experimental results verified the theoretical conclusions. PMID:29123546
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising
NASA Astrophysics Data System (ADS)
Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua
2018-04-01
In order to remove noise from three-dimensional scattered point clouds and smooth the data without damaging sharp geometric features, a novel algorithm is proposed in this paper. A feature-preserving weight is added to the fuzzy c-means algorithm, yielding a curvature-weighted fuzzy c-means clustering algorithm. First, large-scale outliers are removed using statistics of the neighboring points within radius r. Then, the algorithm estimates the curvature of the point cloud data using a conicoid parabolic fitting method and calculates the curvature feature value. Finally, the proposed clustering algorithm is used to calculate the weighted cluster centers, which are regarded as the new points. The experimental results show that this approach handles noise of different scales and intensities in point clouds with high precision while preserving features. It is also robust to different noise models.
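The first step, removing large-scale outliers from the statistics of each point's r-radius neighborhood, can be illustrated with the simplest such statistic: the number of other points within radius r. Points with too few neighbors are discarded. The abstract does not give the exact statistic or threshold, so the count-based rule and the `min_neighbors` parameter are assumptions, and the brute-force search is for clarity only.

```python
from math import dist


def radius_outlier_removal(points, r, min_neighbors):
    """Keep points that have at least `min_neighbors` other points within radius r.

    Brute-force O(n^2) sketch; a real point-cloud pipeline would use a
    spatial index (e.g. a k-d tree) for the neighbor queries.
    """
    kept = []
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= r)
        if count >= min_neighbors:
            kept.append(p)
    return kept
```

Isolated points far from the surface have almost no neighbors within r and are dropped, while points on a densely sampled surface pass easily. Noise that lies close to the surface survives this filter, which is what the later curvature-weighted clustering step is meant to handle.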
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. It takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
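The exact linkage algorithm itself is not reproduced here. As a hedged illustration of its input only, the sketch below estimates pairwise posterior co-assignment probabilities from sampled partitions, assuming the posterior is available as MCMC draws encoded as one label vector per draw; the function name and encoding are hypothetical.

```python
from itertools import combinations


def coassignment_probabilities(samples):
    """Estimate P(i and j in the same cluster) from posterior partition samples.

    samples: list of label vectors; samples[s][i] is the cluster of individual i in draw s.
    """
    n = len(samples[0])
    counts = {(i, j): 0 for i, j in combinations(range(n), 2)}
    for labels in samples:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    # fraction of draws in which each pair is co-assigned
    return {pair: c / len(samples) for pair, c in counts.items()}
```

Because only label equality is compared, the estimate is unaffected by label switching between draws.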
A Self-Organizing Spatial Clustering Approach to Support Large-Scale Network RTK Systems.
Shen, Lili; Guo, Jiming; Wang, Lei
2018-06-06
The network real-time kinematic (RTK) technique can provide centimeter-level real-time positioning solutions and plays a key role in geospatial infrastructure. As their popularity grows, network RTK systems will face difficulties in supporting large numbers of concurrent users. In the past, high-precision positioning services were oriented towards professionals and supported only a few concurrent users. Today, precise positioning provides a spatial foundation for artificial intelligence (AI), and countless smart devices (autonomous cars, unmanned aerial vehicles (UAVs), robotic equipment, etc.) require precise positioning services. The development of approaches that support large-scale network RTK systems is therefore urgent. In this study, we propose a self-organizing spatial clustering (SOSC) approach that automatically clusters online users to reduce the computational load on the server side of the network RTK system. The experimental results indicate that both the SOSC algorithm and the grid algorithm reduce the computational load efficiently, while the SOSC algorithm gives a more elastic and adaptive clustering solution across different datasets. The SOSC algorithm determines the number of clusters and the mean distance to cluster center (MDTCC) from the data set, whereas the grid approaches are all predefined. The side effects of clustering algorithms on the user side are analyzed with real global navigation satellite system (GNSS) data sets. The experimental results indicate that 10 km can safely be used as the cluster radius threshold for the SOSC algorithm without significantly reducing positioning precision and reliability on the user side.
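The abstract does not describe the internals of SOSC, so the following is not that algorithm. It is a minimal leader-style sketch of the idea of grouping users under a cluster radius threshold (10 km, as reported above), using great-circle distance with an assumed mean Earth radius of 6371 km; all names are hypothetical.

```python
import math


def haversine_km(a, b, earth_radius_km=6371.0):
    """Great-circle distance between (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def radius_clusters(users, radius_km=10.0):
    """Assign each user to the first cluster whose leader is within radius_km,
    otherwise make the user the leader of a new cluster."""
    centers = []
    labels = []
    for u in users:
        for idx, c in enumerate(centers):
            if haversine_km(u, c) <= radius_km:
                labels.append(idx)
                break
        else:
            centers.append(u)
            labels.append(len(centers) - 1)
    return centers, labels
```

Unlike SOSC, this single pass depends on the order in which users arrive; it only illustrates how a radius threshold bounds the distance between a user and the position its cluster is served from.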
NASA Astrophysics Data System (ADS)
Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo
2018-04-01
Many datasets are now represented by multiple views, which usually contain shared and complementary information. Multi-view clustering methods integrate the information from multiple views to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering methods because of its interpretability. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views and neglect the fact that different views contribute differently to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm obtains a parts-based representation of multi-view data by nonnegative matrix factorization. Pairwise co-regularization is then used to measure the disagreement between views. Only one parameter is required, and the weight of each view is learned automatically according to its contribution to the data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-art algorithms for multi-view clustering.
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics that determine the performance of a clustering algorithm is presented. For the techniques examined to cluster data accurately, two conditions must be satisfied simultaneously: first, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. Examining the structure of the data from the C1 flight line makes it clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or an iterative clustering algorithm in accurately clustering data representative of the C1 flight line is questionable. Thus, extensive a priori knowledge is required to use cluster analysis in its present form for applications such as assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to become a reliable tool.
Mamidi, Ashalatha Sreshty; Surolia, Avadhesha
2015-01-01
Structural information over the entire course of binding interactions, based on the analysis of energy landscapes, is described, which provides a framework for understanding the events involved in biomolecular recognition. As an example, the conformational dynamics underlying malectin's exquisite selectivity for diglucosylated N-glycan (Dig-N-glycan), a highly flexible oligosaccharide comprising numerous dihedral torsion angles, are described. To this end, a novel approach based on hierarchical sampling is proposed for acquiring metastable molecular conformations constituting low-energy minima, in order to understand the structural features involved in a biologic recognition. Four variants of principal component analysis were employed recursively in both Cartesian space and dihedral angle space, characterized by free energy landscapes, to select the most stable conformational substates. Subsequently, the k-means clustering algorithm was applied for geometric separation of the major native state to acquire a final ensemble of metastable conformers. The malectin complexes were then compared to characterize their conformational properties. Analyses of stereochemical metrics and other concerted binding events revealed surface complementarity, cooperative and bidentate hydrogen bonds, water-mediated hydrogen bonds, and carbohydrate-aromatic interactions, including CH-π and stacking interactions, involved in this recognition. Additionally, a striking structural transition from loop to β-strands in the malectin CRD upon specific binding to Dig-N-glycan is observed. The interplay of these binding events in malectin and Dig-N-glycan supports an extended conformational selection model as the underlying binding mechanism.
Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K
2003-11-01
Current self-organizing map (SOM) approaches to gene expression pattern clustering require the user to predefine the expected number of clusters. Hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from the other classes. The approach integrates the merits of hierarchical clustering with the robustness against noise known from self-organizing approaches. The proposed algorithm, applied to DNA microarray data sets of two types of cancer, demonstrated its ability to produce the most suitable number of clusters. Further, the marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. Tested on leukemia microarray data containing three leukemia types, the algorithm determined three major clusters and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, so it is labelled an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test using colon cancer microarray data automatically derived two clusters, which is consistent with the number of classes in the data (cancerous and normal). JAVA software for the dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Internal Cluster Validation on Earthquake Data in the Province of Bengkulu
NASA Astrophysics Data System (ADS)
Rini, D. S.; Novianti, P.; Fransiska, H.
2018-04-01
The k-means method is an algorithm that clusters n objects into k partitions based on their attributes, where k < n. A drawback of the algorithm is that the k initial centers are chosen randomly before it runs, so the resulting clusterings can differ between runs. If the random initialization is poor, the clustering is less than optimal. Cluster validation is a technique for determining the optimal clustering without prior information about the data. There are two types of cluster validation: internal and external. This study aims to examine and apply several internal cluster validation indices, namely the Calinski-Harabasz (CH) index, Silhouette (S) index, Davies-Bouldin (DB) index, Dunn (D) index, and S-Dbw index, to earthquake data from Bengkulu Province. Based on internal cluster validation, the CH, S, and S-Dbw indices yield an optimum of k = 2, the DB index yields k = 6, and the Dunn index yields k = 15. The optimum clustering (k = 6) based on the DB index gives good results for clustering earthquakes in Bengkulu Province.
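The Davies-Bouldin index used above can be computed directly from a clustering. The sketch below uses the common definition (a cluster's scatter is the mean Euclidean distance of its members to its centroid; each cluster is compared with its worst-case neighbor), where lower values indicate more compact, better-separated clusters. It assumes at least two clusters and is not the study's own code.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst (s_i + s_j) / d(c_i, c_j)."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    keys = sorted(groups)
    centroids = {}
    scatter = {}
    for k in keys:
        pts = groups[k]
        c = [sum(coord) / len(pts) for coord in zip(*pts)]
        centroids[k] = c
        scatter[k] = sum(math.dist(p, c) for p in pts) / len(pts)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys if j != i
        )
    return total / len(keys)
```

To choose k as in the study, one would run k-means for each candidate k and keep the k with the smallest index value.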
Membership-degree preserving discriminant analysis with applications to face recognition.
Yang, Zhangjing; Liu, Chuancai; Huang, Pu; Qian, Jianjun
2013-01-01
In pattern recognition, feature extraction techniques have been widely employed to reduce the dimensionality of high-dimensional data. In this paper, we propose a novel feature extraction algorithm for face recognition called membership-degree preserving discriminant analysis (MPDA), based on the Fisher criterion and fuzzy set theory. In the proposed algorithm, the membership degree of each sample to particular classes is first calculated by the fuzzy k-nearest neighbor (FKNN) algorithm to characterize the similarity between each sample and the class centers, and the membership degree is then incorporated into the definitions of the between-class scatter and the within-class scatter. Features are extracted by maximizing the ratio of the between-class scatter to the within-class scatter. Experimental results on the ORL, Yale, and FERET face databases demonstrate the effectiveness of the proposed algorithm.
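The abstract does not state which FKNN variant MPDA uses. As a hedged sketch, the code below implements the commonly used crisp-to-fuzzy initialization for labelled samples (Keller-style): with n_c of the sample's k nearest neighbors in class c, the membership is 0.51 + 0.49 n_c / k for the sample's own class and 0.49 n_c / k for every other class. Function and parameter names are hypothetical.

```python
import math
from collections import Counter


def fknn_memberships(samples, labels, k=3):
    """Fuzzy class memberships for labelled samples from their k nearest neighbours."""
    classes = sorted(set(labels))
    result = []
    for i, x in enumerate(samples):
        # k nearest neighbours of sample i, excluding the sample itself
        neighbours = sorted(
            ((math.dist(x, samples[j]), labels[j]) for j in range(len(samples)) if j != i),
            key=lambda t: t[0],
        )[:k]
        counts = Counter(lab for _, lab in neighbours)
        row = {}
        for c in classes:
            share = 0.49 * counts[c] / k
            row[c] = share + (0.51 if c == labels[i] else 0.0)
        result.append(row)
    return result
```

Each row sums to 1 and a sample always keeps more than half of its membership in its own class, while neighbors from other classes lower that share; such degrees can then weight the scatter matrices.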
The program complex for vocal recognition
NASA Astrophysics Data System (ADS)
Konev, Anton; Kostyuchenko, Evgeny; Yakimuk, Alexey
2017-01-01
This article discusses the possibility of applying an algorithm for determining the pitch frequency to note recognition problems. A preliminary study of analogous programs with a “music recognition” function was carried out. A software package based on the algorithm for pitch frequency calculation was implemented and tested. It was shown that the algorithm can recognize the notes in a user's vocal performance. The sound source can be a single musical instrument, a set of musical instruments, or a human voice humming a tune. The input file is either provided in .wav format or recorded in this format from a microphone. Processing is performed by sequentially determining the pitch frequency and converting its values to notes. Based on the test results, modifications of the algorithms used in the package were planned.
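The pitch detection algorithm itself is not specified in the abstract, so only the final step, converting a detected pitch frequency to a note, is sketched here. It assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI-style numbering (note = 69 + 12 log2(f / 440), octave = note // 12 - 1); the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a pitch frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # nearest semitone on the MIDI scale, where A4 = 69
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
```

For example, `frequency_to_note(261.63)` gives "C4". A hummed tune rarely hits exact frequencies, so rounding to the nearest semitone is what makes the mapping usable on real voice input.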
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Because many investigators use the LIS/OTD flash data, we test how variations in the event-group and group-flash clustering algorithms affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flash counts to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors creates flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
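A simplified sketch of the group-to-flash step follows, using the thresholds quoted above (330 ms and 5.5 km for LIS). It makes several assumptions that the abstract does not confirm: groups are given as (time in seconds, x in km, y in km) on a flat grid, the time test is against the flash's most recent group, and the distance test is against any group already in the flash. It is not the operational LIS/OTD code.

```python
import math


def cluster_groups_into_flashes(groups, max_dt_s=0.330, max_dist_km=5.5):
    """Greedy time-ordered clustering of groups (t_s, x_km, y_km) into flashes."""
    flashes = []
    for g in sorted(groups):  # process groups in time order
        t, x, y = g
        target = None
        for flash in flashes:
            last_t = flash[-1][0]
            close_in_time = t - last_t <= max_dt_s
            close_in_space = any(
                math.hypot(x - fx, y - fy) <= max_dist_km for _, fx, fy in flash
            )
            if close_in_time and close_in_space:
                target = flash
                break
        if target is None:
            flashes.append([g])
        else:
            target.append(g)
    return flashes
```

Passing `max_dist_km=16.5` gives the OTD setting, and scaling either threshold by about a factor of two is the kind of variation the sensitivity test above examines.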
NASA Astrophysics Data System (ADS)
Rahman, Md. Habibur; Matin, M. A.; Salma, Umma
2017-12-01
The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using cluster analysis and metric multidimensional scaling. To do so, the current research applies four major hierarchical clustering methods to the precipitation data in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to produce multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidean and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results for the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from that at other locations in Bangladesh.
An adaptive clustering algorithm for image matching based on corner feature
NASA Astrophysics Data System (ADS)
Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song
2018-04-01
Traditional image matching algorithms often cannot balance real-time performance and accuracy well. To solve this problem, this paper proposes an adaptive clustering algorithm for image matching based on corner features. The method is based on the similarity between the vectors of matching point pairs, and adaptive clustering is performed on the matching point pairs. First, Harris corner detection is carried out to extract the feature points of the reference image and the perceived image, and the feature points of the two images are initially matched using the Normalized Cross Correlation (NCC) function. Then, the matching results are clustered with the improved algorithm proposed in this paper to reduce ineffective operations and improve matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is applied to the matching points after clustering. The experimental results show that the proposed algorithm effectively eliminates most incorrect matching points while retaining the correct ones, improves the accuracy of RANSAC matching, and at the same time reduces the computational load of the whole matching process.
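The initial matching step above relies on normalized cross correlation between image patches. The sketch below uses the zero-mean form of NCC, which is one common definition; the abstract does not say which variant the authors use, and the function name is hypothetical. Patches are assumed to be equal-sized lists of rows of grayscale values.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross correlation of two equal-sized patches, in [-1, 1]."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b):
        raise ValueError("patches must have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    # a perfectly flat patch has no texture to correlate; report 0 rather than divide by 0
    return num / den if den else 0.0
```

Subtracting the means and dividing by the norms makes the score invariant to brightness offset and contrast scaling, which is why a patch and a brighter copy of it still score 1.0.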
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-01-01
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices’ service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes’ life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN. PMID:26703619
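The SAC algorithm is compared against LEACH, whose cluster-head election is simple enough to sketch. In classic LEACH, a node that has not been a cluster head in the last 1/p rounds elects itself with probability T(n) = p / (1 - p (r mod 1/p)), where p is the desired fraction of cluster heads and r the current round; other nodes use T(n) = 0. The sketch below covers that baseline only, not SAC, and the function names are hypothetical.

```python
import random


def leach_threshold(p, round_no, was_head_recently):
    """LEACH threshold T(n) for one node in round round_no."""
    if was_head_recently:  # node already served as cluster head in this epoch
        return 0.0
    epoch = round(1 / p)  # number of rounds per epoch
    return p / (1 - p * (round_no % epoch))


def elects_itself(p, round_no, was_head_recently, rng):
    """Draw a uniform number and compare it with the threshold."""
    return rng.random() < leach_threshold(p, round_no, was_head_recently)
```

The threshold rises through each epoch (with p = 0.05 it goes from 0.05 in round 0 to 1.0 in round 19), so every node serves as cluster head once per epoch regardless of how much traffic it carries, which is the kind of service-blind rotation the SAC model addresses.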
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things.
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-12-23
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
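For reference, convex clustering is usually posed as minimizing F(U) = 1/2 Σ_i ||x_i - u_i||^2 + γ Σ_{i<j} w_ij ||u_i - u_j||, where u_i is the centroid attached to data point x_i and γ controls how strongly centroids fuse. The sketch below only evaluates this objective; it is not the proximal distance algorithm from the paper, and unit weights w_ij are an assumption when none are given.

```python
import math
from itertools import combinations


def convex_clustering_objective(X, U, gamma, weights=None):
    """Evaluate 0.5 * sum ||x_i - u_i||^2 + gamma * sum_{i<j} w_ij ||u_i - u_j||."""
    fit = 0.5 * sum(math.dist(x, u) ** 2 for x, u in zip(X, U))
    penalty = 0.0
    for i, j in combinations(range(len(X)), 2):
        w = 1.0 if weights is None else weights[i][j]
        penalty += w * math.dist(U[i], U[j])
    return fit + gamma * penalty
```

For two points at (0, 0) and (2, 0), keeping U = X costs 2.0 at γ = 1 while fusing both centroids at (1, 0) costs 1.0, but at γ = 0.1 the unfused solution is cheaper (0.2). Sweeping γ and tracking where centroids merge produces the solution path described above.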
Convex clustering: an attractive alternative to hierarchical clustering.
Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth
2015-05-01
Research on gait-based human identification
NASA Astrophysics Data System (ADS)
Li, Youguo
Gait recognition refers to the automatic identification of an individual based on his or her style of walking. This paper proposes a gait recognition method based on a Continuous Hidden Markov Model with a Mixture of Gaussians (G-CHMM). First, we initialize a Gaussian mixture model for the training image sequences with the K-means algorithm, then train the HMM parameters using the Baum-Welch algorithm. Training on these gait feature sequences yields a continuous HMM for each person, so the 7 key frames and the trained HMM can represent each person's gait sequence. Finally, recognition is performed with the forward algorithm. Experiments on the CASIA gait databases achieve a comparatively high correct identification rate and comparatively strong robustness to variations in body angle.
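The recognition step can be illustrated with the forward algorithm, which computes the likelihood of an observation sequence under each person's HMM so that the best-scoring model identifies the subject. To keep the sketch short it assumes discrete observation symbols rather than the Gaussian-mixture emissions used in the paper; the model layout and function names are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) via the forward algorithm.

    start[s]: initial probability, trans[s][t]: transition s -> t,
    emit[s][o]: probability of symbol o in state s (discrete emissions assumed).
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(n)) * emit[t][o]
            for t in range(n)
        ]
    return sum(alpha)


def identify(obs, models):
    """Return the key of the model (e.g. one HMM per person) with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

For long sequences the raw products underflow, so practical implementations rescale alpha at each step or work in log space; with Gaussian-mixture emissions, emit[t][o] would be replaced by the mixture density of the feature vector.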
Artificial intelligence tools for pattern recognition
NASA Astrophysics Data System (ADS)
Acevedo, Elena; Acevedo, Antonio; Felipe, Federico; Avilés, Pedro
2017-06-01
In this work, we present a system for pattern recognition that combines the problem-solving power of genetic algorithms with the efficiency of morphological associative memories. We use a set of 48 tire prints divided into 8 tire brands. The images have dimensions of 200 × 200 pixels. We applied the Hough transform to obtain lines as the main features, yielding 449 lines. The genetic algorithm reduces the number of features to ten suitable lines that give 100% recognition. Morphological associative memories were used as the evaluation function. The selection algorithms were tournament and roulette wheel. For reproduction, we applied one-point, two-point and uniform crossover.
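The genetic operators named above are standard and can be sketched independently of the tire-print features. The code below shows roulette-wheel selection, tournament selection, and one-point crossover on list chromosomes; the chromosome encoding (for instance, which of the 449 lines are selected) and the fitness values are assumptions, since the abstract does not describe them.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick an individual with probability proportional to its (non-negative) fitness."""
    pick = rng.random() * sum(fitness)
    acc = 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if acc > pick:
            return individual
    return population[-1]  # guard against floating-point rounding


def tournament_select(population, fitness, rng, k=2):
    """Sample k distinct individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents after a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]
```

Tournament selection depends only on fitness ranks, so it keeps working when fitness values are close together, whereas roulette-wheel selection becomes nearly uniform in that case; having both lets the two be compared as in the study.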
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the simulation results show the convergence, learning and adaptability of the algorithm to a dynamic environment in achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Squared Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-08-13
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of electrocardiogram (ECG) signals plays an important role in health care systems for the diagnosis and treatment of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information in the P-QRS-T waves of long-term ECG recordings. Currently used algorithms have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from high energy consumption and sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets to enable low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by classification of the data with K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are then applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than the other methods. The proposed algorithm outperforms existing algorithms, increasing classification accuracy by 11%. In addition, the proposed algorithm achieves classification accuracies of 99.98% and 99.83% for the K-NN and PNN classifiers, respectively, and a Receiver Operating Characteristics (ROC) area of 99.75%.
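The compressed-sensing variant described above is not specified in enough detail to reproduce, so the sketch below shows only the plain K-means (Lloyd's algorithm) core it builds on: assign each point to its nearest center, move each center to the mean of its members, and stop when the centers no longer change. Initialization from k randomly chosen data points is an assumption.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on tuples of floats; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by Euclidean distance
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # update step: move each center to the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Each pass costs O(n k d), which is why reducing the number of samples or dimensions beforehand (by compressed sensing or PCA, as in the paper) lowers the cost of clustering long recordings.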
Exercise recognition for Kinect-based telerehabilitation.
Antón, D; Goñi, A; Illarramendi, A
2015-01-01
An aging population and people's higher survival rates after diseases and traumas that leave physical consequences are challenging aspects of efficient health management. This is why telerehabilitation systems are being developed: they allow physiotherapy sessions at home to be monitored and supported, which could reduce healthcare costs while also improving the quality of life of the users. Our goal is the development of a Kinect-based algorithm that provides very accurate real-time monitoring of physical rehabilitation exercises and also provides a friendly interface oriented both to users and physiotherapists. The two main constituents of our algorithm are the posture classification method and the exercise recognition method. The exercises consist of series of movements. Each movement is composed of an initial posture, a final posture and the angular trajectories of the limbs involved in the movement. The algorithm was designed and tested with datasets of real movements performed by volunteers. We also explain in the paper how we obtained the optimal trade-off values for posture and trajectory recognition. Two relevant aspects of the algorithm were evaluated in our tests: classification accuracy and real-time data processing. We achieved 91.9% accuracy in posture classification and 93.75% accuracy in trajectory recognition. We also checked whether the algorithm was able to process the data in real time. We found that our algorithm could process more than 20,000 postures per second and all the required trajectory data series in real time, which in practice guarantees no perceptible delays. Later on, we carried out two clinical trials with real patients who suffered from shoulder disorders. We obtained an exercise monitoring accuracy of 95.16%. We present an exercise recognition algorithm that handles the data provided by Kinect efficiently. The algorithm has been validated in a real scenario where we have verified its suitability.
Moreover, we received positive feedback from both the users and the physiotherapists who took part in the tests.
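The angular trajectories of the limbs described above are series of per-frame joint angles. The following is a minimal sketch, assuming each joint arrives as an (x, y, z) position from a skeleton tracker; the function names, joint names and coordinates are hypothetical illustrations, not the authors' implementation.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the segments b->a and b->c.

    a, b, c are (x, y, z) tuples, e.g. shoulder, elbow and wrist positions.
    """
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    if norm == 0:
        raise ValueError("coincident joint positions")
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

def angle_trajectory(frames, a_name, b_name, c_name):
    """Per-frame angle series for one joint; each frame maps joint name -> (x, y, z)."""
    return [joint_angle(f[a_name], f[b_name], f[c_name]) for f in frames]

# Hypothetical elbow flexion: the forearm goes from straight to a right angle.
frames = [
    {"shoulder": (0, 1, 0), "elbow": (0, 0, 0), "wrist": (0, -1, 0)},
    {"shoulder": (0, 1, 0), "elbow": (0, 0, 0), "wrist": (1, 0, 0)},
]
print(angle_trajectory(frames, "shoulder", "elbow", "wrist"))
```

A trajectory like this could then be compared against a reference movement; how the paper weighs posture versus trajectory matching is not reproduced here.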
Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun
2017-03-23
The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity-based chemometric target fishing model is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers; however, model-building methods rely on an explicit number of common conformers. In this work, we have attempted to make clustering algorithms that could automatically find a reasonable number of representative conformer ensembles from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the key descriptor: each column of the N × N matrix was treated as one of N variables describing the relationship (network) between the conformer in a row and the other N conformers. This approach was used to evaluate the performance of well-known clustering algorithms by comparing how they generate representative conformer ensembles, and to test their stability over different matrix transformation functions. In the network, the representative conformer group could be resampled by four kinds of algorithms with implicit parameters. The directed dissimilarity matrix becomes the only input to the clustering algorithms. The Dunn index, the Davies-Bouldin index, eta-squared values, and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also covers the reduction (abstraction) rate of the data, the correlation between the sizes of the population and the samples, the computational complexity, and the memory usage. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original within at most 1.13 s per sample. The clustering methods are simple and practical, as they are fast and do not require any explicit parameters.
RCDTC yielded the highest Dunn and omega-squared values of the four algorithms, in addition to a consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent across different transformation functions. Moreover, the clustering method can also be applied to the results of molecular dynamics sampling simulations.
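The Dunn index used above rewards compact, well-separated clusters: it is the smallest distance between members of different clusters divided by the largest within-cluster distance (diameter). A minimal sketch follows; since the paper's RMSD matrix is asymmetric, symmetrizing it by averaging each entry with its transpose is an assumption made here, and the four-conformer matrix is invented for illustration.

```python
def dunn_index(dist, labels):
    """Dunn index from a symmetric distance matrix and a list of cluster labels.

    Higher is better: min inter-cluster distance / max intra-cluster diameter.
    """
    n = len(labels)
    min_between = float("inf")
    max_within = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                max_within = max(max_within, dist[i][j])
            else:
                min_between = min(min_between, dist[i][j])
    if max_within == 0.0 or min_between == float("inf"):
        raise ValueError("need two or more clusters and a non-singleton cluster")
    return min_between / max_within

def symmetrize(d):
    """Average an asymmetric dissimilarity matrix with its transpose (an assumption)."""
    n = len(d)
    return [[(d[i][j] + d[j][i]) / 2 for j in range(n)] for i in range(n)]

# Hypothetical RMSD-like dissimilarities between four conformers.
raw = [
    [0.0, 0.4, 3.1, 3.0],
    [0.6, 0.0, 2.9, 3.2],
    [2.9, 3.1, 0.0, 0.5],
    [3.2, 3.0, 0.3, 0.0],
]
print(dunn_index(symmetrize(raw), [0, 0, 1, 1]))
```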
SU-F-T-20: Novel Catheter Lumen Recognition Algorithm for Rapid Digitization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dise, J; McDonald, D; Ashenafi, M
Purpose: Manual catheter recognition remains a time-consuming aspect of high-dose-rate brachytherapy (HDR) treatment planning. In this work, a novel catheter lumen recognition algorithm was created for accurate and rapid digitization. Methods: MATLAB v8.5 was used to create the catheter recognition algorithm. Initially, the algorithm searches the patient CT dataset using an intensity-based k-means filter designed to locate catheters. Once the catheters have been located, seed points are manually selected to initialize digitization of each catheter. From each seed point, the algorithm searches locally to automatically digitize the remaining catheter. This digitization is accomplished by finding pixels with similar image curvature and divergence parameters compared to the seed pixel. Newly digitized pixels are treated as new seed positions, and Hessian image analysis is used to direct the algorithm toward neighboring catheter pixels and to make the algorithm insensitive to adjacent catheters that are unresolvable on CT, air pockets, and high-Z artifacts. The algorithm was tested using 11 HDR treatment plans, including the Syed template, tandem and ovoid applicator, and multi-catheter lung brachytherapy. Digitization error was calculated by comparing manually determined catheter positions to those determined by the algorithm. Results: The digitization error was 0.23 mm ± 0.14 mm axially and 0.62 mm ± 0.13 mm longitudinally at the tip. The digitization time following initial seed placement was less than 1 second per catheter. The maximum total time required to digitize all tested applicators was 4 minutes (Syed template with 15 needles). Conclusion: This algorithm successfully digitizes HDR catheters for a variety of applicators with or without CT markers. The minimal axial error demonstrates the accuracy of the algorithm and its insensitivity to image artifacts and challenging catheter positioning.
Future work to automatically place initial seed positions would improve the algorithm speed.
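The local search from a seed point, in which each newly accepted pixel becomes a new seed, is a form of seeded region growing. The sketch below is a simplified stand-in: it swaps the curvature, divergence and Hessian criteria described in the abstract for a plain intensity tolerance, and `grow_from_seed`, `tol` and the toy CT slice are hypothetical.

```python
from collections import deque

def grow_from_seed(image, seed, tol):
    """Seeded region growing on a 2D intensity grid.

    Starting at `seed` (row, col), accept 8-connected neighbours whose intensity
    is within `tol` of the seed intensity; every accepted pixel is queued as a
    new seed, so the region follows connected structures such as a catheter.
    """
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if not (dr or dc):
                    continue
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in region
                        and abs(image[nr][nc] - ref) <= tol):
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Toy CT slice: a bright diagonal "catheter" (about 1000 HU) in soft tissue (about 40 HU).
ct = [
    [1000, 40, 40, 40],
    [40, 990, 40, 40],
    [40, 40, 1010, 40],
    [40, 40, 40, 40],
]
print(sorted(grow_from_seed(ct, (0, 0), tol=50)))
```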
A Modified MinMax k-Means Algorithm Based on PSO.
Wang, Xiaoyan; Bai, Yanping
The MinMax k-means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intra-cluster error. Two parameters, the exponent parameter and the memory parameter, are involved in its execution. Since different parameter values lead to different clustering errors, it is crucial to choose appropriate parameters. The original algorithm provides a practical framework that extends MinMax k-means to automatically adapt the exponent parameter to the data set. It has been believed that if the maximum exponent parameter is set, the program can reach the lowest intra-cluster errors. However, our experiments show that this is not always correct. In this paper, we modify the MinMax k-means algorithm with PSO (particle swarm optimization) to determine parameter values that enable the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on several popular data sets under several different initial conditions and is compared with the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm can reach the lowest clustering errors automatically.
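To make the roles of the two parameters concrete, the following sketch computes the cluster weights of one MinMax k-means iteration as we understand the original formulation: each cluster k gets a weight proportional to its intra-cluster error V_k raised to 1/(1 - p), with 0 <= p < 1, optionally blended with the previous weights through the memory parameter beta; points are then assigned by the cost w_k^p * ||x - m_k||^2. This is an illustrative reconstruction to be checked against the original paper, not the PSO-tuned method proposed here, and the error values are hypothetical.

```python
def minmax_weights(errors, p, beta=0.0, prev=None):
    """Cluster weights for one MinMax k-means iteration (illustrative sketch).

    errors: intra-cluster errors V_k (sum of squared distances to each centre).
    p:      exponent parameter, 0 <= p < 1; as p approaches 1 the weight
            concentrates on the worst (highest-error) cluster.
    beta:   memory parameter in [0, 1); blends in the previous weights `prev`.
    """
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    powered = [v ** (1.0 / (1.0 - p)) for v in errors]
    total = sum(powered)
    fresh = [x / total for x in powered]
    if prev is None or beta == 0.0:
        return fresh
    return [beta * w_old + (1.0 - beta) * w_new for w_old, w_new in zip(prev, fresh)]

def weighted_sq_dist(x, centre, weight, p):
    """Assignment cost w_k^p * ||x - m_k||^2 used when reassigning points."""
    return weight ** p * sum((a - b) ** 2 for a, b in zip(x, centre))

errors = [1.0, 4.0]   # hypothetical: cluster 1 has four times the error of cluster 0
for p in (0.0, 0.5):
    print(p, minmax_weights(errors, p))
```

With p = 0 the assignment cost reduces to plain k-means (w^0 = 1), which is one way to see why the choice of p matters.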
1993-06-18
...the exception. In the Standardized Aquatic Microcosm and the Mixed Flask Culture (MFC) microcosms, multivariate analysis and clustering methods... experiments using two microcosm protocols. We use nonmetric clustering, a multivariate pattern recognition technique developed by Matthews and Hearne (1991
NASA Astrophysics Data System (ADS)
Unglert, K.; Radić, V.; Jellinek, A. M.
2016-06-01
Variations in the spectral content of volcano seismicity related to changes in volcanic activity are commonly identified manually in spectrograms. However, long time series of monitoring data at volcano observatories require tools to facilitate automated and rapid processing. Techniques such as self-organizing maps (SOM) and principal component analysis (PCA) can help to quickly and automatically identify important patterns related to impending eruptions. For the first time, we evaluate the performance of SOM and PCA on synthetic volcano seismic spectra constructed from observations during two well-studied eruptions at Kīlauea Volcano, Hawai'i, that include features observed in many volcanic settings. In particular, our objective is to test which of the techniques can best retrieve a set of three spectral patterns that we used to compose a synthetic spectrogram. We find that, without a priori knowledge of the given set of patterns, neither SOM nor PCA can directly recover the spectra. We thus test hierarchical clustering, a commonly used method, to investigate whether clustering in the space of the principal components and on the SOM, respectively, can retrieve the known patterns. Our clustering method applied to the SOM fails to detect the correct number and shape of the known input spectra. In contrast, clustering of the data reconstructed by the first three PCA modes reproduces these patterns and their occurrence in time more consistently. This result suggests that PCA in combination with hierarchical clustering is a powerful practical tool for automated identification of characteristic patterns in volcano seismic spectra. Our results indicate that, in contrast to PCA, common clustering algorithms may not be ideal to group patterns on the SOM and that it is crucial to evaluate the performance of these tools on a control dataset prior to their application to real data.
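The hierarchical-clustering step can be pictured with a small agglomerative example. This sketch uses average linkage with Euclidean distance, which is an assumption (the abstract does not name the linkage), and it omits the PCA reconstruction: the inputs are treated as already-reduced spectra. The three-bin spectra are hypothetical.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Mean distance over all cross-cluster pairs.
        return sum(euclid(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 3-bin spectra drawn from three distinct spectral patterns.
spectra = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],   # low-frequency pattern
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],   # mid-frequency pattern
    [0.0, 0.1, 1.0], [0.0, 0.0, 0.9],   # high-frequency pattern
]
print(sorted(sorted(c) for c in average_linkage(spectra, 3)))
```

In practice one would cut the dendrogram at a chosen height rather than fixing the number of clusters, which is exactly the quantity the study found hard to recover on the SOM.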
NASA Astrophysics Data System (ADS)
Harit, Aditya; Joshi, J. C., Col; Gupta, K. K.
2018-03-01
The paper proposes an automatic facial emotion recognition algorithm that comprises two main components: feature extraction and expression recognition. The algorithm applies a Gabor filter bank at fiducial points to obtain the facial expression features. The resulting Gabor transform magnitudes, along with 14 chosen FAPs (Facial Animation Parameters), form the feature space. There are two stages: the training phase and the recognition phase. In the training phase, the system sorts all training expressions into 6 classes, one for each of the 6 emotions considered. In the recognition phase, it recognizes the emotion by applying the Gabor bank to a face image, finding the fiducial points, and feeding the resulting features to the trained neural architecture.
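Each filter in a Gabor bank is a Gaussian envelope modulated by a sinusoid at a given orientation and wavelength. The sketch below builds only the real (cosine) part of each kernel, whereas the Gabor magnitudes used in the paper would also need the imaginary (sine) part. The parameter names follow common convention, and the 5-wavelength by 8-orientation bank and sigma = 0.56 × wavelength are hypothetical choices, not the paper's settings.

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor kernel on an odd size x size grid centred at the origin."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel

def response_at(image, r, c, kernel):
    """Filter response centred on pixel (r, c); pixels outside the image count as zero."""
    half = len(kernel) // 2
    total = 0.0
    for ky, krow in enumerate(kernel):
        for kx, k in enumerate(krow):
            rr, cc = r + ky - half, c + kx - half
            if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
                total += image[rr][cc] * k
    return total

# Hypothetical bank: 5 wavelengths x 8 orientations, each kernel 9 x 9.
bank = [gabor_kernel(9, sigma=0.56 * lam, theta=k * math.pi / 8, lambd=lam)
        for lam in (4, 6, 8, 11, 16) for k in range(8)]
print(len(bank), len(bank[0]))
```

Evaluating `response_at` for every kernel at each fiducial point would give one feature vector per face.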
Face recognition algorithm based on Gabor wavelet and locality preserving projections
NASA Astrophysics Data System (ADS)
Liu, Xiaojie; Shen, Lin; Fan, Honghui
2017-07-01
To mitigate the effects of illumination changes and differences in personal features on the face recognition rate, this paper presents a new face recognition algorithm based on Gabor wavelets and Locality Preserving Projections (LPP). The high dimensionality of Gabor filter bank features is handled effectively, and the sensitivity of LPP to illumination changes is overcome. First, global image features are extracted using the good spatial locality and orientation selectivity of Gabor wavelet filters. Then the dimensionality is reduced using LPP, which preserves the local information of the image well. The experimental results show that this algorithm can effectively extract features related to facial expressions, pose, and other information. In addition, it effectively reduces the influence of illumination changes and differences in personal features, improving the face recognition rate to 99.2%.
Analysis of objects in binary images. M.S. Thesis - Old Dominion Univ.
NASA Technical Reports Server (NTRS)
Leonard, Desiree M.
1991-01-01
Digital image processing techniques are typically used to produce improved digital images through the application of successive enhancement techniques to a given image or to generate quantitative data about the objects within that image. In support of researchers in a wide range of disciplines, e.g., interferometry, heavy rain effects on aerodynamics, and structure recognition research, it is often desirable to count objects in an image and compute their geometric properties. Therefore, an image analysis application package, focusing on a subset of image analysis techniques used for object recognition in binary images, was developed. This report describes the techniques and algorithms used in the three main phases of the application, which are categorized as image segmentation, object recognition, and quantitative analysis. Appendices provide supplemental formulas for the algorithms employed as well as examples and results from the various image segmentation techniques and the object recognition algorithm implemented.
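Counting objects in a binary image and computing their geometric properties can be sketched with connected-component labeling. The example below uses breadth-first search with 4-connectivity and reports area, centroid and bounding box. The connectivity choice and the property set are assumptions for illustration; the report's own algorithms and formulas are in its appendices.

```python
from collections import deque

def label_objects(img):
    """Find 4-connected foreground (value 1) objects in a binary image via BFS.

    Returns one dict per object with its area, centroid (row, col) and
    bounding box (min_row, min_col, max_row, max_col).
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] != 1 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = len(pixels)
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            objects.append({
                "area": area,
                "centroid": (sum(rs) / area, sum(cs) / area),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),
            })
    return objects

img = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
for obj in label_objects(img):
    print(obj)
```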
A star recognition method based on the Adaptive Ant Colony algorithm for star sensors.
Quan, Wei; Fang, Jiancheng
2010-01-01
A new star recognition method based on the Adaptive Ant Colony (AAC) algorithm has been developed to increase the star recognition speed and success rate of star sensors. The method draws circles, each centered on a bright star point with a specified angular distance as its radius, and uses the parallel processing ability of the AAC algorithm to calculate the angular distance of every pair of star points within the circle. The angular distance between two star points in the circle is treated as a path for the AAC algorithm, and the path optimization feature of AAC is employed to search for the optimal (shortest) path in the circle. This optimal path is used to recognize the stellar map and improve the recognition success rate and speed. The experimental results show that when the position error is about 50″, the identification success rate of this method is 98%, compared with only 94% for the Delaunay identification method. The identification time of this method is up to 50 ms.
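The quantity the ant-colony search operates on is the angular distance between star direction vectors. This sketch shows only that geometric step: converting right ascension and declination to unit vectors, selecting the stars inside a circle of given angular radius around a centre star, and computing every pairwise angular distance. The ant-colony path search itself is not reproduced, and the star field and 5° radius are hypothetical.

```python
import math
from itertools import combinations

def unit_vector(ra_deg, dec_deg):
    """Direction cosines of a star from right ascension / declination in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def angular_distance(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def circle_pairs(stars, centre, radius_deg):
    """Angular distance of every pair of stars lying within radius_deg of `centre`."""
    inside = [s for s in stars if angular_distance(stars[centre], stars[s]) <= radius_deg]
    return {(a, b): angular_distance(stars[a], stars[b]) for a, b in combinations(inside, 2)}

# Hypothetical star field: name -> (RA, Dec) in degrees.
catalogue = {"A": (10.0, 20.0), "B": (11.0, 20.5), "C": (12.0, 19.0), "D": (40.0, -5.0)}
stars = {name: unit_vector(*pos) for name, pos in catalogue.items()}
for pair, dist in circle_pairs(stars, "A", radius_deg=5.0).items():
    print(pair, round(dist, 3))
```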
Automatic voice recognition using traditional and artificial neural network approaches
NASA Technical Reports Server (NTRS)
Botros, Nazeih M.
1989-01-01
The main objective of this research is to develop an algorithm for isolated-word recognition. This research focuses on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study after this isolated-word research is complete. To examine the similarity between the reference and the training sets, two approaches are explored. The first implements traditional pattern recognition techniques, in which a dynamic time warping algorithm aligns the two sets and the probability of a match is calculated by measuring the Euclidean distance between them. The second implements a three-layer backpropagation artificial neural network as the pattern classifier. The adaptation rule implemented in this network is the generalized least mean square (LMS) rule. The first approach has been completed: a vocabulary of 50 words was selected and tested, and the accuracy of the algorithm was found to be around 85 percent. The second approach is in progress at the present time.
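The first approach, dynamic time warping (DTW) with a Euclidean frame distance, can be sketched as follows. Each utterance is treated as a sequence of feature vectors; the two-coefficient frames and the two-word vocabulary are hypothetical stand-ins for the order-10 LPC vectors and the 50-word vocabulary, and picking the nearest template is an illustrative decision rule.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors.

    The local cost is the Euclidean distance between frames; each cell adds the
    cheapest of the match, insertion and deletion predecessors.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[n][m]

def recognize(utterance, templates):
    """Return the vocabulary word whose template has the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical 2-coefficient frames; real LPC feature vectors would have order 10.
templates = {
    "yes": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "no": [[2.0, 2.0], [2.0, 3.0], [1.0, 3.0]],
}
spoken = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.1]]   # a slower "yes"
print(recognize(spoken, templates))
```

Because warping lets one template frame absorb several input frames, the slower utterance still aligns almost perfectly with its template.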
A fingerprint classification algorithm based on combination of local and global information
NASA Astrophysics Data System (ADS)
Liu, Chongjin; Fu, Xiang; Bian, Junjie; Feng, Jufu
2011-12-01
Fingerprint recognition is one of the most important technologies in biometric identification and has been widely applied in commercial and forensic areas. Fingerprint classification, as a fundamental procedure in fingerprint recognition, can sharply decrease the number of candidates for fingerprint matching and improve the efficiency of fingerprint recognition. Most fingerprint classification algorithms are based on the number and position of singular points. Because singular point detection methods commonly consider only local information, these classification algorithms are sensitive to noise. In this paper, we propose a novel fingerprint classification algorithm combining the local and global information of the fingerprint. First, we use local information to detect singular points and measure their quality, considering the orientation structure and image texture in adjacent areas. Next, a global orientation model is adopted to measure the reliability of the singular point group. Finally, the local quality and global reliability are weighted to classify the fingerprint. Experiments demonstrate the accuracy and effectiveness of our algorithm, especially for poor-quality fingerprint images.
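A common local detector for the singular points mentioned above is the Poincaré index of the ridge-orientation field; the abstract does not say which local detector the authors use, so this is an illustrative stand-in. Summing the wrapped orientation changes around a closed 8-neighbourhood gives about +1/2 at a core, -1/2 at a delta and +1 at a whorl. The synthetic field below, with orientation equal to half the polar angle, is a hypothetical ideal core.

```python
import math

# 8-neighbourhood offsets (dr, dc) forming a closed loop, ordered by increasing atan2(dr, dc).
LOOP = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def poincare_index(orient, r, c):
    """Poincare index at (r, c) of a ridge-orientation field.

    orient[r][c] is the ridge orientation in radians, defined modulo pi.
    """
    angles = [orient[r + dr][c + dc] for dr, dc in LOOP]
    total = 0.0
    for k in range(len(angles)):
        d = angles[(k + 1) % len(angles)] - angles[k]
        # Orientations are pi-periodic, so wrap each step into (-pi/2, pi/2].
        if d <= -math.pi / 2:
            d += math.pi
        elif d > math.pi / 2:
            d -= math.pi
        total += d
    return total / (2 * math.pi)

def classify_point(index, tol=0.1):
    if abs(index - 0.5) < tol:
        return "core"
    if abs(index + 0.5) < tol:
        return "delta"
    if abs(index - 1.0) < tol:
        return "whorl"
    return "none"

# Synthetic 3x3 field around an ideal core: orientation = half the polar angle.
field = [[(math.atan2(r - 1, c - 1) % (2 * math.pi)) / 2 for c in range(3)] for r in range(3)]
idx = poincare_index(field, 1, 1)
print(round(idx, 6), classify_point(idx))
```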
Face recognition using total margin-based adaptive fuzzy support vector machines.
Liu, Yi-Hung; Chen, Yen-Ting
2007-01-01
This paper presents a new classifier, total margin-based adaptive fuzzy support vector machines (TAF-SVM), that deals with several problems that may occur when support vector machines (SVMs) are applied to face recognition. The proposed TAF-SVM not only solves the overfitting problem caused by outliers, through fuzzification of the penalty, but also corrects the skew of the optimal separating hyperplane caused by very imbalanced data sets, by using a different-cost algorithm. In addition, by replacing the conventional soft margin algorithm with the total margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM to form the TAF-SVM, which is reformulated for both linear and nonlinear cases. Using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of face recognition accuracy. The results also indicate that the proposed TAF-SVM achieves smaller error variances than SVM over a number of tests, so better recognition stability can be obtained.
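Two of the ideas above, fuzzifying the penalty so that suspected outliers cost less and using different costs per class to counter imbalance, can both be expressed as per-sample weights on the hinge loss. The sketch below trains a linear soft-margin SVM with such weights by full-batch subgradient descent. It is not TAF-SVM: the total-margin term and kernels are omitted, and the membership values, class costs and data are hypothetical.

```python
def train_weighted_svm(X, y, membership, cost_pos=1.0, cost_neg=1.0,
                       lam=0.01, lr=0.1, epochs=300):
    """Linear SVM minimising lam/2*||w||^2 + mean_i C_i * hinge(y_i * (w.x_i + b)).

    C_i = class cost (cost_pos or cost_neg) * fuzzy membership s_i in (0, 1];
    a small s_i reduces the pull of a suspected outlier on the hyperplane.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, si in zip(X, y, membership):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                ci = (cost_pos if yi > 0 else cost_neg) * si
                grad_w = [gj - ci * yi * xj / n for gj, xj in zip(grad_w, xi)]
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical imbalanced data: 4 clean positives, 2 negatives, and one positive
# outlier sitting among the negatives with low membership 0.05.
X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2.5, -2.5]]
y = [1, 1, 1, 1, -1, -1, 1]
membership = [1, 1, 1, 1, 1, 1, 0.05]
w, b = train_weighted_svm(X, y, membership, cost_neg=2.0)
print([predict(w, b, x) for x in X])
```

The minority (negative) class gets twice the cost, and the low-membership outlier ends up misclassified rather than dragging the boundary toward the negatives.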
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization-based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light, which contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization-based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to any one of a given number of classes, and classification involves making decisions that minimize the average probability of an incorrect decision. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class-conditional probability density functions (likelihoods) and prior probabilities. A probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes-optimal boundaries, and a k-nearest neighbor (KNN) classifier are used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
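Both nonparametric classifiers can be sketched in a few lines. The PNN below estimates each class-conditional density as an average of Gaussian (Parzen) kernels, takes priors proportional to class sample counts, and picks the class with the largest prior × likelihood, i.e. the maximum a posteriori class. The KNN uses a majority vote. The two features per pixel (for example a degree of linear polarization and an intensity), the class names, sigma and k are hypothetical.

```python
import math

def pnn_classify(x, training, sigma=1.0):
    """Probabilistic neural network (Parzen-window) classifier.

    training maps class label -> list of feature vectors. With a shared sigma
    the Gaussian normalising constant is identical for every class and cancels.
    """
    total = sum(len(samples) for samples in training.values())
    scores = {}
    for label, samples in training.items():
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * sigma ** 2))
            for s in samples
        )
        likelihood = kernel_sum / len(samples)
        prior = len(samples) / total
        scores[label] = prior * likelihood          # proportional to the posterior
    return max(scores, key=scores.get)

def knn_classify(x, training, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = sorted(
        (math.dist(x, s), label) for label, samples in training.items() for s in samples
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical per-pixel features, e.g. (degree of linear polarization, intensity).
training = {
    "target": [[0.8, 0.3], [0.75, 0.35], [0.85, 0.25]],
    "background": [[0.1, 0.6], [0.15, 0.5], [0.05, 0.55]],
}
print(pnn_classify([0.7, 0.3], training, sigma=0.1), knn_classify([0.7, 0.3], training))
```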
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This exploratory data mining project used a distance-based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data, and stabilized at a 6-cluster scenario following an exhaustive exploratory study of 4-, 5-, and 6-cluster scenarios produced by the K-Means and TwoStep algorithms. Using principles in data mining, the study…
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
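For reference when comparing the methods, one fuzzy c-means (FCM) iteration alternates two closed-form updates: memberships u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) and centres equal to the u^m-weighted means of the data. As the fuzzifier m approaches 1, the memberships become crisp and the update reduces to hard c-means. This sketch works on 1-D data for brevity; the points, starting centres and m = 2 are illustrative.

```python
def fcm_step(points, centres, m=2.0):
    """One fuzzy c-means iteration on 1-D data: update memberships, then centres."""
    u = []
    for x in points:
        d = [abs(x - c) for c in centres]
        if 0.0 in d:
            # The point coincides with a centre: give it crisp membership there.
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            s = sum(row)
            row = [r / s for r in row]
        else:
            row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d) for dj in d]
        u.append(row)
    new_centres = []
    for j in range(len(centres)):
        weights = [u[i][j] ** m for i in range(len(points))]
        new_centres.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return u, new_centres

points = [0.0, 1.0, 9.0, 10.0]
u, centres = fcm_step(points, [0.5, 9.5])
for _ in range(10):
    u, centres = fcm_step(points, centres)
print([round(c, 3) for c in centres])
print([[round(v, 3) for v in row] for row in u])
```

Unlike a Kohonen map, there is no neighbourhood function over a lattice of prototypes here; each centre is updated from the memberships alone.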
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies of network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
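Two of the stand-alone metrics named above have short closed forms. Newman modularity is Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c its total degree. The conductance of a community is the number of edges leaving it divided by the smaller of its volume and the volume of the rest of the graph. A minimal sketch for unweighted, undirected graphs, checked on a toy graph of two triangles joined by one edge:

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected graph.

    edges: list of (u, v) pairs; community: dict node -> community id.
    """
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree[c] = degree.get(c, 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def conductance(edges, members):
    """Edges leaving `members` divided by the smaller of the two side volumes."""
    members = set(members)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        if (u in members) != (v in members):
            cut += 1
        for node in (u, v):
            if node in members:
                vol_in += 1
            else:
                vol_out += 1
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, community), conductance(edges, {0, 1, 2}))
```

Here Q = 2 × (3/7 − 1/4) = 5/14 and the conductance is 1/7; the information recovery metrics (adjusted Rand score, normalized mutual information) additionally need a ground-truth partition and are not shown.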
Reconstruction of a digital core containing clay minerals based on a clustering algorithm.
He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling
2017-10-01
It is difficult to obtain core samples and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for unconsolidated sandstone reservoirs. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of digital cores, although two-dimensional data-based reconstruction methods are specifically applicable as microstructure reservoir simulation methods for sandstone reservoirs. However, reconstructing the various clay minerals in digital cores remains challenging from a research viewpoint. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the output was a digital core containing clay clusters without labels for the clusters' number, size, and texture. The statistics and geometry of the reconstruction model were similar to those of the reference model. In addition, the Hoshen-Kopelman algorithm was used to label the various connected, unclassified clay clusters in the initial model, and the number and size of the clay clusters were then recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connected clusters into smaller clusters on the basis of differences in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for simulating different clay minerals in digital cores.
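The Hoshen-Kopelman step labels connected clusters in a single raster scan, using union-find to merge provisional labels when two of them turn out to belong to the same cluster, and then records each cluster's size. A minimal 2D sketch with 4-connectivity follows; the grid is a hypothetical clay/no-clay map, and the subsequent K-means split of large clusters is not shown.

```python
def hoshen_kopelman(grid):
    """Label 4-connected occupied cells (truthy values) with one raster scan plus union-find.

    Returns (labels, sizes): a grid of cluster ids (0 = empty) and a dict id -> cell count.
    """
    rows, cols = len(grid), len(grid[0])
    parent = [0]                       # parent[k] for provisional label k; index 0 unused

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                root_up, root_left = find(up), find(left)
                parent[max(root_up, root_left)] = min(root_up, root_left)   # merge labels
                labels[r][c] = min(root_up, root_left)
            elif up or left:
                labels[r][c] = find(up or left)
            else:
                parent.append(len(parent))            # new provisional label
                labels[r][c] = len(parent) - 1
    # Second pass: replace provisional labels by their roots, renumbered 1..N.
    canonical, sizes = {}, {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                new = canonical.setdefault(root, len(canonical) + 1)
                labels[r][c] = new
                sizes[new] = sizes.get(new, 0) + 1
    return labels, sizes

clay = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
labels, sizes = hoshen_kopelman(clay)
print(sizes)
```

The top U-shaped region starts with three provisional labels that are merged during the scan, which is the case Hoshen-Kopelman is designed for.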
Validating clustering of molecular dynamics simulations using polymer models.
Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn
2011-11-14
Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
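The core of spectral clustering can be illustrated by its simplest case, spectral bisection: split a similarity graph by the sign of the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the graph Laplacian. This sketch finds that vector by power iteration on c·I − L with the constant eigenvector projected out at each step, and it stands in for the full method (several eigenvectors followed by k-means) rather than reproducing the authors' pipeline. The toy graph of two triangles joined by one edge plays the role of two meta-stable conformational states.

```python
def fiedler_bisection(n, edges, iterations=500):
    """Split a graph in two by the sign of the Fiedler vector of its Laplacian L."""
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    # 2 * max degree bounds the largest Laplacian eigenvalue, so c*I - L is positive
    # semidefinite and its dominant eigenvector (after removing the constant one)
    # is the Fiedler vector.
    c = 2.0 * max(len(nb) for nb in neighbours)

    x = [float(i) - (n - 1) / 2 for i in range(n)]   # deterministic, mean-zero start
    for _ in range(iterations):
        # y = (c*I - L) x, where (L x)_i = deg(i) * x_i - sum of neighbour values.
        y = [c * x[i] - (len(neighbours[i]) * x[i] - sum(x[j] for j in neighbours[i]))
             for i in range(n)]
        mean = sum(y) / n
        y = [yi - mean for yi in y]                    # project out the constant vector
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return [0 if xi < 0 else 1 for xi in x]

# Hypothetical similarity graph: two tightly connected groups joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(fiedler_bisection(6, edges))
```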
Multispectral iris recognition based on group selection and game theory
NASA Astrophysics Data System (ADS)
Ahmad, Foysal; Roy, Kaushik
2017-05-01
Commercially available iris recognition systems use only a narrow band of the near-infrared spectrum (700-900 nm), while iris images captured in the wider range of 405 nm to 1550 nm offer potential benefits for enhancing the recognition performance of an iris biometric system. The novelty of this research is that a group selection algorithm based on coalition game theory is explored to select the best patch subsets. In this algorithm, patches are divided into several groups based on their maximum contribution in different groups. Shapley values are used to evaluate the contribution of patches in different groups. Results show that this group selection based iris recognition…
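Shapley values assign each player (here, each patch) its average marginal contribution over all coalitions: φ_i = Σ over S not containing i of |S|!(n − |S| − 1)!/n! × (v(S ∪ {i}) − v(S)). The sketch computes them exactly by enumeration, which is only practical for a small number of patches. The value function is hypothetical, with two redundant patches, and the paper's grouping procedure is not reproduced.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (fine for small n)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical game: recognition score of a patch subset, where patches "p1" and
# "p2" are redundant (together they add 0.3, not 0.3 + 0.3).
def score(subset):
    base = {"p1": 0.3, "p2": 0.3, "p3": 0.2}
    total = sum(base[p] for p in subset)
    if "p1" in subset and "p2" in subset:
        total -= 0.3
    return total

phi = shapley_values(["p1", "p2", "p3"], score)
print({p: round(v, 3) for p, v in phi.items()})
```

The redundant pair splits its shared 0.3 equally, so each gets less credit than the independent patch "p3", which is the kind of signal a patch-selection step can exploit.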
Wavelet decomposition based principal component analysis for face recognition using MATLAB
NASA Astrophysics Data System (ADS)
Sharma, Mahesh Kumar; Sharma, Shashikant; Leeprechanon, Nopbhorn; Ranjan, Aashish
2016-03-01
For the realization of face recognition systems in static as well as real-time settings, algorithms such as principal component analysis, independent component analysis, linear discriminant analysis, neural networks, and genetic algorithms have been used for decades. This paper discusses a wavelet decomposition based principal component analysis approach to face recognition. Principal component analysis is chosen over other algorithms for its relative simplicity, efficiency, and robustness. Face recognition means identifying a person from their facial features, and it resembles factor analysis in some sense, i.e., extraction of the principal components of an image. Principal component analysis suffers from some drawbacks, mainly poor discriminatory power and, in particular, the large computational load of finding eigenvectors. These drawbacks can be greatly reduced by combining wavelet transform decomposition for feature extraction with principal component analysis for pattern representation and classification, analyzing the facial gestures in the space and time domains, where frequency and time are used interchangeably. The experimental results suggest that this face recognition method yields a significant percentage improvement in recognition rate as well as better computational efficiency.
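The computational saving comes from running PCA on a wavelet sub-band instead of the full image. This sketch performs one level of a 2D Haar decomposition and keeps only the low-frequency (LL) band, computed as plain 2×2 block averages. The orthonormal Haar LL coefficient would be twice this value, and the discarded detail bands and the number of levels are assumptions, not the paper's configuration. The 4×4 "face" is hypothetical.

```python
def haar_approximation(image):
    """One level of a 2D Haar decomposition, keeping only the LL (approximation) band.

    Each output pixel is the mean of a 2x2 block (unnormalised Haar low-pass in
    both directions); the three detail bands are discarded.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

def flatten(image):
    """Row-major feature vector, the form a PCA step would consume."""
    return [v for row in image for v in row]

face = [
    [10, 12, 50, 52],
    [14, 12, 48, 50],
    [90, 92, 30, 30],
    [94, 92, 34, 34],
]
ll = haar_approximation(face)
print(ll, len(flatten(face)), "->", len(flatten(ll)))
```

Each level quarters the number of pixels, so the covariance matrix PCA must decompose shrinks by a factor of 16 per level.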
NASA Astrophysics Data System (ADS)
Yu, Yongtao; Li, Jonathan; Wen, Chenglu; Guan, Haiyan; Luo, Huan; Wang, Cheng
2016-03-01
This paper presents a novel algorithm for detection and recognition of traffic signs in mobile laser scanning (MLS) data for intelligent transportation-related applications. The traffic sign detection task is accomplished on 3-D point clouds using bag-of-visual-phrases representations, while the recognition task is achieved on 2-D images using a Gaussian-Bernoulli deep Boltzmann machine-based hierarchical classifier. To exploit high-order feature encodings of feature regions, a deep Boltzmann machine-based feature encoder is constructed. For detecting traffic signs in 3-D point clouds, the proposed algorithm achieves an average recall, precision, quality, and F-score of 0.956, 0.946, 0.907, and 0.951, respectively, on the four selected MLS datasets. For on-image traffic sign recognition, a recognition accuracy of 97.54% is achieved by using the proposed hierarchical classifier. Comparative studies with existing traffic sign detection and recognition methods demonstrate that our algorithm achieves promising, reliable, high performance both in detecting traffic signs in 3-D point clouds and in recognizing traffic signs in 2-D images.
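The four detection figures above are linked. The F-score is the harmonic mean of precision P and recall R. Assuming the usual definition of quality, TP/(TP + FP + FN), quality can be rewritten as 1/(1/P + 1/R − 1), because 1/P = (TP + FP)/TP and 1/R = (TP + FN)/TP. Plugging in the reported average precision (0.946) and recall (0.956) reproduces the reported F-score (0.951) and quality (0.907) to three decimals:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def quality(precision, recall):
    """TP / (TP + FP + FN), expressed through precision and recall."""
    return 1 / (1 / precision + 1 / recall - 1)

def from_counts(tp, fp, fn):
    """Precision, recall, F-score and quality from raw detection counts."""
    p, r = tp / (tp + fp), tp / (tp + fn)
    return p, r, f_score(p, r), tp / (tp + fp + fn)

p, r = 0.946, 0.956   # average precision and recall reported above
print(round(f_score(p, r), 3), round(quality(p, r), 3))
```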
Indonesian Sign Language Number Recognition using SIFT Algorithm
NASA Astrophysics Data System (ADS)
Mahfudi, Isa; Sarosa, Moechammad; Andrie Asmara, Rosa; Azrino Gustalika, M.
2018-04-01
Indonesian Sign Language (ISL) is generally used by deaf individuals to communicate. They use sign language as their primary language, which consists of two types of action: signs and finger spelling. However, not everyone understands sign language, which makes it difficult for them to communicate with hearing people; this problem also contributes to their feeling isolated from social life. A solution is needed that helps them interact with hearing people. Many studies offer a variety of methods for solving the problem of sign language recognition based on image processing. The SIFT (Scale Invariant Feature Transform) algorithm is one method that can be used to identify an object, and it is claimed to be highly resistant to scaling, rotation, illumination changes and noise. Using the SIFT algorithm for Indonesian sign language number recognition yields a recognition rate of 82% on a dataset of 100 sample images, 50 used for training and 50 for testing. Changing the threshold value affects the recognition result; the best threshold value is 0.45, giving a recognition rate of 94%.
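The abstract does not say what the 0.45 threshold applies to. A common choice in SIFT matching is a nearest-neighbour distance-ratio test, so the sketch below assumes that interpretation: a query descriptor counts as a good match only if its closest template descriptor is much closer than the second closest, and the template with the most good matches wins. Descriptor extraction is not shown, the 2-D toy descriptors stand in for real 128-D SIFT vectors, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def count_good_matches(query: List[Descriptor], reference: List[Descriptor],
                       ratio_threshold: float = 0.45) -> int:
    """Count query descriptors passing the ratio test d1 < ratio_threshold * d2."""
    good = 0
    for q in query:
        dists = sorted(math.dist(q, ref) for ref in reference)
        if len(dists) >= 2 and dists[0] < ratio_threshold * dists[1]:
            good += 1
    return good


def classify(query: List[Descriptor], templates: Dict[str, List[Descriptor]],
             ratio_threshold: float = 0.45) -> str:
    """Return the label of the template with the most good matches."""
    return max(templates,
               key=lambda label: count_good_matches(query, templates[label], ratio_threshold))


templates = {
    "one": [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)],
    "two": [(5.0, 5.0), (6.0, 5.0), (100.0, 100.0)],
}
query = [(0.1, 0.0), (10.0, 10.2)]
print(classify(query, templates))  # one
```

Lowering the threshold keeps fewer but more distinctive matches, which is consistent with the abstract's observation that the threshold changes the recognition rate.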
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-01-01
The growing volume and complexity of data caused by an uncertain environment is today's reality. To identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm adopts a varying-weights K-means clustering algorithm to analyze water monitoring data. The varying-weights scheme is the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, named MIWAS-K-means. The new clustering algorithm avoids cases in which the iteration margin is not calculated. With this fast clustering analysis, the quality of water samples can be identified. The algorithm is applied to water quality analysis of Haihe River (China) data obtained by the monitoring network over eight years (2006–2013), with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex, high-dimensional data matrices. PMID:26569283
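To show how per-indicator weights change a K-means partition, here is a minimal sketch with fixed, hand-chosen weights. It does not reproduce the MIWAS weight self-adjustment; the farthest-point seeding is an assumption chosen for determinism, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def weighted_sq_dist(x: Point, c: Point, w: Sequence[float]) -> float:
    """Squared Euclidean distance in which each indicator has its own weight."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def farthest_point_init(data: List[Point], k: int, w: Sequence[float]) -> List[List[float]]:
    """Deterministic seeding: start at data[0], then repeatedly add the farthest point."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda x: min(weighted_sq_dist(x, c, w) for c in centers))
        centers.append(list(far))
    return centers


def weighted_kmeans(data: List[Point], k: int, w: Sequence[float],
                    iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    centers = farthest_point_init(data, k, w)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest center under the weighted distance.
        labels = [min(range(k), key=lambda j: weighted_sq_dist(x, centers[j], w)) for x in data]
        # Update step: with fixed positive weights the per-cluster mean still
        # minimises the weighted squared distance, so centers are plain means.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


samples = [(0.0, 0.0), (1.0, 10.0), (2.0, 0.0), (3.0, 10.0)]
print(weighted_kmeans(samples, 2, (1.0, 1.0))[1])    # [0, 1, 0, 1]
print(weighted_kmeans(samples, 2, (1.0, 0.001))[1])  # [0, 0, 1, 1]
```

With equal weights the second indicator dominates the split; down-weighting it lets the first indicator decide, which is the kind of effect an indicator-weighting scheme is meant to control.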
Vatsa, Mayank; Singh, Richa; Noore, Afzel
2008-08-01
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
Eyler, Lauren; Hubbard, Alan; Juillard, Catherine
2016-10-01
Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximize average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
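The selection criterion described above (pick the clustering that maximizes average silhouette width) can be sketched on a toy dataset. This sketch makes several simplifying assumptions: it is unweighted (the survey weights are omitted), it uses exhaustive medoid search instead of PAM (feasible only for tiny n), it uses Euclidean distance on numeric points instead of categorical asset variables, and it searches only over the number of clusters, not over variable combinations.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Dist = Callable[[Point, Point], float]


def euclid(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmedoids_exhaustive(points: List[Point], k: int, dist: Dist = euclid) -> List[int]:
    """Try every set of k medoids (feasible only for small n) and keep the cheapest."""
    best_cost, best_labels = float("inf"), []
    for medoids in itertools.combinations(range(len(points)), k):
        labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
        cost = sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels


def average_silhouette_width(points: List[Point], labels: List[int], dist: Dist = euclid) -> float:
    """Mean of s(i) = (b - a) / max(a, b); points in singleton clusters contribute 0."""
    n, total = len(points), 0.0
    for i in range(n):
        own = [dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def choose_k(points: List[Point], k_values: Sequence[int]) -> Tuple[int, float]:
    """Return the number of clusters (k >= 2) with the highest ASW, and that ASW."""
    best_asw, best_k = max(
        (average_silhouette_width(points, kmedoids_exhaustive(points, k)), k) for k in k_values
    )
    return best_k, best_asw


pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
print(choose_k(pts, [2, 3]))  # k = 2 wins with ASW of roughly 0.87
```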
Probabilistic Open Set Recognition
NASA Astrophysics Data System (ADS)
Jain, Lalit Prithviraj
Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds that existing probabilistic multi-class recognition approaches are ineffective for true open set recognition. We hypothesize that the cause is weak ad hoc assumptions combined with closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. For this, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms.
Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary support vector machines. Building on the success of statistical EVT-based recognition methods such as PI-SVM and W-SVM on the open set problem, we present a new general supervised learning algorithm for multi-class classification and multi-class open set recognition called the Extreme Value Local Basis (EVLB). The design of this algorithm is motivated by the observation that extrema from known negative class distributions are the closest negative points to any positive sample during training, and thus should be used to define the parameters of a probabilistic decision model. In the EVLB, the kernel distribution for each positive training sample is estimated via an EVT distribution fit over the distances to the separating hyperplane between that positive training sample and its closest negative samples, with a subset of the overall positive training data retained to form a probabilistic decision boundary. Using this subset as a frame of reference, the probability of a sample at test time decreases as it moves away from the positive class. Because of this property, the EVLB is well-suited to open set recognition problems where samples from unknown or novel classes are encountered at test time. Our experimental evaluation shows that the EVLB provides a substantial improvement in scalability compared to standard radial basis function kernel machines, as well as PI-SVM and W-SVM, with improved accuracy in many cases. We evaluate our algorithm on open set variations of the standard visual learning benchmarks, as well as with an open subset of classes from Caltech 256 and ImageNet. Our experiments show that PI-SVM, W-SVM and EVLB provide significant advances over the previous state-of-the-art solutions for the same tasks.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering to microarray gene expression data is the choice of parameters, namely the cluster number and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis and that their clustering accuracy is higher than that of other related clustering algorithms.
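The abstract introduces a Gaussian kernel but does not spell out how FKCA uses it. In kernel clustering generally, the kernel replaces Euclidean distance with a feature-space distance. A minimal sketch, assuming the standard Gaussian kernel with bandwidth `sigma = 1.0` (an arbitrary choice here):

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_sq_distance(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y), which equals 2 - 2 k(x,y)
    for the Gaussian kernel because k(x, x) = 1."""
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


print(kernel_sq_distance((0.0, 0.0), (0.0, 0.0)))  # 0.0
print(kernel_sq_distance((0.0,), (100.0,)))         # 2.0
```

The feature-space distance saturates at 2, so very distant points cannot dominate a cluster update. This is one commonly cited motivation for kernelizing fuzzy clustering.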
NASA Astrophysics Data System (ADS)
Ebrahimi, A.; Pahlavani, P.; Masoumi, Z.
2017-09-01
Traffic monitoring and management in urban intelligent transportation systems (ITS) can be carried out with vehicular sensor networks. In a vehicular sensor network, vehicles equipped with sensors such as GPS can act as mobile sensors that sense urban traffic and send reports to a traffic monitoring center (TMC) for traffic estimation. Energy consumption by sensor nodes is a major problem in wireless sensor networks (WSNs); moreover, it is the most important consideration in designing these networks. Clustering the sensor nodes is considered an effective way to reduce the energy consumption of WSNs. Each cluster should have a Cluster Head (CH) and a number of nodes located within its supervision area. The cluster heads are responsible for gathering and aggregating the information of their clusters and transmitting it to the data collection center. Hence, clustering decreases the volume of transmitted information and consequently reduces the energy consumption of the network. In this paper, the Fuzzy C-Means (FCM) and Fuzzy Subtractive algorithms are employed to cluster sensors, and their effect on the energy consumption of the sensors is investigated. The FCM and Fuzzy Subtractive algorithms reduced the energy consumption of vehicle sensors by up to 90.68% and 92.18%, respectively. Comparing the two, the Fuzzy Subtractive algorithm performs about 1.5 percentage points better.
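For readers unfamiliar with FCM, the sketch below shows the standard alternating updates (membership-weighted centers, then memberships from relative distances) applied to 2-D points that could stand in for sensor positions. The fuzzifier m = 2, the random initialization and the stopping tolerance are assumptions. Cluster-head election and the paper's energy model are not shown.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fuzzy_c_means(data: List[Point], c: int, m: float = 2.0, iters: int = 300,
                  tol: float = 1e-6, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Plain fuzzy c-means. Returns (centers, u) where u[i][j] is the membership
    of point j in cluster i; the memberships of each point sum to 1."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalised per point.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= s
    centers: List[List[float]] = []
    for _ in range(iters):
        # Center update: mean of the data weighted by u_ij ** m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            tot = sum(w)
            centers.append([sum(w[j] * data[j][d] for j in range(n)) / tot for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        new_u = [[0.0] * n for _ in range(c)]
        for j in range(n):
            dist = [max(sum((a - b) ** 2 for a, b in zip(data[j], centers[i])), 1e-12) ** 0.5
                    for i in range(c)]
            for i in range(c):
                new_u[i][j] = 1.0 / sum((dist[i] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        if shift < tol:
            break
    return centers, u


nodes = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(nodes, 2)
print(sorted(centers))  # roughly [[0.33, 0.33], [10.33, 10.33]]
```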
NASA Astrophysics Data System (ADS)
Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Data clustering can be performed with partitional or hierarchical methods for many types of data, including DNA sequences. The two approaches can be combined by running a partitional algorithm at the first level and a hierarchical one at the second level, which is called hybrid clustering. In the partitional phase, popular methods such as PAM, K-means, or Fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for the partitional stage. Following the partitional algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) to obtain more specific cluster and subcluster structures. The number of main clusters is determined using the Davies-Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Feature extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. Our implementation of hybrid PAM and DIANA uses the open-source R programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 subclusters for Cluster-1, 9 subclusters for Cluster-2 and 2 subclusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the respective main clusters. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
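Because the number of clusters is chosen by minimizing DBI, a small sketch of a common form of the index may help: each cluster's scatter is the mean Euclidean distance of its members to the centroid, and the index averages, over clusters, the worst ratio (S_i + S_j) / d(c_i, c_j). Library implementations (for example in R packages) may differ in details, so treat this as illustrative.

```python
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Lower is better: compact clusters whose centroids lie far apart."""
    clusters = sorted(set(labels))
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    centroids = {c: [sum(col) / len(ms) for col in zip(*ms)] for c, ms in members.items()}
    # Scatter S_i: mean distance of a cluster's points to its centroid.
    scatter = {c: sum(_dist(p, centroids[c]) for p in ms) / len(ms) for c, ms in members.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / _dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)


pts = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # good split: about 0.2
print(davies_bouldin(pts, [0, 1, 0, 1]))  # poor split: about 5.0
```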
Detection of protein complex from protein-protein interaction network using Markov clustering
NASA Astrophysics Data System (ADS)
Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.
2017-05-01
Detecting complexes, or groups of functionally related proteins, is an important challenge in analysing biological networks. However, existing algorithms for identifying protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. We therefore introduce a graph clustering method based on the Markov clustering (MCL) algorithm to identify protein complexes within highly interconnected protein-protein interaction (PPI) networks. A protein-protein interaction network was first constructed as a geometrical network and then partitioned using Markov clustering to detect protein complexes. The usefulness of the proposed method is illustrated by its application to human proteins associated with type II diabetes mellitus. A flow simulation of the MCL algorithm was performed first, and the topological properties of the resulting network were analysed to detect protein complexes. The results indicate that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with a cluster modularity and density of 0.745 and 0.101, respectively. The comparison analysis revealed that MCL outperforms the AP, MCODE and SCPS algorithms, with a high clustering coefficient (0.751), network density, and modularity index (0.630). This demonstrates that MCL was the most reliable and efficient of the compared graph clustering algorithms for detecting protein complexes in PPI networks.
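The core of MCL alternates expansion (matrix powering, which spreads random-walk flow along paths) and inflation (elementwise powering followed by column renormalisation, which strengthens strong intra-cluster flows). A dense, small-graph sketch follows. Expansion 2 and inflation 2.0 are common defaults, while the pruning threshold and the attractor-row cluster read-out are simplifying assumptions.

```python
from typing import List

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-9, prune: float = 1e-6) -> List[List[int]]:
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = _normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                            for i in range(n)])
    for _ in range(max_iter):
        # Expansion: take the matrix to the given power.
        expanded = m
        for _ in range(expansion - 1):
            expanded = _matmul(expanded, m)
        # Inflation: elementwise power, then renormalise each column.
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        # Prune tiny entries and renormalise so the matrix stays sparse.
        inflated = _normalize_columns([[v if v >= prune else 0.0 for v in row] for row in inflated])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Every non-empty row (an attractor) lists the nodes of one cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > 0) for i in range(n)}
    return sorted(list(c) for c in clusters if c)


two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```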
Automated target recognition and tracking using an optical pattern recognition neural network
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin
1991-01-01
The ongoing development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that integrates an innovative optical parallel processor with a feature-extraction-based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets regardless of their scale, rotation, perspective, and various deformations. When fully developed, this OPRNN system can be effectively used for automated spacecraft recognition and tracking, which will lead to success in the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With its inherent parallel processing capability and shift invariance, multiple objects can be recognized and tracked simultaneously using this multichannel correlator. This target tracking capability can be greatly enhanced by a powerful feature-extraction-based neural network training algorithm such as the neocognitron. The OPRNN currently under investigation at JPL is constructed with an optical multichannel correlator whose holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10^14 analog connections/sec, enabling the OPRNN to outperform its state-of-the-art electronic counterpart by at least two orders of magnitude.
Recognizing Age-Separated Face Images: Humans and Machines
Yadav, Daksha; Singh, Richa; Vatsa, Mayank; Noore, Afzel
2014-01-01
Humans utilize facial appearance, gender, expression, aging pattern, and other ancillary information to recognize individuals. It is interesting to observe how humans perceive facial age. Analyzing these properties can help in understanding the phenomenon of facial aging, and incorporating the findings can help in designing effective algorithms. Such a study has two components: facial age estimation and age-separated face recognition. Age estimation involves predicting the age of an individual given his/her facial image. On the other hand, age-separated face recognition consists of recognizing an individual given his/her age-separated images. In this research, we investigate which facial cues are utilized by humans for estimating the age of people belonging to various age groups, along with analyzing the effect of one's gender, age, and ethnicity on age estimation skills. We also analyze how various facial regions such as the binocular and mouth regions influence age estimation and recognition capabilities. Finally, we propose an age-invariant face recognition algorithm that incorporates the knowledge learned from these observations. Key observations of our research are: (1) the age group of newborns and toddlers is easiest to estimate, (2) gender and ethnicity do not affect the judgment of age group estimation, (3) the face as a global feature is essential to achieve good performance in age-separated face recognition, and (4) the proposed algorithm yields improved recognition performance compared to existing algorithms and also outperforms a commercial system in the young image as probe scenario. PMID:25474200
Key features for ATA / ATR database design in missile systems
NASA Astrophysics Data System (ADS)
Özertem, Kemal Arda
2017-05-01
Automatic target acquisition (ATA) and automatic target recognition (ATR) are two vital tasks for missile systems, and a robust detection and recognition algorithm is crucial for overall system performance. A robust target detection and recognition algorithm requires an extensive image database. Automatic target recognition algorithms use the image database in their training and testing steps, which directly affects recognition performance, since training accuracy is driven by the quality of the image database. In addition, the performance of an automatic target detection algorithm can be measured effectively using an image database. There are two main ways to design an ATA / ATR database. The first, easier way is to use a scene generator. A scene generator can model objects by considering their material information, the atmospheric conditions, the detector type and the terrain. Designing an image database with a scene generator is inexpensive and allows many different scenarios to be created quickly and easily. However, the major drawback of a scene generator is its low fidelity, since the images are created virtually. The second, more difficult way is to use real-world images. Designing an image database with real-world images is much more costly and time-consuming; however, it offers high fidelity, which is critical for missile algorithms. In this paper, critical concepts in ATA / ATR database design with real-world images are discussed, each from the perspective of ATA and ATR separately. For the implementation stage, some possible solutions and trade-offs for creating the database are proposed, and all proposed approaches are compared with each other with regard to their pros and cons.
A 2D range Hausdorff approach to 3D facial recognition.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koch, Mark William; Russ, Trina Denise; Little, Charles Quentin
2004-11-01
This paper presents a 3D facial recognition algorithm based on the Hausdorff distance metric. The standard 3D formulation of the Hausdorff matching algorithm has been modified to operate on a 2D range image, enabling a reduction in computation from O(N^2) to O(N) without large storage requirements. The Hausdorff distance is known for its robustness to data outliers and inconsistent data between two data sets, making it a suitable choice for dealing with the inherent problems in many 3D datasets due to sensor noise and object self-occlusion. For optimal performance, the algorithm assumes a good initial alignment between probe and template datasets. However, to minimize the error between two faces, the alignment can be iteratively refined. Results from the algorithm are presented using 3D face images from the Face Recognition Grand Challenge database version 1.0.
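For reference, the textbook Hausdorff distance between two point sets takes the larger of the two directed distances, where each directed distance is the worst-case nearest-neighbour distance. The brute-force sketch below is quadratic in the number of points, i.e. the O(N^2) cost that the paper's 2D range-image formulation is said to avoid. That faster formulation is not reproduced here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
print(directed_hausdorff(A, B), hausdorff(A, B))  # 1.0 2.0
```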
Autoregressive statistical pattern recognition algorithms for damage detection in civil structures
NASA Astrophysics Data System (ADS)
Yao, Ruigen; Pakzad, Shamim N.
2012-08-01
Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi-degree-of-freedom system are generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of the proposed algorithms.
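The control-chart idea can be illustrated with a deliberately simple toy: fit an AR(1) model to baseline ("undamaged") data, set 3-sigma limits on its residuals, and count how often residuals of new data exceed them. The AR order, the 3-sigma limits and the simulated "damaged" dynamics (a changed AR coefficient) are all assumptions. The paper's spectral and autocorrelation features and its resampling-based thresholds are not reproduced.

```python
import random
from typing import List, Tuple


def fit_ar1(x: List[float]) -> float:
    """Least-squares phi for x_t = phi * x_{t-1} + e_t (zero-mean series)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def residuals(x: List[float], phi: float) -> List[float]:
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]


def control_limits(res: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Shewhart-style limits: mean +/- k standard deviations of baseline residuals."""
    mu = sum(res) / len(res)
    sd = (sum((r - mu) ** 2 for r in res) / (len(res) - 1)) ** 0.5
    return mu - k * sd, mu + k * sd


def out_of_control_rate(res: List[float], limits: Tuple[float, float]) -> float:
    lo, hi = limits
    return sum(1 for r in res if r < lo or r > hi) / len(res)


def simulate_ar1(phi: float, n: int, rng: random.Random) -> List[float]:
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


rng = random.Random(42)
healthy = simulate_ar1(0.6, 2000, rng)
phi = fit_ar1(healthy)
limits = control_limits(residuals(healthy, phi))
damaged = simulate_ar1(-0.5, 2000, rng)  # toy "damage": changed dynamics
print(out_of_control_rate(residuals(healthy, phi), limits),
      out_of_control_rate(residuals(damaged, phi), limits))
```

Residuals of the changed system, filtered through the baseline model, are no longer white noise, so noticeably more of them fall outside the limits.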
Bokov, Plamen; Mahut, Bruno; Flaud, Patrice; Delclaux, Christophe
2016-03-01
Respiratory diseases in children are a common reason for physician visits. A diagnostic difficulty arises when parents hear wheezing that is no longer present during the medical consultation. Thus, an objective outpatient tool for recognizing wheezing is of clinical value. We developed a wheezing recognition algorithm that works on respiratory sounds recorded with a smartphone placed near the mouth. A total of 186 recordings were obtained in a pediatric emergency department, mostly from toddlers (mean age 20 months). After excluding recordings with artefacts and those auscultated by only a single clinical operator, 95 recordings on which two operators agreed on the auscultation diagnosis (27 with wheezing and 68 without) were classified with a two-phase algorithm (signal analysis followed by a pattern classifier using machine learning algorithms). The best performance (71.4% sensitivity and 88.9% specificity) was observed with a Support Vector Machine-based algorithm. We further tested the algorithm on a set of 39 recordings assessed by a single operator and found fair agreement (kappa=0.28, CI95% [0.12, 0.45]) between the algorithm and the operator. The main advantage of such an algorithm is that it uses contact-free sound recording, which makes it valuable in the pediatric population. Copyright © 2016 Elsevier Ltd. All rights reserved.
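The three agreement figures in the abstract (sensitivity, specificity, Cohen's kappa) all come from a 2x2 table of reference versus algorithm labels. The sketch below shows the standard formulas. The counts in the example are hypothetical and are not the study's data, since the abstract reports only the resulting rates.

```python
from typing import Tuple


def binary_agreement(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity and Cohen's kappa from a 2x2 table
    (reference rater: wheeze / no wheeze; columns: algorithm output)."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from each rater's marginal positive and negative rates.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts: sensitivity 0.8, specificity about 0.867, kappa about 0.625.
print(binary_agreement(tp=20, fn=5, fp=10, tn=65))
```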
Hipp, Jason D; Cheng, Jerome Y; Toner, Mehmet; Tompkins, Ronald G; Balis, Ulysses J
2011-02-26
Historically, effective clinical utilization of image analysis and pattern recognition algorithms in pathology has been hampered by two critical limitations: 1) the availability of digital whole slide imagery data sets and 2) a relative domain knowledge deficit in terms of application of such algorithms, on the part of practicing pathologists. With the advent of the recent and rapid adoption of whole slide imaging solutions, the former limitation has been largely resolved. However, with the expectation that it is unlikely for the general cohort of contemporary pathologists to gain advanced image analysis skills in the short term, the latter problem remains, thus underscoring the need for a class of algorithm that has the concurrent properties of image domain (or organ system) independence and extreme ease of use, without the need for specialized training or expertise. In this report, we present a novel, general case pattern recognition algorithm, Spatially Invariant Vector Quantization (SIVQ), that overcomes the aforementioned knowledge deficit. Fundamentally based on conventional Vector Quantization (VQ) pattern recognition approaches, SIVQ gains its superior performance and essentially zero-training workflow model from its use of ring vectors, which exhibit continuous symmetry, as opposed to square or rectangular vectors, which do not. By use of the stochastic matching properties inherent in continuous symmetry, a single ring vector can exhibit as much as a millionfold improvement in matching possibilities, as opposed to conventional VQ vectors. SIVQ was utilized to demonstrate rapid and highly precise pattern recognition capability in a broad range of gross and microscopic use-case settings.
With the performance of SIVQ observed thus far, we find evidence that indeed there exist classes of image analysis/pattern recognition algorithms suitable for deployment in settings where pathologists alone can effectively incorporate their use into clinical workflow, as a turnkey solution. We anticipate that SIVQ, and other related class-independent pattern recognition algorithms, will become part of the overall armamentarium of digital image analysis approaches that are immediately available to practicing pathologists, without the need for the immediate availability of an image analysis expert.
Aided target recognition processing of MUDSS sonar data
NASA Astrophysics Data System (ADS)
Lau, Brian; Chao, Tien-Hsin
1998-09-01
The Mobile Underwater Debris Survey System (MUDSS) is a collaborative effort by the Navy and the Jet Propulsion Lab to demonstrate multi-sensor, real-time surveys of underwater sites for ordnance and explosive waste (OEW). We describe the sonar processing algorithm, a novel target recognition algorithm incorporating wavelets, morphological image processing, expansion by Hermite polynomials, and neural networks. This algorithm has found all planted targets in MUDSS tests and has also achieved notable success on another Coastal Systems Station (CSS) sonar image database.
Logo image clustering based on advanced statistics
NASA Astrophysics Data System (ADS)
Wei, Yi; Kamel, Mohamed; He, Yiwei
2007-11-01
In recent years, there has been growing interest in research on image content description techniques. Among these, image clustering is one of the most frequently discussed topics. Like image recognition, image clustering is a high-level representation technique; however, it focuses on coarse categorization rather than accurate recognition. Based on the wavelet transform (WT) and advanced statistics, the authors propose a novel approach that divides variously shaped logo images into groups according to the external boundary of each logo image. Experimental results show that the presented method is accurate, fast and insensitive to defects.
NASA Astrophysics Data System (ADS)
Kumar, Rohit; Puri, Rajeev K.
2018-03-01
Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of an energy-based clusterization algorithm, the simulated annealing clusterization algorithm (SACA), in describing the experimental data on charge distributions and various event-by-event correlations among fragments. The calculations are restricted to the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study, which spans different system masses and system-mass asymmetries of the colliding partners, shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with other available calculations that use one-body models, statistical models, and/or hybrid models.
Automated detection of extended sources in radio maps: progress from the SCORPIO survey
NASA Astrophysics Data System (ADS)
Riggi, S.; Ingallinera, A.; Leto, P.; Cavallaro, F.; Bufano, F.; Schillirò, F.; Trigilio, C.; Umana, G.; Buemi, C. S.; Norris, R. P.
2016-08-01
Automated source extraction and parametrization represents a crucial challenge for the next-generation radio interferometer surveys, such as those performed with the Square Kilometre Array (SKA) and its precursors. In this paper, we present a new algorithm, called CAESAR (Compact And Extended Source Automated Recognition), to detect and parametrize extended sources in radio interferometric maps. It is based on a pre-filtering stage, allowing image denoising, compact source suppression and enhancement of diffuse emission, followed by an adaptive superpixel clustering stage for final source segmentation. A parametrization stage provides source flux information and a wide range of morphology estimators for post-processing analysis. We developed CAESAR in a modular software library, also including different methods for local background estimation and image filtering, along with alternative algorithms for both compact and diffuse source extraction. The method was applied to real radio continuum data collected at the Australian Telescope Compact Array (ATCA) within the SCORPIO project, a pathfinder of the Evolutionary Map of the Universe (EMU) survey at the Australian Square Kilometre Array Pathfinder (ASKAP). The source reconstruction capabilities were studied over different test fields in the presence of compact sources, imaging artefacts and diffuse emission from the Galactic plane and compared with existing algorithms. When compared to a human-driven analysis, the designed algorithm was found capable of detecting known target sources and regions of diffuse emission, outperforming alternative approaches over the considered fields.
Classifier dependent feature preprocessing methods
NASA Astrophysics Data System (ADS)
Rodriguez, Benjamin M., II; Peterson, Gilbert L.
2008-04-01
In mobile applications, computational complexity is an issue that prevents sophisticated algorithms from being implemented on these devices. This paper provides an initial solution for applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to apply feature preprocessing tools properly before training classification models, both to reduce computational complexity and to improve overall classification accuracy. The feature preprocessing tools extended to the mobile environment are feature ranking, feature extraction, data preparation and outlier removal. Most desktop systems today can run the majority of available classification algorithms without concern for processing cost, but the same is not true of mobile platforms. As an application of pattern recognition on mobile devices, the recognition system targets the problem of steganalysis: determining whether an image contains hidden information. Performance measurements show that feature preprocessing increases overall steganalysis classification accuracy by an average of 22%. The methods in this paper were tested on a workstation and on a Nokia 6620 (Symbian operating system) camera phone with similar results.
Recognition of plant parts with problem-specific algorithms
NASA Astrophysics Data System (ADS)
Schwanke, Joerg; Brendel, Thorsten; Jensch, Peter F.; Megnet, Roland
1994-06-01
Automatic micropropagation is necessary to produce large amounts of biomass cost-effectively. Juvenile plants are dissected in a clean-room environment at particular points on the stem or the leaves. A vision system detects possible cutting points and controls a specialized robot. This contribution is directed at the pattern-recognition algorithms used to detect structural parts of the plant.
Enhanced facial texture illumination normalization for face recognition.
Luo, Yong; Guan, Ye-Peng
2015-08-01
Uncontrolled lighting is one of the most critical challenges for practical face recognition applications. An enhanced facial texture illumination normalization method is proposed to address this challenge. An adaptive relighting algorithm is developed to improve the brightness uniformity of face images. Facial texture is extracted using an illumination estimation difference algorithm. An anisotropic histogram-stretching algorithm is proposed to minimize the intraclass distance of facial skin and maximize the dynamic range of the facial texture distribution. Compared with existing methods, the proposed method more effectively eliminates the redundant information of facial skin and illumination. Extensive experiments show that the proposed method has superior performance in normalizing illumination variation and enhancing facial texture features for illumination-insensitive face recognition.
Optimized data fusion for K-means Laplacian clustering
Yu, Shi; Liu, Xinhai; Tranchevent, Léon-Charles; Glänzel, Wolfgang; Suykens, Johan A. K.; De Moor, Bart; Moreau, Yves
2011-01-01
Motivation: We propose a novel algorithm to combine multiple kernels and Laplacians for clustering analysis. The new algorithm is formulated on a Rayleigh quotient objective function and is solved as a bi-level alternating minimization procedure. Using the proposed algorithm, the coefficients of kernels and Laplacians can be optimized automatically.
Results: Three variants of the algorithm are proposed. Their performance is systematically validated on two real-life data fusion applications. The proposed Optimized Kernel Laplacian Clustering (OKLC) algorithms perform significantly better than other methods. Moreover, the coefficients of kernels and Laplacians optimized by OKLC show some correlation with the performance rank of the individual data sources. Although the K values are predefined in our evaluation, in practical studies the optimal number of clusters can be consistently estimated from the eigenspectrum of the combined kernel Laplacian matrix.
Availability: The MATLAB code of the algorithms implemented in this paper can be downloaded from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/oklc.html.
Contact: shiyu@uchicago.edu
Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20980271
Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.
Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A
2012-02-01
Segmentation of medical images is a difficult and challenging problem because of poor image contrast and artifacts that result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques; however, fuzzy c-means (FCM)-based algorithms are more effective than other methods. The objective of this work is to develop robust fuzzy clustering segmentation systems for effective segmentation of DCE breast MRI. This paper obtains robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, an additional entropy term and fuzzy parameters. The initial centers are obtained using an initialization algorithm to reduce the computational complexity and running time of the proposed algorithms. Experiments on breast images show that the proposed algorithms improve the similarity measurement, handle large amounts of noise, and give better results on data corrupted by noise and other artifacts. The clustering results of the proposed methods are validated using the silhouette method.
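For reference, the baseline FCM loop that these variants extend alternates a weighted center update with a membership update. The Python sketch below shows only that baseline, assuming Euclidean distance, a fuzzifier m = 2 and random initial memberships; it omits the kernel, penalty, neighborhood and entropy terms, and the function name `fuzzy_c_means` is our own.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Baseline fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u) where u[i][j] is the membership of point i in
    cluster j; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / wsum
                                 for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [max(sum((p[d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                     for ctr in centers]
            new_u.append([1.0 / sum((dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                          for j in range(c)])
        # Stop when no membership changes by more than tol.
        shift = max(abs(a - b) for row_a, row_b in zip(u, new_u)
                    for a, b in zip(row_a, row_b))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Each point ends up with a graded membership in every cluster rather than a hard label; a hard segmentation can be read off by taking the cluster with the largest membership.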
Computer Recognition of Facial Profiles
1974-08-01
A system for the recognition of human faces from...provide a fair test of the classification system. The work of Goldstein, Harmon, and Lesk [8] indicates, however, that for facial recognition, a ten class
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering generally perform poorly on very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model and, in fact, can be considered a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. As with kmeans, each iteration of all three algorithms is linear in the number of data points and in the number of clusters. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms: balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
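A minimal sketch of the plain batch spkmeans step described above (not the frequency-sensitive variants), assuming dense Python lists and random seeding from the data: inputs and centers are kept at unit length, so assigning by dot product is assigning by cosine similarity. The names `normalize` and `spherical_kmeans` are illustrative.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def spherical_kmeans(vectors, k, max_iter=50, seed=0):
    """Batch spherical k-means: assign by cosine similarity, renormalise centres."""
    data = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(data, k)]
    labels = [None] * len(data)
    for _ in range(max_iter):
        # Assignment: for unit vectors the dot product equals the cosine similarity.
        new_labels = [max(range(k), key=lambda j: sum(a * b for a, b in zip(x, centers[j])))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        # Update: sum each cluster's members and project the sum back onto the sphere.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return centers, labels
```

Because only directions matter, documents that differ mainly in length (word count) but share a term profile land in the same cluster.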
Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang
2017-10-30
In this work, we propose two k-means-clustering-based algorithms to mitigate fiber nonlinearity for 64-quadrature amplitude modulation (64-QAM) signals: a training-sequence-assisted k-means algorithm and a blind k-means algorithm. We experimentally demonstrate the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in a 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy, and they can quickly find appropriate initial centroids and correctly select the cluster centroids to obtain globally optimal solutions for large k. We measured the bit-error-ratio (BER) performance of the 64-QAM signal at different launch powers into 50-km single-mode fiber; the proposed techniques greatly mitigate the signal impairments caused by amplified spontaneous emission noise and the fiber Kerr nonlinearity and improve the BER performance.
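The idea of seeding k-means with known constellation points can be illustrated with a toy model. The sketch below assumes a 16-QAM grid (instead of 64-QAM, to keep it small), a hypothetical channel that rotates each symbol by a phase proportional to its power plus Gaussian noise, and uses the ideal points as initial centroids as a stand-in for training-sequence assistance. It is not the authors' algorithm, and the blind variant is not shown. The learned centroids follow the distorted clusters, and each received symbol is demapped through its cluster.

```python
import cmath
import random


def qam_grid(m_side):
    """Ideal square QAM constellation: m_side=4 gives 16-QAM, m_side=8 gives 64-QAM."""
    levels = [2 * i - (m_side - 1) for i in range(m_side)]
    return [complex(i, q) for i in levels for q in levels]


def toy_channel(symbols, gamma=0.01, sigma=0.05, seed=1):
    """Hypothetical channel: power-dependent phase rotation (a crude stand-in for
    Kerr-induced nonlinear phase) plus complex Gaussian noise."""
    rng = random.Random(seed)
    return [s * cmath.exp(1j * gamma * abs(s) ** 2)
            + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in symbols]


def kmeans_from_ideal(samples, ideal, max_iter=20):
    """k-means on complex samples, seeded with the known ideal constellation points."""
    centroids = list(ideal)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: abs(s - centroids[j]))
                      for s in samples]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    return centroids, labels


ideal = qam_grid(4)
rng = random.Random(0)
tx = [rng.choice(ideal) for _ in range(2000)]
rx = toy_channel(tx)
centroids, labels = kmeans_from_ideal(rx, ideal)
decided = [ideal[lab] for lab in labels]  # cluster j is demapped to ideal point j
```

With the small rotation used here the corner centroid ends up rotated by roughly `gamma * 18` radians. In this toy, much stronger rotation can leave some ideal-seeded clusters empty, so the seeding step matters.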
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets. PMID:24982966
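The merge step uses k-Medoids over a shape-based similarity. As a rough sketch, assuming z-normalized Euclidean distance as the shape measure (the paper's own measure may differ) and deterministic farthest-first seeding, alternating k-medoids on a precomputed distance matrix looks like this; `shape_distance` and `k_medoids` are hypothetical names.

```python
def znorm(series):
    """Z-normalise a series so that distances compare shape, not offset or scale."""
    mean = sum(series) / len(series)
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5
    return [(x - mean) / std if std > 0 else 0.0 for x in series]


def shape_distance(a, b):
    """Euclidean distance between z-normalised series (a hypothetical shape measure)."""
    return sum((x - y) ** 2 for x, y in zip(znorm(a), znorm(b))) ** 0.5


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed symmetric distance matrix."""
    n = len(dist)
    # Deterministic farthest-first seeding, starting from the most central item.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assign every item to its closest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # The new medoid minimises the summed distance to the other members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members))
                               if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels
```

Because medoids are actual series rather than averages, the method only needs pairwise distances, which is what makes it convenient for shape measures that have no natural "mean".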
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
A wireless sensor network (WSN) is formed by a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example in temperature monitoring, water level and pressure monitoring, health care, and various military applications. Most sensor nodes are powered by a self-contained battery, with which they perform their operations and communicate with neighboring nodes. To maximize the lifetime of WSNs, energy conservation measures are essential for improving their performance. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for WSNs, in which clustering and cluster-head selection are performed with the Particle Swarm Optimization (PSO) algorithm so as to minimize power consumption in the WSN. The performance metrics are evaluated, and the results are compared with a competing clustering algorithm to validate the reduction in energy consumption.
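To illustrate how PSO can drive cluster-head selection, here is a minimal sketch under loudly stated assumptions: each particle encodes k candidate head positions, the fitness is a hypothetical energy proxy (the sum of squared node-to-nearest-head distances), standard inertia-weight PSO updates are used, and the best positions are finally snapped to the nearest real nodes. EPSO-CEO's actual fitness function and enhancements are not described in the abstract and are not reproduced here.

```python
import random


def coverage_cost(heads, nodes):
    """Hypothetical energy proxy: squared distance from each node to its nearest head."""
    return sum(min((x - hx) ** 2 + (y - hy) ** 2 for hx, hy in heads) for x, y in nodes)


def pso_cluster_heads(nodes, k, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    xs = [x for x, _ in nodes]
    ys = [y for _, y in nodes]
    # Each particle is [x0, y0, x1, y1, ...], bounded by the deployment area.
    lo = [min(xs), min(ys)] * k
    hi = [max(xs), max(ys)] * k
    dim = 2 * k

    def decode(p):
        return [(p[2 * i], p[2 * i + 1]) for i in range(k)]

    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [coverage_cost(decode(p), nodes) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
            cost = coverage_cost(decode(pos[i]), nodes)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost

    # Snap each optimised position to the closest real node, which becomes a cluster head.
    heads = [min(range(len(nodes)),
                 key=lambda j: (nodes[j][0] - hx) ** 2 + (nodes[j][1] - hy) ** 2)
             for hx, hy in decode(gbest)]
    return heads, gbest_cost


nodes = [(0, 0), (1, 0), (0, 1), (1, 1), (9, 9), (10, 9), (9, 10), (10, 10)]
heads, cost = pso_cluster_heads(nodes, k=2)
print(heads, round(cost, 2))
```

Snapping can map two positions onto the same node; a fuller implementation would handle that case and would fold residual battery energy into the fitness.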
Research on Palmprint Identification Method Based on Quantum Algorithms
Zhang, Zhanzhan
2014-01-01
Quantum image recognition is a technology that uses quantum algorithms to process image information, and it can achieve better results than classical algorithms. In this paper, four different quantum algorithms are used in the three stages of palmprint recognition. First, a quantum adaptive median filtering algorithm is presented for palmprint filtering; a comparison shows that it obtains a better filtering result than the classical algorithm. Next, the quantum Fourier transform (QFT) is used to extract pattern features in a single operation owing to quantum parallelism. The proposed algorithm exhibits an exponential speed-up over the discrete Fourier transform in feature extraction. Finally, quantum set operations and Grover's algorithm are used in palmprint matching. According to the experimental results, the quantum algorithm needs only about √N operations to find the target palmprint, whereas the traditional method needs N. At the same time, the matching accuracy of the quantum algorithm is almost 100%. PMID:25105165
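The √N figure for matching comes from Grover's algorithm, which needs about (π/4)√N oracle calls to find one marked item among N. The classical state-vector simulation below (real amplitudes, one marked item) only illustrates that scaling; it is not the paper's palmprint matching procedure, and the function names are our own.

```python
import math


def grover_success_probability(n_items, target, iterations):
    """Simulate Grover search on a real-amplitude state vector of size n_items."""
    amp = [1.0 / math.sqrt(n_items)] * n_items  # uniform superposition
    for _ in range(iterations):
        amp[target] = -amp[target]              # oracle: flip the sign of the marked item
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]       # diffusion: inversion about the mean
    return amp[target] ** 2


def optimal_iterations(n_items):
    """About (pi / 4) * sqrt(N) Grover iterations maximise the success probability."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))
```

For N = 64 this gives 6 iterations and a success probability above 0.99, compared with an average of N/2 probes for a classical linear scan.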
NASA Astrophysics Data System (ADS)
Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna
2017-12-01
The choice of centroids in the K-Means algorithm directly affects the quality of the clustering results, and choosing centroids with random numbers has many weaknesses. The GenClust algorithm, which combines Genetic Algorithms and K-Means, uses a genetic algorithm to determine the centroid of each cluster. GenClust obtains 50% of its chromosomes through deterministic calculations and the other 50% through random number generation. This study modifies GenClust so that 100% of the chromosomes are obtained through deterministic calculations. The results are presented as a performance comparison, in terms of Mean Square Error, of K-Means with centroids determined by the GenClust method, by the modified GenClust method, and by classic K-Means.
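The abstract does not spell out GenClust's deterministic calculations, so the sketch below substitutes one common deterministic seeding rule, maximin (farthest-first) selection, and evaluates the resulting K-Means clustering by the mean squared error to the nearest centroid. All function names are illustrative; this is not the GenClust procedure.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k):
    """Deterministic seeding (one possible choice, not GenClust's): start from the
    point closest to the data mean, then repeatedly add the point farthest from
    all centroids chosen so far."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centroids = [min(points, key=lambda p: sq_dist(p, mean))]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations from the given starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def mse(points, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)
```

Random seeding (for example, `random.sample(points, k)`) can start two centroids inside the same group and end in a worse local optimum; a deterministic rule makes the result repeatable, which is the property the modified GenClust aims for.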
Hebbian self-organizing integrate-and-fire networks for data clustering.
Landis, Florian; Ott, Thomas; Stoop, Ruedi
2010-01-01
We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm can distinguish between clusters and noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties make the approach particularly useful for segmenting visual scenes into arbitrarily shaped homogeneous regions. We present several application examples and, to highlight the advantages and weaknesses of our method, systematically compare the results with those of standard methods such as k-means and Ward's linkage clustering. The analysis demonstrates not only that the clustering ability of the proposed algorithm is more powerful than that of the two competing methods, but also that its time complexity is more modest than that of its strongest commonly used competitor.
Impact of heuristics in clustering large biological networks.
Shafin, Md Kishwar; Kabir, Kazi Lutful; Ridwan, Iffatur; Anannya, Tasmiah Tamzid; Karim, Rashid Saadman; Hoque, Mohammad Mozammel; Rahman, M Sohel
2015-12-01
Traditional clustering algorithms often perform poorly on large networks. By contrast, greedy algorithms have been found to be relatively efficient at uncovering functional modules in large biological networks. The quality of the clusters produced by these greedy techniques largely depends on the underlying heuristics employed, and heuristics based on different attributes and properties differ in the quality of the clusters they produce. This motivated us to design new heuristics for clustering large networks. In this paper, we propose two new heuristics and analyze their performance after incorporating them, in three different combinations, into SPICi, a recent and widely acclaimed greedy clustering algorithm. We extensively analyze the effectiveness of these new variants, and the results are promising. Copyright © 2015 Elsevier Ltd. All rights reserved.
Mai, Xiaofeng; Liu, Jie; Wu, Xiong; Zhang, Qun; Guo, Changjian; Yang, Yanfu; Li, Zhaohui
2017-02-06
A Stokes-space modulation format classification (MFC) technique is proposed for coherent optical receivers using a non-iterative clustering algorithm. In the clustering algorithm, two simple parameters are calculated to help find the density peaks of the data points in Stokes space, and no iteration is required. Correct MFC among PM-QPSK, PM-8QAM, PM-16QAM, PM-32QAM and PM-64QAM signals is achieved in numerical simulations within practical optical signal-to-noise ratio (OSNR) ranges. The performance of the proposed MFC algorithm is also compared with that of other schemes based on clustering algorithms. The simulation results show that the proposed MFC scheme achieves good classification performance with moderate time complexity. Finally, proof-of-concept experiments demonstrate MFC among PM-QPSK/16QAM/64QAM signals, confirming the feasibility of the proposed MFC scheme.
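The abstract does not name its two parameters. A plausible reading is the local density ρ and the distance δ to the nearest denser point used in density-peaks clustering, which also needs no iteration. The sketch below implements that density-peaks scheme on generic 2-D points under this assumption; the cutoff distance and the number of clusters are inputs the caller must choose, and Stokes-space preprocessing is not shown.

```python
import math


def density_peaks(points, cutoff, n_clusters):
    """Non-iterative density-peaks clustering (Rodriguez-Laio style)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    # delta: distance to the nearest point that comes earlier (is denser) in that order.
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    # Centres: the densest point plus the remaining points with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + rest[:n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Single pass in density order: each point inherits its denser neighbour's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels
```

Cluster centers stand out as points that are both dense (large ρ) and far from any denser point (large δ), so one sorted pass replaces the iterative refinement of k-means.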
Optimization of wireless sensor networks based on chicken swarm optimization algorithm
NASA Astrophysics Data System (ADS)
Wang, Qingxi; Zhu, Lihua
2017-05-01
To reduce the energy consumption of wireless sensor networks and extend network lifetime, a clustering routing protocol for wireless sensor networks based on the chicken swarm optimization algorithm is proposed. Building on the LEACH protocol, it improves cluster formation and cluster-head selection by means of the chicken swarm optimization algorithm, and it updates the positions of chickens that fall into a local optimum by Levy flight, which enhances population diversity and preserves the global search capability of the algorithm. By making balanced use of the network nodes, the new protocol prevents heavily used nodes from dying prematurely and extends the lifetime of the wireless sensor network. Simulation experiments show that the protocol consumes less energy than LEACH and also outperforms a clustering routing protocol based on the particle swarm optimization algorithm.
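Levy flights are commonly generated with Mantegna's algorithm. The sketch below draws such steps and applies them in a cuckoo-search-style relocation move. Treating this as the chicken-swarm protocol's update rule is our assumption, and the scale factor 0.01 and β = 1.5 are illustrative defaults.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """One heavy-tailed Levy step: u / |v| ** (1 / beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_relocate(position, best, scale=0.01, beta=1.5, rng=random):
    """Hypothetical escape move for an individual stuck in a local optimum: each
    coordinate jumps by a Levy step proportional to its distance from the best."""
    return [x + scale * levy_step(beta, rng) * (x - b) for x, b in zip(position, best)]
```

Most Levy steps are small, but occasional very long jumps let a stuck individual leave its basin, which is how such moves help preserve population diversity.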
Predicting the random drift of MEMS gyroscope based on K-means clustering and OLS RBF Neural Network
NASA Astrophysics Data System (ADS)
Wang, Zhen-yu; Zhang, Li-jie
2017-10-01
Sensor measurement error can be effectively compensated through prediction. To address the large random drift error of MEMS (Micro-Electro-Mechanical System) gyroscopes, this paper proposes an improved learning algorithm for a Radial Basis Function (RBF) Neural Network (NN) based on K-means clustering and Orthogonal Least Squares (OLS). The algorithm first selects typical samples as the initial cluster centers of the RBF NN, then obtains candidate centers with the K-means algorithm, and finally optimizes the candidate centers with the OLS algorithm, which simplifies the network structure and improves prediction performance. Experimental results show that the proposed K-means clustering OLS learning algorithm can effectively predict the random drift of the MEMS gyroscope, with a prediction error of 9.8019e-007°/s and a prediction time of 2.4169e-006 s.
Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation
NASA Astrophysics Data System (ADS)
Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou
2007-09-01
This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphones (MDM) condition in a conference room scenario. Our proposed system consists of six modules: (1) a normalized least-mean-squares (NLMS) adaptive filter for speaker direction estimation via time difference of arrival (TDOA); (2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT time delay estimation (TDE) among all the distant microphone channel signals; (4) a speaker clustering algorithm based on GMM modeling; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via the "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
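Module (1) relies on a normalized LMS filter, whose update step is divided by the input power. In the minimal single-channel sketch below (an assumption-laden toy, not the I2R configuration), one channel is a delayed, attenuated copy of the other, and the index of the largest adapted weight recovers the delay in samples, which is the quantity a TDOA estimate needs.

```python
import random


def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Normalised LMS: adapt FIR weights w so that sum(w[i] * x[n - i]) tracks d[n]."""
    w = [0.0] * taps
    for n in range(taps - 1, len(x)):
        window = [x[n - i] for i in range(taps)]      # x[n], x[n-1], ..., x[n-taps+1]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, window))
        norm = eps + sum(xi * xi for xi in window)    # input power normalises the step
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, window)]
    return w


# Toy relative-delay example: the second "microphone" hears the first one delayed by
# 3 samples and attenuated, so the adapted weights should peak at lag 3.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [0.8 * x[n - 3] if n >= 3 else 0.0 for n in range(len(x))]
w = nlms(x, d, taps=8)
lag = max(range(len(w)), key=lambda i: abs(w[i]))
```

Normalizing by the window energy makes the effective step size independent of the signal level, which matters for speech whose loudness varies widely.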
Face Recognition Using Local Quantized Patterns and Gabor Filters
NASA Astrophysics Data System (ADS)
Khryashchev, V.; Priorov, A.; Stepanova, O.; Nikitin, A.
2015-05-01
The problem of face recognition in a natural or artificial environment has received a great deal of attention from researchers over the last few years. Many methods for accurate face recognition have been proposed. Nevertheless, these methods often fail to recognize the person accurately in difficult scenarios, e.g. low resolution, low contrast, or pose variations. We therefore propose an approach for accurate and robust face recognition using local quantized patterns and Gabor filters. Estimation of the eye centers is used as a preprocessing stage. Evaluation of our algorithm on different samples from the standardized FERET database shows that our method is invariant to general variations in lighting, expression, occlusion and aging. The proposed approach increases correct recognition accuracy by about 20% compared with the known face recognition algorithms from the OpenCV library. The additional use of Gabor filters can significantly improve robustness to changes in lighting conditions.
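Local quantized patterns generalize the classic local binary pattern (LBP), which encodes each pixel by thresholding its neighbors against it. The sketch below computes only the basic 3×3 LBP code and histogram, as a simplified stand-in for the paper's LQP descriptor; Gabor filtering and eye-center alignment are not shown, and the function names are our own.

```python
def lbp_codes(img):
    """Basic 8-neighbour local binary pattern codes for the interior pixels of a
    2-D list of grey levels. Bit b is set when neighbour b is >= the centre."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(img), len(img[0])
    codes = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(img):
    """256-bin histogram of LBP codes, the usual texture feature vector."""
    hist = [0] * 256
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
    return hist
```

Because each bit only compares two pixels, adding a constant brightness offset leaves the codes unchanged, which is one reason such descriptors tolerate lighting changes.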
A Lightweight Hierarchical Activity Recognition Framework Using Smartphone Sensors
Han, Manhyung; Bang, Jae Hun; Nugent, Chris; McClean, Sally; Lee, Sungyoung
2014-01-01
Activity recognition for the purpose of recognizing a user's intentions using multimodal sensors is becoming a widely researched topic, largely owing to the prevalence of the smartphone. Previous studies have reported the difficulty of recognizing life-logs using only a smartphone, owing to the challenges of activity modeling and real-time recognition. In addition, recognizing life-logs is difficult because there is no established framework that enables the use of different sources of sensor data. In this paper, we propose a smartphone-based Hierarchical Activity Recognition Framework that extends the Naïve Bayes approach for activity modeling and real-time activity recognition. The proposed algorithm achieves higher accuracy than the Naïve Bayes approach and also enables the recognition of a user's activities within a mobile environment. It can classify fifteen activities with an average classification accuracy of 92.96%. PMID:25184486
A Modified MinMax k-Means Algorithm Based on PSO
2016-01-01
The MinMax k-means algorithm is widely used to counter the effect of bad initialization by minimizing the maximum intracluster error. Two parameters, the exponent parameter and the memory parameter, are involved in its execution. Since different parameter values lead to different clustering errors, it is crucial to choose appropriate ones. The original algorithm provides a practical framework that extends MinMax k-means to adapt the exponent parameter to the data set automatically. It has been believed that if the maximum exponent parameter is set, the program can reach the lowest intracluster error; however, our experiments show that this is not always true. In this paper, we modify the MinMax k-means algorithm with PSO to determine parameter values that allow the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on several popular data sets under several different initial conditions and is compared with the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm reaches the lowest clustering errors automatically. PMID:27656201
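To make the two parameters concrete: as we understand the original MinMax k-means formulation, each cluster k receives a weight proportional to V_k^(1/(1-p)), where V_k is its sum of squared distances to the center and p in [0, 1) is the exponent, and the memory parameter β blends in the previous weights. The sketch below computes only that weight update and the weighted objective; the PSO-based parameter search is not shown, and the function names are our own.

```python
def minmax_weights(cluster_errors, p, prev_weights=None, beta=0.0):
    """Cluster weights in MinMax k-means, as we understand the formulation:
    w_k is proportional to V_k ** (1 / (1 - p)), where V_k is the sum of squared
    distances in cluster k; beta blends in the previous weights (memory)."""
    if not 0 <= p < 1:
        raise ValueError("the exponent p must satisfy 0 <= p < 1")
    raw = [v ** (1.0 / (1.0 - p)) for v in cluster_errors]
    total = sum(raw)
    weights = [r / total for r in raw]
    if prev_weights is not None:
        weights = [beta * old + (1.0 - beta) * new for old, new in zip(prev_weights, weights)]
    return weights


def weighted_objective(cluster_errors, weights, p):
    """The relaxed MinMax objective sum_k w_k ** p * V_k."""
    return sum(w ** p * v for w, v in zip(weights, cluster_errors))
```

Raising p pushes almost all weight onto the worst cluster (for errors [1, 4], p = 0.5 gives weights 1/17 and 16/17), which is why the choice of p, and of β to damp oscillations, affects the final clustering error.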
NASA Astrophysics Data System (ADS)
Ma, Xiaoke; Wang, Bingbo; Yu, Liang
2018-01-01
Community detection is fundamental for revealing the structure-functionality relationship in complex networks. It involves two issues: the quantitative function for communities and the algorithms to discover them. Despite significant research on each, few attempts have been made to establish the connection between the two. To address this problem, a generalized quantification function is proposed for communities in weighted networks, which provides a framework that unifies several well-known measures. We then prove that the trace optimization of the proposed measure is equivalent to the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means and spectral clustering. This serves as the theoretical foundation for designing community detection algorithms. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploiting this equivalence to combine nonnegative matrix factorization and spectral clustering. Unlike traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real-world networks, we demonstrate that the proposed method improves the accuracy of traditional spectral algorithms in community detection.
Approach to recognition of flexible form for credit card expiration date recognition as example
NASA Astrophysics Data System (ADS)
Sheshkus, Alexander; Nikolaev, Dmitry P.; Ingacheva, Anastasia; Skoryukina, Natalya
2015-12-01
In this paper, we consider the task of finding information fields within a document with a flexible form, using the credit card expiration date field as an example. We discuss the main difficulties and suggest possible solutions. In our case, the task has to be solved on mobile devices, so computational complexity has to be as low as possible. We provide the results of an analysis of the suggested algorithm. The error distribution of the recognition system shows that the suggested algorithm solves the task with the required accuracy.
Comparison of crisp and fuzzy character networks in handwritten word recognition
NASA Technical Reports Server (NTRS)
Gader, Paul; Mohamed, Magdi; Chiang, Jung-Hsien
1992-01-01
Experiments involving handwritten word recognition on words taken from images of handwritten address blocks from the United States Postal Service mailstream are described. The word recognition algorithm relies on the use of neural networks at the character level. The neural networks are trained using crisp and fuzzy desired outputs. The fuzzy outputs were defined using a fuzzy k-nearest neighbor algorithm. The crisp networks slightly outperformed the fuzzy networks at the character level but the fuzzy networks outperformed the crisp networks at the word level.
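One common way to define fuzzy desired outputs with a fuzzy k-nearest-neighbor rule is Keller's initialization, in which a sample keeps at least 0.51 membership in its own class and shares the rest according to its neighbors' labels. Whether the paper used exactly this rule is an assumption; the sketch below shows it for small labeled point sets, with `fuzzy_knn_labels` a hypothetical name.

```python
import math


def fuzzy_knn_labels(points, labels, n_classes, k=3):
    """Keller-style fuzzy class memberships for labelled training samples."""
    out = []
    for i, p in enumerate(points):
        # k nearest neighbours of p, excluding p itself.
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: math.dist(p, points[j]))[:k]
        counts = [0] * n_classes
        for j in neighbours:
            counts[labels[j]] += 1
        # Own class: 0.51 + 0.49 * n_c / k; other classes: 0.49 * n_c / k (rows sum to 1).
        out.append([0.51 + 0.49 * counts[c] / k if c == labels[i] else 0.49 * counts[c] / k
                    for c in range(n_classes)])
    return out
```

Samples deep inside their class get a near-crisp target (membership close to 1), while samples surrounded by another class get a softened target, which is the information a fuzzy-trained network can exploit.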
Staffaroni, Adam M; Melrose, Rebecca J; Leskin, Lorraine P; Riskin-Jones, Hannah; Harwood, Dylan; Mandelkern, Mark; Sultzer, David L
2017-09-01
The objective of this study was to distinguish the functional neuroanatomy of verbal learning and recognition in Alzheimer's disease (AD) using the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) Word Learning task. In 81 Veterans diagnosed with dementia due to AD, we conducted a cluster-based correlation analysis to assess the relationships between recency and recognition memory scores from the CERAD Word Learning Task and cortical metabolic activity measured using [18F]-fluoro-2-deoxy-D-glucose positron emission tomography (FDG-PET). AD patients (Mini-Mental State Examination, MMSE mean = 20.2) performed significantly better at recalling recency items during learning trials than primacy and middle items. Recency memory was associated with cerebral metabolism in the left middle and inferior temporal gyri and left fusiform gyrus (p < .05 at the corrected cluster level). In contrast, recognition memory was correlated with metabolic activity in two clusters: (a) a large cluster that included the left hippocampus, parahippocampal gyrus, entorhinal cortex, anterior temporal lobe, and inferior and middle temporal gyri; and (b) the bilateral orbitofrontal cortices (OFC). The present study further informs our understanding of the disparate functional neuroanatomy of recency memory and recognition memory in AD. We anticipated that the recency effect would be relatively preserved and associated with temporoparietal brain regions implicated in short-term verbal memory, while recognition memory would be associated with the medial temporal lobe and possibly the OFC. Consistent with our a priori hypotheses, list learning in our AD sample was characterized by a reduced primacy effect and a relatively spared recency effect; however, recency memory was associated with cerebral metabolism in inferior and lateral temporal regions associated with the semantic memory network, rather than regions associated with short-term verbal memory.
The correlates of recognition memory included the medial temporal lobe and OFC, replicating prior studies.
A Comparative Evaluation of Anomaly Detection Algorithms for Maritime Video Surveillance
2011-01-01
of k-means clustering and the k-NN Localized p-value Estimator (KNN-LPE). K-means is a popular distance-based clustering algorithm while KNN-LPE...implemented the sparse cluster identification rule we described in Section 3.1. 2. k-NN Localized p-value Estimator (KNN-LPE): We implemented this using...Average Density (KNN-NAD): This was implemented as described in Section 3.4. Algorithm Parameter Settings The global and local density-based anomaly