Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.
Zhu, Lin; Chung, Fu-Lai; Wang, Shitong
2009-06-01
The fuzziness index m has an important influence on the clustering results of fuzzy clustering algorithms, and it should not be fixed at the usual value m = 2. In view of its distinctive features in applications and its limitation of allowing m = 2 only, a recent advance in fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM is proposed for more effective clustering. By introducing a novel membership constraint function, a new objective function is constructed, from which GIFP-FCM clustering is derived. Meanwhile, from the viewpoints of the L(p) norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results, including an application to noisy image texture segmentation, are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities. PMID:19174354
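For orientation, the role of the fuzziness index m can be illustrated with a minimal sketch of the classical FCM iteration. This is the textbook algorithm only, not GIFP-FCM or IFP-FCM. The function name `fcm`, its parameters, and the random initialization are assumptions made for illustration.

```python
import random

def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Classical fuzzy c-means (illustrative sketch, not GIFP-FCM).

    points: list of equal-length tuples; c: number of clusters;
    m > 1: fuzziness index. Returns (centers, memberships).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the data weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # The point coincides with a center: assign it crisply.
                raw = [1.0 if x == 0.0 else 0.0 for x in d2]
            else:
                raw = [x ** (-1.0 / (m - 1.0)) for x in d2]
            total = sum(raw)
            new_row = [v / total for v in raw]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fcm(pts, c=2, m=2.0)
```

Values of m closer to 1 push the memberships toward a crisp partition, while larger values make them softer, which is why fixing m = 2 is a modeling choice rather than a requirement.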
An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.
Zhang, Jian; Shen, Ling
2014-01-01
To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect. PMID:25477953
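As a rough illustration of the validity measure mentioned above, the following sketch computes the Xie-Beni index in its common generalized form: the membership-weighted within-cluster scatter divided by n times the smallest squared distance between two centers. Lower values indicate more compact and better separated clusters. The function name `xie_beni` and the exponent argument `m` are assumptions; the abstract does not say which variant SP-FCM uses.

```python
def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni validity index (illustrative sketch); lower is better.

    points: list of tuples; centers: list of tuples (at least two);
    u[i][k]: membership of point i in cluster k; m: membership exponent.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n, c = len(points), len(centers)
    # Compactness: membership-weighted scatter around the centers.
    compactness = sum(u[i][k] ** m * sqdist(points[i], centers[k])
                      for i in range(n) for k in range(c))
    # Separation: smallest squared distance between two distinct centers.
    separation = min(sqdist(centers[j], centers[k])
                     for j in range(c) for k in range(j + 1, c))
    return compactness / (n * separation)

# Toy 1-D data with a crisp partition into two tight groups.
data = [(0.0,), (2.0,), (10.0,), (12.0,)]
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
score = xie_beni(data, [(1.0,), (11.0,)], crisp)
```

Scanning a range of cluster numbers and keeping the one with the smallest index is one plausible way to pick the number of clusters automatically, as the abstract describes.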
NASA Astrophysics Data System (ADS)
Wang, Deguang; Han, Baochang; Huang, Ming
Computer forensics is the application of computer technology to access, investigate, and analyze evidence of computer crime. It mainly includes the processes of identifying and obtaining digital evidence, analyzing and extracting data, and filing and submitting results. Data analysis is the key link in computer forensics. Because real data are complex and fuzzy, evidence analysis has had difficulty producing the desired results. This paper applies a fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) to computer forensics, and it can obtain more satisfactory results.
Comments on "A robust fuzzy local information C-means clustering algorithm".
Celik, Turgay; Lee, Hwee Kuan
2013-03-01
In a recent paper, Krinidis and Chatzis proposed a variation of the fuzzy c-means algorithm for image clustering. Local spatial and gray-level information is incorporated in a fuzzy way through an energy function, and local minimizers of this energy function are proposed for obtaining the fuzzy membership of each pixel and the cluster centers. In this paper, it is shown that the local minimizers Krinidis and Chatzis use to obtain the fuzzy memberships and the cluster centers in an iterative manner are not exclusively solutions for true local minimizers of their designed energy function. Thus, their local minimizers fail to converge to the correct local minima of the designed energy function not because of getting trapped in local minima, but because of the design of the energy function. PMID:23144036
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…
NASA Astrophysics Data System (ADS)
Zhao, Jianghong; Li, Deren; Wang, Yanmin
2008-12-01
Segmentation of point cloud data is a key but difficult problem in 3D architectural reconstruction. Compared with reverse engineering, point cloud data of ancient architecture contain more noise at edges because of mirror reflection. Traditional methods are hard (that is, not fuzzy): they cannot represent borderline points that belong to two regions, so they struggle to meet the demands of segmenting ancient architecture point cloud data. Ancient architecture is mostly composed of columniation, plinths, arches, girders, and tiles arranged in a specific order. The surface of each component is regular and smooth, and the belongingness of borderline points is very blurry. Based on these characteristics, the authors propose a modified fuzzy c-means clustering (MFCM) algorithm that adds geometrical information during clustering. In addition, the method improves the belongingness constraints to avoid the influence of noise on the segmentation result. The algorithm is used in the project "Digital surveying of ancient architecture--- Forbidden City". Experiments show that the method has good noise resistance, accuracy, and adaptability, and that it greatly reduces the degree of human intervention. After segmentation, internal points and edge points can be distinguished according to the membership of every point, which facilitates subsequent surface feature extraction and model identification and provides effective support for the three-dimensional reconstruction of ancient buildings.
Sequential Competitive Learning and the Fuzzy c-Means Clustering Algorithms.
Hathaway, Richard J.; Bezdek, James C.; Pal, Nikhil R.
1996-07-01
Several recent papers have described sequential competitive learning algorithms that are curious hybrids of algorithms used to optimize the fuzzy c-means (FCM) and learning vector quantization (LVQ) models. First, we show that these hybrids do not optimize the FCM functional. Then we show that the gradient descent conditions they use are not necessary conditions for optimization of a sequential version of the FCM functional. We give a numerical example that demonstrates some weaknesses of the sequential scheme proposed by Chung and Lee. And finally, we explain why these algorithms may work at times, by exhibiting the stochastic approximation problem that they unknowingly attempt to solve. Copyright 1996 Published by Elsevier Science Ltd PMID:12662563
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that accounts for nearly one million deaths each year. Because malaria requires prompt and accurate diagnosis, the current study proposes an unsupervised pixel segmentation based on a clustering algorithm to obtain fully segmented red blood cells (RBCs) infected with malaria parasites from thin blood smear images of the P. vivax species. To obtain the segmented infected cell, the malaria images are first enhanced using a modified global contrast stretching technique. Then, an unsupervised segmentation technique based on a clustering algorithm is applied to the intensity component of the malaria image to segment the infected cell from its blood cell background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms are proposed for malaria slide image segmentation. After that, a median filter is applied to smooth the image and to remove unwanted regions such as small background pixels. Finally, a seeded region growing area extraction algorithm is applied to remove large unwanted regions that still appear on the image because they are too large to be cleaned by the median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with the MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm produces the best segmentation performance, achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by the MKM and FCM algorithms.
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM depends on the initial clustering centers, falls into locally optimal solutions easily, and is sensitive to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines the artificial fish swarm algorithm (AFSA) with FCM, using the global optimization searching and parallel computing ability of AFSA to find a superior result. Meanwhile, the Metropolis criterion and a noise reduction mechanism are introduced into AFSA to enhance the convergence rate and antinoise ability. An artificial grid graph and Magnetic Resonance Imaging (MRI) images are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that HAFSA performs better than FCM and suppressed FCM (SFCM). PMID:26649068
Efficient inhomogeneity compensation using fuzzy c-means clustering models.
Szilágyi, László; Szilágyi, Sándor M; Benyó, Balázs
2012-10-01
Intensity inhomogeneity or intensity non-uniformity (INU) is an undesired phenomenon that represents the main obstacle for magnetic resonance (MR) image segmentation and registration methods. Various techniques have been proposed to eliminate or compensate for the INU, most of which are embedded into classification or clustering algorithms. They generally have difficulties when the INU reaches high amplitudes and usually suffer from high computational load. This study reformulates the design of c-means clustering based INU compensation techniques by identifying and separating those globally working, computationally costly operations that can be applied to gray intensity levels instead of individual pixels. The theoretical assumptions are demonstrated using the fuzzy c-means algorithm, but the proposed modification is compatible with a wide range of c-means clustering based INU compensation and MR image segmentation algorithms. Experiments carried out using synthetic phantoms and real MR images indicate that the proposed approach produces practically the same segmentation accuracy as the conventional formulation, but 20-30 times faster. PMID:22405524
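The idea of working on gray intensity levels instead of individual pixels can be sketched as follows. This is a minimal illustration of plain FCM run on an intensity histogram and includes no INU compensation, so it is not the authors' method. The function name `histogram_fcm` and its parameters are assumptions made for illustration.

```python
from collections import Counter

def histogram_fcm(pixels, init_centers, m=2.0, iters=50):
    """Plain 1-D FCM evaluated once per distinct gray level (sketch).

    pixels: iterable of gray values; init_centers: starting centers.
    Pixels with the same gray value share the same memberships, so each
    level is processed once and weighted by its histogram count.
    """
    hist = Counter(pixels)          # gray level -> number of pixels
    levels = sorted(hist)
    centers = [float(v) for v in init_centers]
    u = {}
    for _ in range(iters):
        # Membership update, one row per gray level.
        for g in levels:
            d2 = [(g - v) ** 2 for v in centers]
            if min(d2) == 0:
                # The level coincides with a center: assign it crisply.
                raw = [1.0 if x == 0 else 0.0 for x in d2]
            else:
                raw = [x ** (-1.0 / (m - 1.0)) for x in d2]
            total = sum(raw)
            u[g] = [w / total for w in raw]
        # Center update, weighting each level by its pixel count.
        for k in range(len(centers)):
            den = sum(hist[g] * u[g][k] ** m for g in levels)
            if den > 0:
                num = sum(hist[g] * u[g][k] ** m * g for g in levels)
                centers[k] = num / den
    return centers, u

image = [10] * 50 + [12] * 50 + [200] * 30 + [202] * 30
centers, level_memberships = histogram_fcm(image, [0.0, 255.0])
```

In this toy case the inner loops run over 4 gray levels rather than 160 pixels. For an 8-bit image the loop length is bounded by 256 regardless of image size, which gives a sense of where a speed-up of this kind could come from.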
Fuzzy c-means clustering of partially missing data sets
NASA Astrophysics Data System (ADS)
Hathaway, Richard J.; Overstreet, Dessa D.; Bezdek, James C.
2000-03-01
The fuzzy c-means algorithm is a useful tool for clustering real s-dimensional data. Typically, each observation consists of numerical values for s features such as height, length, etc. In some cases, data sets contain vectors that are missing one or more feature values. For example, a particular datum might have the form: (254.3, x, 36.2, 112.7, x), where the second and fifth feature values are missing. The (standard) fuzzy c-means algorithm cannot be applied in this case since the required computations reference numerical feature values for all s features of every data point. Two adaptations of fuzzy c-means to the incomplete data case are presented here. One adaptation replaces unknown feature values with additional variables that are optimized to provide an extrapolated data set yielding the smallest possible value of the fuzzy c-means criterion. Another approach uses only the available feature values in distance calculations, and then adjusts for the missing feature values by an appropriately chosen scaling of the computed distances. Numerical convergence properties of the adaptations and computational costs are discussed. Artificial data sets are used to demonstrate the two new approaches.
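The second adaptation can be sketched as a partial distance computation. The abstract does not give the scaling, so the factor s / (number of available features) used below is an assumption, as are the function name `partial_sq_distance` and the use of None to mark a missing value.

```python
def partial_sq_distance(x, v):
    """Squared distance from datum x to center v over observed features.

    x: tuple whose missing feature values are None; v: complete center.
    The sum over the available features is rescaled by
    s / (number of available features), an assumed choice of scaling.
    """
    s = len(x)
    terms = [(xj - vj) ** 2 for xj, vj in zip(x, v) if xj is not None]
    if not terms:
        raise ValueError("datum has no observed feature values")
    return (s / len(terms)) * sum(terms)

# The datum from the abstract, with its second and fifth features missing.
datum = (254.3, None, 36.2, 112.7, None)
center = (250.0, 10.0, 36.2, 110.7, 5.0)   # hypothetical cluster center
d2 = partial_sq_distance(datum, center)
```

A distance of this form can stand in for the squared Euclidean distance in the membership update, while the center update would then average each feature only over the data points in which that feature is observed.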
NASA Astrophysics Data System (ADS)
Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute
2014-05-01
Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into a runoff and an infiltration component and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale, with survey areas of up to a few square kilometres, there are numerous new and innovative ground-based and remote sensing technologies available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns, optimal measurement locations were identified to conduct in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.
Generalized rough fuzzy c-means algorithm for brain MR image segmentation.
Ji, Zexuan; Sun, Quansen; Xia, Yong; Chen, Qiang; Xia, Deshen; Feng, Dagan
2012-11-01
Fuzzy sets and rough sets have been widely used in many clustering algorithms for medical image segmentation, and have recently been combined to better deal with the uncertainty implied in observed image data. Despite their widespread applications, traditional hybrid approaches are sensitive to the empirical weighting parameters and random initialization, and hence may produce less accurate results. In this paper, a novel hybrid clustering approach, namely the generalized rough fuzzy c-means (GRFCM) algorithm, is proposed for brain MR image segmentation. In this algorithm, each cluster is characterized by three automatically determined rough-fuzzy regions, and accordingly the membership of each pixel is estimated with respect to the region in which it is located. The importance of each region is balanced by a weighting parameter, and the bias field in MR images is modeled by a linear combination of orthogonal polynomials. The weighting parameter estimation and bias field correction have been incorporated into the iterative clustering process. Our algorithm has been compared to the existing rough c-means and hybrid clustering algorithms on both synthetic and clinical brain MR images. Experimental results demonstrate that the proposed algorithm is more robust to the initialization, noise, and bias field, and can produce more accurate and reliable segmentations. PMID:22088865
NASA Astrophysics Data System (ADS)
Akinin, M. V.; Akinina, N. V.; Klochkov, A. Y.; Nikiforov, M. B.; Sokolova, A. V.
2015-05-01
The report reviews the fuzzy c-means algorithm, which performs image segmentation, gives an estimate of the quality of its work by the Xie-Beni criterion, and contains the results of experimental studies of the algorithm in the context of drawing up detailed two-dimensional maps with the use of unmanned aerial vehicles. From the results of the experiment, it is concluded that the algorithm can be applied to problems of decoding images obtained by aerial photography. The considered algorithm can break the original image into a plurality of segments (clusters) in a relatively short period of time, which is achieved by modifying the original k-means algorithm to work in a fuzzy setting.
Keller, Brad; Nathan, Diane; Wang, Yan; Zheng, Yuanjie; Gee, James; Conant, Emily; Kontos, Despina
2011-01-01
The relative fibroglandular tissue content in the breast, commonly referred to as breast density, has been shown to be the most significant risk factor for breast cancer after age. Currently, the most common approaches to quantify density are based on either semi-automated methods or visual assessment, both of which are highly subjective. This work presents a novel multi-class fuzzy c-means (FCM) algorithm for fully-automated identification and quantification of breast density, optimized for the imaging characteristics of digital mammography. The proposed algorithm involves adaptive FCM clustering based on an optimal number of clusters derived by the tissue properties of the specific mammogram, followed by generation of a final segmentation through cluster agglomeration using linear discriminant analysis. When evaluated on 80 bilateral screening digital mammograms, a strong correlation was observed between algorithm-estimated PD% and radiological ground-truth of r=0.83 (p<0.001) and an average Jaccard spatial similarity coefficient of 0.62. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner. PMID:22003744
A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation.
Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A B
2013-01-01
One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation. PMID:24174990
Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
Parastar, Hadi; Bazrafshan, Alisina
2016-03-18
Fuzzy C-means clustering (FCM) is proposed as a promising method for the clustering of chromatographic fingerprints of complex samples, such as essential oils. As an example, secondary metabolites of 14 citrus leaf samples are extracted and analyzed by gas chromatography-mass spectrometry (GC-MS). The obtained chromatographic fingerprints are divided into the desired number of chromatographic regions. Because chromatographic problems such as elution time shift and peak overlap can significantly affect the clustering results, each chromatographic region is analyzed using multivariate curve resolution-alternating least squares (MCR-ALS) to address these problems. Then, the resolved elution profiles are used to build a new data matrix, based on the peak areas of pure components, to be clustered by FCM. The FCM clustering parameters (i.e., fuzziness coefficient and number of clusters) are optimized by two different methods: partial least squares (PLS) as a conventional method and minimization of the FCM objective function as our new idea. The results showed that minimization of the FCM objective function is an easier and better way to optimize the FCM clustering parameters. Then, the optimized FCM clustering algorithm is used to cluster samples and variables to figure out the similarities and dissimilarities among samples and to find discriminant secondary metabolites in each cluster (chemotype). Finally, the FCM clustering results are compared with those of principal component analysis (PCA), hierarchical cluster analysis (HCA) and Kohonen maps. The results confirmed that FCM outperforms these frequently used clustering algorithms. PMID:26916594
Fuzzy C-means method with empirical mode decomposition for clustering microarray data.
Wang, Yan-Fei; Yu, Zu-Guo; Anh, Vo
2013-01-01
Microarray techniques have revolutionised genomic research by making it possible to monitor the expression of thousands of genes in parallel. The Fuzzy C-Means (FCM) method is an efficient clustering approach devised for microarray data analysis. However, microarray data contains noise, which would affect clustering results. In this paper, we propose to combine the FCM method with the Empirical Mode Decomposition (EMD) for clustering microarray data to reduce the effect of the noise. The results suggest the clustering structures of denoised microarray data are more reasonable and genes have tighter association with their clusters than those using FCM only. PMID:23777170
Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering
2012-01-01
Background Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of spiking activity of individual neurons during a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms to automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting still remains a challenging issue. Results Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize: (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or null human intervention. The method is based on a combination of Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification of the number of clusters, and quantitative quality assessment of resulting clusters independent of their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from premotor cortex of Macaque monkeys. The results of these tests showed an excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. 
The performance of our method is competitive with respect to other robust spike sorting algorithms. Conclusions This new software provides neuroscience laboratories with a new tool for fast and robust online classification of single neuron activity. This feature could become crucial in situations when online spike detection from multiple electrodes is paramount, such as in human clinical recordings or in brain-computer interfaces. PMID:22871125
Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C.
2014-07-15
Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify the wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived, whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform that yields the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via the Speckle Suppression Index (SSI), with results of 0.61, 0.71, and 0.73 for the EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SRI, and Pizurica's methods, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation, the proposed method significantly improved image characteristics over standard baseline B mode images and those processed with Pizurica's method. Furthermore, it yielded results similar to those of SRI for breast and thyroid images and significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. 
Conclusions: A new wavelet-based EFCM clustering model was introduced toward noise reduction and detail preservation. The proposed method improves the overall US image quality, which in turn could affect the decision-making on whether additional imaging and/or intervention is needed.
Image watermarking using a dynamically weighted fuzzy c-means algorithm
NASA Astrophysics Data System (ADS)
Kang, Myeongsu; Ho, Linh Tran; Kim, Yongmin; Kim, Cheol Hong; Kim, Jong-Myon
2011-10-01
Digital watermarking has received extensive attention as a new method of protecting multimedia content from unauthorized copying. In this paper, we present a nonblind watermarking system using a proposed dynamically weighted fuzzy c-means (DWFCM) technique combined with discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) techniques for copyright protection. The proposed scheme efficiently selects blocks in which the watermark is embedded using new membership values of DWFCM as the embedding strength. We evaluated the proposed algorithm in terms of robustness against various watermarking attacks and imperceptibility compared to other algorithms [DWT-DCT-based and DCT- fuzzy c-means (FCM)-based algorithms]. Experimental results indicate that the proposed algorithm outperforms other algorithms in terms of robustness against several types of attacks, such as noise addition (Gaussian noise, salt and pepper noise), rotation, Gaussian low-pass filtering, mean filtering, median filtering, Gaussian blur, image sharpening, histogram equalization, and JPEG compression. In addition, the proposed algorithm achieves higher values of peak signal-to-noise ratio (approximately 49 dB) and lower values of measure-singular value decomposition (5.8 to 6.6) than other algorithms.
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
NASA Astrophysics Data System (ADS)
Liu, Shuang; Hou, Biao; Jiao, Licheng; Zhang, Guofeng
2007-11-01
Synthetic Aperture Radar (SAR) images are inherently affected by multiplicative speckle noise, which is due to the coherent nature of the scattering phenomenon. Speckle noise seriously degrades SAR image quality and interpretation. To alleviate the deleterious effects of speckle, various ways have been devised to suppress it. An ideal algorithm should smooth the speckle without blurring edges and fine details, but most classical algorithms cannot satisfy both demands well. Because speckle in SAR images is multiplicative noise, it is difficult to estimate the variance of the high-frequency subband coefficients. Most classical approaches, such as the wavelet thresholding or shrinkage schemes of Donoho and Johnstone, are therefore not suitable for removing speckle noise from SAR images. In this paper, a novel approach to SAR image speckle reduction is presented, which is based on second generation bandelets and a kernel-based possibilistic C-means clustering algorithm (BKPCM).
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
NASA Astrophysics Data System (ADS)
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one important application of image processing, the segmentation of pomegranate MR images, is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. Having a high-quality product in hand is a critical factor in its marketing. The internal quality of the product is highly important in the sorting process. The determination of qualitative features cannot be made manually. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in the presence of noise. The fuzzy c-means (FCM) algorithm is noise-sensitive, and pixels with noise are misclassified. As a solution, this paper proposes the spatial FCM algorithm for the segmentation of pomegranate MR images. The algorithm incorporates spatial neighborhood information into FCM and modifies the fuzzy membership function for each class. The segmentation results on the original pomegranate MR images and on images corrupted by Gaussian, salt-and-pepper, and speckle noise show that the SFCM algorithm performs significantly better than the FCM algorithm. Also, after several steps of qualitative and quantitative analysis, we conclude that the SFCM algorithm with a 5×5 window performs better than with the other window sizes.
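The idea of folding spatial neighborhood information into the FCM membership function can be sketched as follows. This is a minimal illustration of one widely used formulation, assumed here rather than taken from the paper: the standard membership u is multiplied by a spatial function h (the sum of memberships in a window around each pixel) and renormalized. The exponents `p` and `q`, the function names, and the evenly spaced initial centers are all hypothetical choices.

```python
import numpy as np

def fcm_memberships(pixels, centers, m=2.0):
    """Standard FCM memberships for 1-D intensities; returns shape (C, N)."""
    d2 = (pixels[None, :] - centers[:, None]) ** 2 + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def spatial_function(u_img, window=5):
    """Sum each cluster's memberships over a window x window neighborhood."""
    r = window // 2
    _, height, width = u_img.shape
    padded = np.pad(u_img, ((0, 0), (r, r), (r, r)), mode="edge")
    h = np.zeros_like(u_img)
    for dy in range(window):
        for dx in range(window):
            h += padded[:, dy:dy + height, dx:dx + width]
    return h

def spatial_fcm(image, n_clusters=2, m=2.0, p=1, q=1, window=5, n_iter=30):
    """Spatially weighted FCM on a grayscale image (illustrative sketch)."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    x = image.ravel()
    # Assumed initialization: centers evenly spaced over the intensity range.
    centers = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        u = fcm_memberships(x, centers, m)
        h = spatial_function(u.reshape(n_clusters, height, width), window)
        u = (u ** p) * (h.reshape(n_clusters, -1) ** q)
        u /= u.sum(axis=0, keepdims=True)
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)
    return u.reshape(n_clusters, height, width), centers
```

Calling `spatial_fcm(img, window=5)` returns per-pixel memberships and cluster centers; taking `argmax` over the first axis gives a hard segmentation. Setting `q=0` recovers plain FCM, which makes it easy to compare the two on a noisy image.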
Remote sensing ocean data analyses using fuzzy C-Means clustering
NASA Astrophysics Data System (ADS)
Xu, Suqin; Chen, Jie; Gao, Guoxing
2009-10-01
With the growing understanding and exploitation of the ocean, more and more precision instruments are installed on measuring ships and other marine platforms. The high cost and complexity of corrosion place ever-increasing demands on the analysis of the surrounding ocean environment. In this paper, fuzzy C-Means clustering is used to analyze the surrounding ocean environment with remote sensing data. The studied ocean area is treated as a two-dimensional grid, or image, and the fuzzy C-Means clustering technique is used to reveal the underlying relationships among the elements and to segment the ocean into regions with similar spectral properties with respect to their influence on instrument corrosion. The influence of the environmental elements on instrument corrosion is studied, and a priori spatial information is added to improve the segmentation result. A fitness function containing neighbor information was set up based on the gray-level information and the neighbor relations between pixels. By making use of the global searching ability of predator-prey particle swarm optimization, the optimal cluster centers could be obtained by iterative optimization and the segmentation accomplished. The calculation results show that the segmentation is accurate and reasonable. The results of this ocean environment analysis have been used in real applications and have proved valuable in ship instrument corrosion monitoring and in guiding other ocean activities.
NASA Astrophysics Data System (ADS)
Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analysis is the process of identifying changes in nature or in the state of an object from observations made at different times, or of quantifying temporal effects using multitemporal data sets. Many change detection techniques appear in the literature. They can be grouped under two main headings: supervised and unsupervised change detection. In this study, the aim is to identify the land cover changes occurring in a specific area of Kayseri with unsupervised change detection techniques, using remotely sensed Landsat satellite images from different years. First, image-to-image registration is completed so that the images are referenced to each other. After image enhancement, the image differencing method is applied to the images. The resulting gray-scale difference image is partitioned into 3 × 3 nonoverlapping blocks. Principal Component Analysis is then applied to the difference image to obtain the eigenvector space, from which the principal components are derived. Finally, the feature vector space consisting of the principal components is partitioned into two clusters, changed and unchanged, using Fuzzy C-Means Clustering, which completes the change detection process.
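The pipeline described in this abstract (difference image, 3 × 3 nonoverlapping blocks, PCA, two-cluster fuzzy c-means) can be sketched on already registered images as shown below. Everything beyond those four steps is an assumption made for illustration: the function names, the number of retained components, the deterministic center initialization, and the rule that the cluster with the larger mean difference is the "changed" one.

```python
import numpy as np

def block_vectors(diff, size=3):
    """Split an image into size x size nonoverlapping blocks, one row each."""
    height, width = diff.shape
    height, width = height - height % size, width - width % size
    blocks = diff[:height, :width].reshape(height // size, size, width // size, size)
    vectors = blocks.transpose(0, 2, 1, 3).reshape(-1, size * size)
    return vectors, (height // size, width // size)

def pca_project(X, n_components=3):
    """Project rows of X onto the leading eigenvectors of their covariance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=50):
    """Plain FCM; centers start at evenly spaced samples along feature 0."""
    order = np.argsort(X[:, 0])
    picks = np.linspace(0, len(X) - 1, n_clusters).astype(int)
    centers = X[order[picks]].copy()
    for _ in range(n_iter):
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        um = u ** m
        centers = (um @ X) / um.sum(axis=1, keepdims=True)
    return u, centers

def detect_change(img_t1, img_t2, size=3):
    """Return a boolean block-level change mask for two registered images."""
    diff = np.abs(np.asarray(img_t2, dtype=float) - np.asarray(img_t1, dtype=float))
    X, grid = block_vectors(diff, size)
    u, _ = fuzzy_c_means(pca_project(X), n_clusters=2)
    labels = u.argmax(axis=0)
    # Assumed rule: the cluster with the larger mean difference is "changed".
    means = [X[labels == k].mean() if np.any(labels == k) else -np.inf
             for k in range(2)]
    return (labels == int(np.argmax(means))).reshape(grid)
```

The returned mask has one entry per block rather than per pixel, so a 30 × 30 input yields a 10 × 10 mask; registration and enhancement are assumed to have been done beforehand.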
Thermogram breast cancer prediction approach based on Neutrosophic sets and fuzzy c-means algorithm.
Gaber, Tarek; Ismail, Gehad; Anter, Ahmed; Soliman, Mona; Ali, Mona; Semary, Noura; Hassanien, Aboul Ella; Snasel, Vaclav
2015-08-01
The early detection of breast cancer makes many women survive. In this paper, a CAD system classifying breast cancer thermograms to normal and abnormal is proposed. This approach consists of two main phases: automatic segmentation and classification. For the former phase, an improved segmentation approach based on both Neutrosophic sets (NS) and optimized Fast Fuzzy c-mean (F-FCM) algorithm was proposed. Also, post-segmentation process was suggested to segment breast parenchyma (i.e. ROI) from thermogram images. For the classification, different kernel functions of the Support Vector Machine (SVM) were used to classify breast parenchyma into normal or abnormal cases. Using benchmark database, the proposed CAD system was evaluated based on precision, recall, and accuracy as well as a comparison with related work. The experimental results showed that our system would be a very promising step toward automatic diagnosis of breast cancer using thermograms as the accuracy reached 100%. PMID:26737234
Wen, Ying; He, Lianghua; von Deneen, Karen M; Lu, Yue
2013-11-01
We present an effective method for brain tissue classification based on diffusion tensor imaging (DTI) data. The method accounts for two main DTI segmentation obstacles: random noise and magnetic field inhomogeneities. In the proposed method, DTI parametric maps were used to resolve intensity inhomogeneities of brain tissue segmentation because they could provide complementary information for tissues and define accurate tissue maps. An improved fuzzy c-means with spatial constraints proposal was used to enhance the noise and artifact robustness of DTI segmentation. Fuzzy c-means clustering with spatial constraints (FCM_S) could effectively segment images corrupted by noise, outliers, and other imaging artifacts. Its effectiveness contributes not only to the introduction of fuzziness for belongingness of each pixel but also to the exploitation of spatial contextual information. We proposed an improved FCM_S applied on DTI parametric maps, which explores the mean and covariance of the feature spatial information for automated segmentation of DTI. The experiments on synthetic images and real-world datasets showed that our proposed algorithms, especially with new spatial constraints, were more effective. PMID:23891435
T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried
2010-03-01
The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.
Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering.
Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Kim, Jin Young
2012-12-01
Disease diagnosis based on ultrasound imaging is popular because of its non-invasive nature. However, ultrasound imaging systems produce low-quality images due to the presence of speckle noise and wave interference. This shortcoming requires considerable effort from experts to diagnose a disease from carotid artery ultrasound images. Image segmentation is one of the techniques that can help in diagnosing a disease efficiently from carotid artery ultrasound images. Most of the pixels in an image are highly correlated. Considering the spatial information of surrounding pixels in the process of image segmentation may further improve the results. When data is highly correlated, one pixel may belong to more than one cluster with different degrees of membership. In this paper, we present an image segmentation technique, namely improved spatial fuzzy c-means, and an ensemble clustering approach for carotid artery ultrasound images to identify the presence of plaque. Spatial, wavelet, and gray level co-occurrence matrix (GLCM) features are extracted from carotid artery ultrasound images. Redundant and less important features are removed from the feature set using a genetic search process. Finally, the segmentation process is performed on the optimal, reduced feature set. Ensemble clustering with the reduced feature set performs better with respect to segmentation time as well as clustering accuracy. Intima-media thickness (IMT) is measured from the images segmented by the proposed approach. Based on the measured IMT values, a Multi-Layer Back-Propagation Neural Network (MLBPNN) is used to classify the images as normal or abnormal. Experimental results show the learning capability of the MLBPNN classifier and validate the effectiveness of our proposed technique. The proposed approach of segmentation and classification of carotid artery ultrasound images seems to be very useful for detection of plaque in the carotid artery. PMID:22981822
NASA Astrophysics Data System (ADS)
Rapstine, Thomas D.
Gravity gradiometry has been used as a geophysical tool to image salt structure in hydrocarbon exploration. The knowledge of the location, orientation, and spatial extent of salt bodies helps characterize possible petroleum prospects. Imaging around and underneath salt bodies can be challenging given the petrophysical properties and complicated geometry of salt. Methods for imaging beneath salt using seismic data exist but are often iterative and expensive, requiring a refinement of a velocity model at each iteration. Fortunately, the relatively strong density contrast between salt and background density structure provides the opportunity for gravity gradiometry to be useful in exploration, especially when integrated with other geophysical data such as seismic. Quantitatively integrating multiple geophysical data is not trivial, but can improve the recovery of salt body geometry and petrophysical composition using inversion. This thesis provides two options for quantitatively integrating seismic, AGG, and petrophysical data that may aid the imaging of salt bodies. Both methods leverage and expand upon previously developed deterministic inversion methods. The inversion methods leverage seismically derived information, such as horizon slope and salt body interpretation, to constrain the inversion of airborne gravity gradiometry (AGG) data to arrive at a density contrast model. The first method involves constraining a top of salt inversion using slope in a seismic image. The second method expands fuzzy c-means (FCM) clustering inversion to include spatial control on clustering based on a seismically derived salt body interpretation. The effectiveness of the methods is illustrated on a 2D synthetic earth model derived from the SEAM Phase 1 salt model. Both methods show that constraining the inversion of AGG data using information derived from seismic images can improve the recovery of salt.
Ultrametric Hierarchical Clustering Algorithms.
ERIC Educational Resources Information Center
Milligan, Glenn W.
1979-01-01
Johnson has shown that the single linkage and complete linkage hierarchical clustering algorithms induce a metric on the data known as the ultrametric. Johnson's proof is extended to four other common clustering algorithms. Two additional methods also produce hierarchical structures which can violate the ultrametric inequality. (Author/CTM)
NASA Astrophysics Data System (ADS)
Güler, Cüneyt; Thyne, Geoffrey D.
2004-12-01
In this paper, classification of a large hydrochemical data set (more than 600 water samples and 11 hydrochemical variables) from southeastern California by fuzzy c-means (FCM) and hierarchical cluster analysis (HCA) clustering techniques is performed and its application to hydrochemical facies delineation is discussed. Results from both FCM and HCA clustering produced cluster centers (prototypes) that can be used to identify the physical and chemical processes creating the variations in the water chemistries. There are several advantages to FCM, and it is concluded that FCM, as an exploratory data analysis technique, is potentially useful in establishing hydrochemical facies distribution and may provide a better tool than HCA for clustering large data sets when overlapping or continuous clusters exist.
NASA Astrophysics Data System (ADS)
Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.
2015-04-01
This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from Dezful Embayment in the southwest region of Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information from the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. Furthermore, the fracture enhancement procedure was conducted by three steps on seismic data of the area. In the first step several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the enhancement of spatial discontinuities was performed and finally elimination of the noise and remains of non-faulting events was carried out by simulating the behavior of ant colonies. After considering stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW-SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. In order to assess the performance of the AC in feature detection, and cross validate the reliability of the method used, fuzzy c-means clustering (FCMC) was employed for the same dataset. Comparing the maps illustrates the effectiveness and preference of the AC approach due to its high resolution contrast for structural feature detection compared to the FCMC method. 
Accordingly, 3D planes of discontinuity determined spatial distribution of fractures in the field in order to assist well planning. Results revealed that the high impedance location probability related to an area in the vicinity of the faults, whilst low impedance location probably could indicate zones of high permeability which indicate flow conduits. Analysis under the present study suggests that the orientation and magnitude of fractures exhibiting the main trend of NW-SE in Dezful Embayment is more susceptible to stimulation and is more likely to open for fluid flow.
Effect of co-operative fuzzy c-means clustering on estimates of three parameters AVA inversion
NASA Astrophysics Data System (ADS)
Nair, Rajesh R.; Kandpal, Suresh Ch
2010-04-01
We determine the degree of variation of model fitness, to a true model based on amplitude variation with angle (AVA) methodology for a synthetic gas hydrate model, using co-operative fuzzy c-means clustering, constrained to a rock physics model. When a homogeneous starting model is used, with only traditional least squares optimization scheme for inversion, the variance of the parameters is found to be comparatively high. In this co-operative methodology, the output from the least squares inversion is fed as an input to the fuzzy scheme. Tests with co-operative inversion using fuzzy c-means with damped least squares technique and constraints derived from empirical relationship based on rock properties model show improved stability, model fitness and variance for all the three parameters in comparison with the standard inversion alone.
Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Iftikhar, M Aksam
2014-02-01
In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus inevitable for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state of the art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques both on noisy and noise free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed for making decision about the presence of plaque in carotid artery using probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, effect of the proposed segmentation technique has also been investigated on classification of carotid artery ultrasound images. PMID:24239296
Basic cluster compression algorithm
NASA Technical Reports Server (NTRS)
Hilbert, E. E.; Lee, J.
1980-01-01
Feature extraction and data compression of LANDSAT data is accomplished by BCCA program which reduces costs associated with transmitting, storing, distributing, and interpreting multispectral image data. Algorithm uses spatially local clustering to extract features from image data to describe spectral characteristics of data set. Approach requires only simple repetitive computations, and parallel processing can be used for very high data rates. Program is written in FORTRAN IV for batch execution and has been implemented on SEL 32/55.
Hierarchical modularization of biochemical pathways using fuzzy-c means clustering.
de Luis Balaguer, Maria A; Williams, Cranos M
2014-08-01
Biological systems that are representative of regulatory, metabolic, or signaling pathways can be highly complex. Mathematical models that describe such systems inherit this complexity. As a result, these models can often fail to provide a path toward the intuitive comprehension of these systems. More coarse information that allows a perceptive insight of the system is sometimes needed in combination with the model to understand control hierarchies or lower level functional relationships. In this paper, we present a method to identify relationships between components of dynamic models of biochemical pathways that reside in different functional groups. We find primary relationships and secondary relationships. The secondary relationships reveal connections that are present in the system, which current techniques that only identify primary relationships are unable to show. We also identify how relationships between components dynamically change over time. This results in a method that provides the hierarchy of the relationships among components, which can help us to understand the low level functional structure of the system and to elucidate potential hierarchical control. As a proof of concept, we apply the algorithm to the epidermal growth factor signal transduction pathway, and to the C3 photosynthesis pathway. We identify primary relationships among components that are in agreement with previous computational decomposition studies, and identify secondary relationships that uncover connections among components that current computational approaches were unable to reveal. PMID:24196983
NASA Astrophysics Data System (ADS)
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with fluorophore, FMI can noninvasively obtain the distribution of fluorophore in vivo. However, because the fluorescence spectrum lies within the visible light range, there is a large amount of autofluorescence on the surface of the bio-tissues, which is a major disturbing factor in FMI. Meanwhile, the high level of dark current in the charge-coupled device (CCD) camera and other influencing factors can also produce a lot of background noise. In this paper, a novel method for image denoising of FMI based on fuzzy C-Means clustering (FCM) is proposed, because the fluorescent signal is the major component of the fluorescence images, and the intensity of autofluorescence and other background signals is relatively lower than that of the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to initially eliminate the noise. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopted to separate the major component from the background of the fluorescence images. Finally, the proposed method was validated using original data obtained from an in vivo implanted fluorophore experiment, and the results show that the proposed method can effectively obtain the fluorescence signal while eliminating the background noise, which could increase the quality of fluorescence images.
Tang, Jing Rui; Mat Isa, Nor Ashidi; Ch’ng, Ewe Seng
2015-01-01
Despite the effectiveness of the Pap-smear test in reducing the mortality rate due to cervical cancer, the criteria of the reporting standard of the Pap-smear test are mostly qualitative in nature. This study addresses the issue of how to define the criteria in more quantitative and definite terms. A negative Pap-smear test result, i.e. negative for intraepithelial lesion or malignancy (NILM), is qualitatively defined to have evenly distributed, finely granular chromatin in the nuclei of cervical squamous cells. To quantify this chromatin pattern, this study employed Fuzzy C-Means clustering as the segmentation technique, enabling different degrees of chromatin segmentation to be performed on sample images of non-neoplastic squamous cells. From the simulation results, a model representing the chromatin distribution of non-neoplastic cervical squamous cells is constructed with the following quantitative characteristics: at the best representative sensitivity level 4 based on statistical analysis and human experts’ feedback, a nucleus of a non-neoplastic squamous cell has an average of 67 chromatins with a total area of 10.827 μm²; the average distance between the nearest chromatin pair is 0.508 μm and the average eccentricity of the chromatin is 0.47. PMID:26560331
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based datasets can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices and the notion of a joint-feature-clustering matrix to find redundancies among image features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times and improve the classification accuracy.
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space to derive new objective function and thus clustering the non-Euclidean structures in data to enhance the robustness of the original clustering algorithms to reduce noise and outliers. The new objective functions of proposed algorithms are realized by incorporating the noise clustering concept into the entropy based fuzzy C-means algorithm with suitable noise distance which is employed to take the information about noisy data in the clustering process. This paper presents initial cluster prototypes using prototype initialization method, so that this work tries to obtain the final result with less number of iterations. To evaluate the performance of the proposed methods in reducing the noise level, experimental work has been carried out with a synthetic image which is corrupted by Gaussian noise. The superiority of the proposed methods has been examined through the experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The accurate classification percentage of the proposed fuzzy C-means segmentation method is obtained using silhouette validity index. PMID:23219569
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-24
Brain tumors are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time-consuming task. Accurate detection of the size and location of a brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging task in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques: Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method constructs a hybrid between HMRF and thresholding. These methods have been applied to 4 different patient data sets. The comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective than the FCM technique.
Lin, Thy-Hou; Wang, Ging-Ming; Hsu, Yao-Hua
2002-01-01
A fuzzy c-means algorithm was used to classify some 3D convex hull descriptors computed for 345 active HIV-1 protease inhibitors collected from literature and 437 inactive analogues searched from the MDL/ISIS database. The number of descriptors used to represent each compound was from 4 to 8, and they were uncorrelated using the principal component analysis. These uncorrelated descriptors were then divided into two groups and classified by the fuzzy c-means algorithm. The classification produced a clear-cut switch in membership functions computed for each uncorrelated descriptor at the group boundary. Compounds with nonswitching membership functions computed were treated as outliers, and they were counted for estimating the accuracy of the classification. The averaged accuracy of classification for the active inhibitor set was about 80% which was better than that directly classified by a linear discriminant function on the original 3D convex hull descriptors. The whole classification scheme was also applied to several sets of some conventional descriptors computed for each compound, but the averaged accuracy was around 58%. Further classification using some 3D convex hull descriptors searched from comparing the distribution of these descriptors was performed on a new data set composed of 289 outliers-deducted active inhibitors and 63 outliers identified from the inactive analogues through previous classification. This final classification identified 19 inactive analogues which were similar in structural and topological features to those of some highly active inhibitors classified together with them. PMID:12444748
Chang, Yeun-Chung; Huang, Yan-Hao; Huang, Chiun-Sheng; Chang, Pei-Kang; Chen, Jeon-Hor; Chang, Ruey-Feng
2012-04-01
The purpose of this study is to evaluate the diagnostic efficacy of the representative characteristic kinetic curve of dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) extracted by fuzzy c-means (FCM) clustering for the discrimination of benign and malignant breast tumors using a novel computer-aided diagnosis (CAD) system. The data set comprised DCE-MRIs of 132 solid breast masses with definite histopathologic diagnosis (63 benign and 69 malignant). First, the tumor region was automatically segmented using the region growing method based on the integrated color map formed by combining the kinetic and area-under-curve color maps. Then, FCM clustering was used to identify the time-signal curve with the larger initial enhancement inside the segmented region as the representative kinetic curve, and the parameters of the Tofts pharmacokinetic model for the representative kinetic curve were compared with conventional curve analysis (maximal enhancement, time to peak, uptake rate and washout rate) for each mass. The results were analyzed with a receiver operating characteristic curve and Student's t test to evaluate the classification performance. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value of the combined model-based parameters of the extracted kinetic curve from FCM clustering were 86.36% (114/132), 85.51% (59/69), 87.30% (55/63), 88.06% (59/67) and 84.62% (55/65), respectively, better than those from a conventional curve analysis. The A(Z) value was 0.9154 for Tofts model-based parametric features, better than that for conventional curve analysis (0.8673), for discriminating malignant and benign lesions. In conclusion, model-based analysis of the characteristic kinetic curve of breast mass derived from FCM clustering provides effective lesion classification. This approach has potential in the development of a CAD system for DCE breast MRI. PMID:22245697
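The reported percentages can be reproduced from the fractions quoted in the abstract. The sketch below assumes malignant is the positive class, which gives 59 true positives, 10 false negatives, 55 true negatives and 8 false positives. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a binary classifier."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # malignant masses correctly flagged
        "specificity": tn / (tn + fp),  # benign masses correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts implied by the fractions in the abstract (malignant = positive).
metrics = diagnostic_metrics(tp=59, fn=10, tn=55, fp=8)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

The four counts sum to the 132 masses in the data set, and each rounded percentage agrees with the value stated in the abstract.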
An object-oriented cluster search algorithm
Silin, Dmitry; Patzek, Tad
2003-01-24
In this work we describe two object-oriented cluster search algorithms, which can be applied to a network of arbitrary structure. The first algorithm calculates all connected clusters, whereas the second one finds a path with the minimal number of connections. We estimate the complexity of the algorithms and infer that the number of operations grows linearly with the size of the network.
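Both tasks can be illustrated with breadth-first search, which visits every node and connection once and is therefore linear in the size of the network. This is a plain functional sketch under that assumption, not the object-oriented design of the paper. The adjacency-dictionary representation and the example network are hypothetical.

```python
from collections import deque


def connected_clusters(adjacency):
    """Return all connected clusters of an undirected network."""
    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(cluster))
    return clusters


def shortest_path(adjacency, source, target):
    """Return a path with the fewest connections, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in adjacency[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical network with two separate clusters.
network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "x": ["y"],
    "y": ["x"],
}
clusters = connected_clusters(network)
path = shortest_path(network, "a", "e")
```

Here `clusters` holds two groups, and `path` crosses three connections, the minimum for this network.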
Kumar, Surendra; Ghosh, Subhojit; Tetarway, Suhash; Sinha, Rakesh Kumar
2015-07-01
In this study, the magnitude and spatial distribution of the frequency spectrum in the resting electroencephalogram (EEG) were examined to address the problem of detecting alcoholism in the cerebral motor cortex. The EEG signals were recorded from subjects with chronic alcoholic conditions (n = 20) and a control group (n = 20). Data were taken from the motor cortex region and divided into five sub-bands (delta, theta, alpha, beta-1 and beta-2). Three methodologies were adopted for feature extraction: (1) absolute power, (2) relative power and (3) peak power frequency. The dimension of the extracted features is reduced by linear discriminant analysis, and the features are classified by support vector machine (SVM) and fuzzy C-means clustering. The maximum classification accuracy (88%) with SVM was achieved with the EEG spectral features with absolute power frequency on the F4 channel. Among the bands, relatively higher classification accuracy was found over the theta band and beta-2 band in most of the channels when computed with the EEG features of relative power. Electrode-wise, CZ, C3 and P4 showed more alteration. Considering the good classification accuracy obtained by SVM with relative band power features in most of the EEG channels of the motor cortex, it can be suggested that a noninvasive automated online diagnostic system for the chronic alcoholic condition can be developed with the help of EEG signals. PMID:25773367
Y-Means: An Autonomous Clustering Algorithm
NASA Astrophysics Data System (ADS)
Ghorbani, Ali A.; Onut, Iosif-Viorel
This paper proposes an unsupervised clustering technique for data classification based on the K-means algorithm. The K-means algorithm is well known for its simplicity and low time complexity. However, the algorithm has three main drawbacks: dependency on the initial centroids, dependency on the number of clusters, and degeneracy. Our solution accommodates these three issues, by proposing an approach to automatically detect a semi-optimal number of clusters according to the statistical nature of the data. As a side effect, the method also makes choices of the initial centroid-seeds not critical to the clustering results. The experimental results show the robustness of the Y-means algorithm as well as its good performance against a set of other well known unsupervised clustering techniques. Furthermore, we study the performance of our proposed solution against different distance and outlier-detection functions and recommend the best combinations.
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to their simplicity, such approaches tend to be trapped in a local minimum during the search for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
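The abstract does not spell out the ADDC objective. One common formulation averages, over all clusters, the mean distance between each document vector and its cluster centroid. The sketch below assumes that formulation with Euclidean distance (the paper may use a different measure). The function names and toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dims)]


def addc(clusters):
    """Average Distance to Document Centroid over the non-empty clusters."""
    per_cluster = []
    for docs in clusters:
        if not docs:
            continue  # empty clusters are skipped in this sketch
        c = centroid(docs)
        per_cluster.append(sum(math.dist(c, d) for d in docs) / len(docs))
    return sum(per_cluster) / len(per_cluster)


# Two toy clusters of two-dimensional "document" vectors.
clusters = [
    [[0.0, 0.0], [2.0, 0.0]],
    [[5.0, 5.0], [5.0, 7.0], [5.0, 9.0]],
]
value = addc(clusters)
```

A lower value indicates more compact clusters, so a search procedure such as the firefly algorithm would try to minimize it.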
NASA Astrophysics Data System (ADS)
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used hard C-means algorithm in conjunction with landmark based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
A local distribution based spatial clustering algorithm
NASA Astrophysics Data System (ADS)
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover potential spatial association rules and outliers among spatial data. Most existing spatial clustering algorithms only utilize the spatial distance or local density to find the spatial clusters in a spatial database, without taking the spatial local distribution characteristics into account, so that the clustered results are unreasonable in many cases. To overcome such limitations, this paper first develops a new indicator (the local median angle) to measure the local distribution, and then proposes a new algorithm, called the local distribution based spatial clustering algorithm (LDBSC). In the process of spatial clustering, a series of recursive searches is implemented for all the entities so that entities whose local median angles are very close or equal are clustered together. In this way, all the spatial entities in the spatial database can be automatically divided into clusters. Finally, two tests are implemented to demonstrate that the proposed method performs better than DBSCAN, that it is robust and feasible, and that it can be used to find clusters with different shapes.
Hierarchical link clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Bodlaj, Jernej; Batagelj, Vladimir
2015-06-01
Hierarchical network clustering is an approach to find tightly and internally connected clusters (groups or communities) of nodes in a network based on its structure. Instead of nodes, it is possible to cluster links of the network. The sets of nodes belonging to clusters of links can overlap. While overlapping clusters of nodes are not always expected, they are natural in many applications. Using appropriate dissimilarity measures, we can complement the clustering strategy to consider, for example, the semantic meaning of links or nodes based on their properties. We propose a new hierarchical link clustering algorithm which in comparison to existing algorithms considers node and/or link properties (descriptions, attributes) of the input network alongside its structure using monotonic dissimilarity measures. The algorithm determines communities that form connected subnetworks (relational constraint) containing locally similar nodes with respect to their description. It is only implicitly based on the corresponding line graph of the input network, thus reducing its space and time complexities. We investigate both complexities analytically and statistically. Using provided dissimilarity measures, our algorithm can, in addition to the general overlapping community structure of input networks, uncover also related subregions inside these communities in a form of hierarchy. We demonstrate this ability on real-world and artificial network examples.
NASA Astrophysics Data System (ADS)
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real-world problems. Wavelet neural networks (WNNs), which were developed based on wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized, including the type of wavelet activation functions, the translation vectors, and the dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors, both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to estimating the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, they were applied to the real-world problem of epileptic seizure detection. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
An algorithm for spatial hierarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue; Department of Radiology, University of Michigan, 1500 East Medical Center Drive, Med Inn Building C478, Ann Arbor, Michigan 48109-5842; Department of Biomedical Engineering, University of Michigan, 2200 Bonisteel Boulevard, Ann Arbor, Michigan 48109-2099
2014-01-15
Purpose: To develop a pharmacokinetic model-free framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complementary role. In ROC analysis, areas under the curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolumes, respectively, in response assessment.
Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools are able to process real-valued numerical sets in order to extract a suboptimal solution of a designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite the wide usage of evolutionary algorithms in data clustering, their clustering performance has scarcely been studied using clustering validation indexes. In this paper, recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, fcm, som networks) have been used to cluster images, and their performances have been compared using four clustering validation indexes. Experimental test results showed that evolutionary algorithms give more reliable cluster centers than classical clustering techniques, but their convergence time is quite long.
Equilibriumlike extension of the invaded cluster algorithm
NASA Astrophysics Data System (ADS)
Balog, I.; Uzelac, K.
2008-05-01
We propose an extension of the nonequilibrium invaded cluster (IC) algorithm, which reestablishes a correct scaling of fluctuations at criticality and also self-adjusts to the critical temperature. We show that by introducing a single constraint to the intrinsic quantity of the IC algorithm the temperature becomes well defined and the sampling of the equilibrium ensemble is regained. The procedure is applied to the Potts model in two and three dimensions.
Genetic algorithm optimization of atomic clusters
Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E.
1996-12-31
The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. This is a problem where local minima are easy to locate, but barriers between the many minima are large, and the number of minima prohibit a systematic search. They use a novel mating algorithm that preserves some of the geometrical relationship between atoms, in order to ensure that the resultant structures are likely to inherit the best features of the parent clusters. Using this approach, they have been able to find lower energy structures than had been previously obtained. Most recently, they have been able to turn around the building block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and (conversely) that such information can be introduced back into the algorithm to assist in the search process.
Ergen, Burhan
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590
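The clustering step that turns a gray-level image into a binary one can be sketched with standard fuzzy c-means on scalar intensities. This minimal example omits the Gabor wavelet stage and assumes two clusters, fuzziness index m = 2, centres initialized at the minimum and maximum intensity, and a 0.5 membership threshold. None of these settings is confirmed by the abstract, and the function name and sample intensities are hypothetical.

```python
def fcm_1d(values, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means on scalar data with absolute-difference distance."""
    power = 2.0 / (m - 1.0)
    centers = list(centers)
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # x sits exactly on a centre: share full membership there
                share = 1.0 / sum(zeros)
                row = [share if z else 0.0 for z in zeros]
            else:
                row = [1.0 / sum((d_i / d_k) ** power for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Each new centre is the mean of the data weighted by membership^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical gray levels: four dark pixels followed by four bright ones.
gray_levels = [10, 12, 15, 11, 200, 210, 205, 198]
centers, memberships = fcm_1d(gray_levels, [min(gray_levels), max(gray_levels)])
bright = centers.index(max(centers))
binary = [1 if row[bright] > 0.5 else 0 for row in memberships]
```

Replacing the membership threshold with a nearest-centre assignment gives the corresponding k-means style binarization.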
Sparse subspace clustering: algorithm, theory, and applications.
Elhamifar, Ehsan; Vidal, René
2013-11-01
Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering. PMID:24051734
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11-dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.
CLASSY: An adaptive maximum likelihood clustering algorithm
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Rassbach, M. E. (Principal Investigator)
1979-01-01
The CLASSY clustering method alternates maximum likelihood iterative techniques for estimating the parameters of a mixture distribution with an adaptive procedure for splitting, combining, and eliminating the resultant components of the mixture. The adaptive procedure is based on maximizing the fit of a mixture of multivariate normal distributions to the observed data using its first through fourth central moments. It generates estimates of the number of multivariate normal components in the mixture as well as the proportion, mean vector, and covariance matrix for each component. The basic mathematical model for CLASSY and the actual operation of the algorithm as currently implemented are described. Results of applying CLASSY to real and simulated LANDSAT data are presented and compared with those generated by the iterative self-organizing clustering system algorithm on the same data sets.
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simple look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
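The feature map and its look-up table decoding can be illustrated as follows. The sketch assumes the cluster features are centroids already produced by some local clustering step, and that each picture element is coded by the index of its nearest centroid. The source encoding stage is omitted, and all names and sample values are hypothetical.

```python
def nearest(pixel, centroids):
    """Index of the centroid closest to a pixel (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, centroids[i])),
    )


def encode(pixels, centroids):
    """Feature map: one scalar cluster index per picture element."""
    return [nearest(p, centroids) for p in pixels]


def decode(feature_map, centroids):
    """Look-up table decoding: replace each index by its cluster centroid."""
    return [centroids[i] for i in feature_map]


# Hypothetical two-band pixels and two cluster centroids.
pixels = [(10, 12), (11, 13), (200, 190), (198, 195), (12, 11)]
centroids = [(11, 12), (199, 192)]
feature_map = encode(pixels, centroids)
reconstructed = decode(feature_map, centroids)
```

The feature map stores one small integer per pixel instead of a full spectral vector, which is where the compression in this simplified view comes from.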
Filamentary galaxy clustering - A mapping algorithm
NASA Technical Reports Server (NTRS)
Gott, J. R., III; Moody, J. E.; Turner, E. L.
1983-01-01
A simple and objective algorithm is presented which not only accurately identifies the filamentary structures in the Shane-Wirtanen galaxy count catalog, but also finds a set of visually less impressive filaments in a static hierarchical model of the clustering conducted by Soneira and Peebles (1978). The statistical properties of the elements in the model, while very similar to those in the data, show a significant excess of long and bright filaments in the data relative to the model. Two possible interpretations of these results are presented and discussed.
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
NASA Astrophysics Data System (ADS)
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message-passing library and a master/slave algorithm. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
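The data decomposition described in this abstract can be illustrated with a minimal serial sketch. It is not the authors' MPI code: the strip layout, the one-row halo, the |gx| + |gy| magnitude approximation, and the names `sobel` and `sobel_by_strips` are all assumptions made for illustration. Each loop iteration stands in for what one worker would compute on its part of the image.

```python
def sobel(image):
    """Sobel gradient magnitude (|gx| + |gy|) for interior pixels; borders stay 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def sobel_by_strips(image, n_workers):
    """Split the rows into strips, add a one-row halo on each side, run the
    same sequential Sobel on every strip, and stitch the owned rows together."""
    h = len(image)
    result = []
    for i in range(n_workers):
        start, stop = i * h // n_workers, (i + 1) * h // n_workers
        lo, hi = max(start - 1, 0), min(stop + 1, h)   # halo rows
        strip_out = sobel(image[lo:hi])                # work done by one worker
        result.extend(strip_out[start - lo:stop - lo]) # keep only owned rows
    return result


# A 6x5 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(6)]
edges = sobel_by_strips(image, 3)
```

The halo rows are what make the stitched result identical to running the filter on the whole image; in a message-passing version they would be the extra rows the master sends along with each strip.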
A Hybrid Monkey Search Algorithm for Clustering Analysis
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis. PMID:24772039
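The sensitivity of k-means to its initial solution, which motivates this paper, can be reproduced with a small sketch. The code below is plain Lloyd-style k-means on one-dimensional data, not the hybrid monkey algorithm; the function name `kmeans_1d` and the toy data are assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, n_iter=100):
    """Lloyd's k-means on scalars; returns final centers and the sum of squared errors."""
    centers = list(centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
poor = kmeans_1d(points, [0, 1, 2])     # all starting centers inside one group
good = kmeans_1d(points, [1, 11, 21])   # one starting center per group
```

With all three starting centers placed in the leftmost group, the iteration stalls at centers 0, 1.5 and 16 with an error of 154.5, whereas the second start reaches the three group means with an error of 6. A global search layer of the kind proposed in the paper is one way to escape such local optima.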
Efficient Fuzzy C-Means Architecture for Image Segmentation
Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen
2011-01-01
This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process to accelerate computation. Experimental results show that the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate. PMID:22163980
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire network is the main design consideration. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy-efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them based on metrics such as clustering distribution, cluster load balancing, Cluster Head (CH) selection strategy, CH role rotation, node mobility, cluster overlapping, intra-cluster communications, reliability, security and location awareness.
A novel clustering algorithm inspired by membrane computing.
Peng, Hong; Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng
2015-01-01
P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to k-means algorithm and several evolutionary clustering algorithms recently reported in the literature. PMID:25874264
Color sorting algorithm based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
Zhang, BaoFeng; Huang, Qian
2009-11-01
In raisin production, a variety of color impurities occur and need to be removed effectively. A new, efficient raisin color-sorting algorithm is presented here. First, threshold-based image processing was applied for image pre-processing, and the gray-scale distribution characteristic of the raisin image was found. In order to get the chromatic aberration image and reduce disturbance, we performed frame image subtraction, in which the background image data are subtracted from the target image data. Second, a Haar wavelet filter was used to obtain a smoothed image of the raisins. According to the different colors, mildew, spots and other external features, the image characteristics were calculated so that they fully reflect the quality differences between raisins of different types. After the above processing, the images were analyzed by the K-means clustering method, which achieves adaptive extraction of the statistical features; according to these features, the image data were divided into different categories, so that the categories of abnormal colors became distinct. Using this algorithm, raisins of abnormal colors and those with mottles were eliminated. The sorting rate was up to 98.6%, and the ratio of normal raisins to sorted grains was less than one eighth.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Genetic algorithm for clustering mixed-type data
NASA Astrophysics Data System (ADS)
Yang, Shiueng-Bien; Wu, Yung-Gi
2011-01-01
The k-modes algorithm was recently proposed to cluster mixed-type data. However, in solving clustering problems, the k-modes algorithm and its variants usually ask the user to provide the number of clusters in the data sets. Unfortunately, the number of clusters is generally unknown to the user. Therefore, clustering becomes a tedious task of trial-and-error and the clustering result is often poor, especially when the number of clusters is large and not easy to guess. Also, it is hard for a user to select the weight between categorical and numeric attributes in the k-modes algorithm. In this paper, a genetic algorithm for clustering large data sets with mixed-type data is proposed, and this algorithm can automatically search the number of clusters in the data set. Also, a weight can be automatically selected by the genetic algorithm to prevent favoring either type of attribute. Experimental results illustrate the effectiveness of the genetic algorithm.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. Through their combined multivariate analysis, the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies.
Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
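The stability check mentioned in this abstract (a solution is robust if it does not depend on the starting configuration) can be sketched with a minimal fuzzy c-means on scalar data. This is the textbook algorithm with fuzziness index m = 2, not the authors' multivariate workflow; the function names, the one-dimensional toy data, and the two starting configurations are assumptions for illustration.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy membership rows u[i][j]; each row sums to 1."""
    rows = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if min(d) == 0.0:
            # Point coincides with a center: give it full membership there.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            w = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            rows.append([wj / sum(w) for wj in w])
    return rows


def fuzzy_c_means_1d(points, centers, m=2.0, n_iter=200, tol=1e-10):
    """Alternate membership and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            new_centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
centers_a, u_a = fuzzy_c_means_1d(points, [0.0, 6.0])
centers_b, u_b = fuzzy_c_means_1d(points, [2.0, 3.0])
```

Comparing `sorted(centers_a)` with `sorted(centers_b)` is the one-dimensional analogue of checking that the cluster allocation is independent of the start; on real multivariate data many more starting points would be needed.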
Incremental Clustering Algorithm For Earth Science Data Mining
Vatsavai, Raju
2009-01-01
Remote sensing data play a key role in understanding complex geographic phenomena. Clustering is a useful tool for discovering interesting patterns and structures within multivariate geospatial data. One of the key issues in clustering is the specification of an appropriate number of clusters, which is not obvious in many practical situations. In this paper we provide an extension of the G-means algorithm which automatically learns the number of clusters present in the data and avoids overestimation of the number of clusters. Experimental evaluation on simulated and remotely sensed image data shows the effectiveness of our algorithm.
Clustering algorithms for Stokes space modulation format recognition.
Boada, Ricard; Borkowski, Robert; Monroy, Idelfonso Tafur
2015-06-15
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely influences the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six different clustering algorithms: k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering and maximum likelihood clustering, used for discriminating between dual polarization: BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. We determine essential performance metrics for each clustering algorithm and modulation format under test: minimum required signal-to-noise ratio, detection accuracy and algorithm complexity. PMID:26193532
A spectral image clustering algorithm based on ant colony optimization
NASA Astrophysics Data System (ADS)
Ashok, Luca; Messinger, David W.
2012-06-01
Ant Colony Optimization (ACO) is a computational method used for optimization problems. The ACO algorithm uses virtual ants to create candidate solutions that are represented by paths on a mathematical graph. We develop an algorithm using ACO that takes a multispectral image as input and outputs a cluster map denoting a cluster label for each pixel. The algorithm does this through identification of a series of one-dimensional manifolds on the spectral data cloud via the ACO approach, and then associates pixels to these paths based on their spectral similarity to the paths. We apply the algorithm to multispectral imagery to divide the pixels into clusters based on their representation by a low-dimensional manifold estimated by the "best fit ant path" through the data cloud. We present results from application of the algorithm to a multispectral Worldview-2 image and show that it produces useful cluster maps.
APPROXIMATION ALGORITHMS FOR CLUSTERING TO MINIMIZE THE SUM OF DIAMETERS
Kopp, S.; Mortveit, H.S.; Reidys, S.M.
2000-02-01
We consider the problem of partitioning the nodes of a complete edge-weighted graph into κ clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a solution that has no more than 10κ clusters such that the total diameter of these clusters is within a factor O(log (n/κ)) of the optimal value for κ clusters, where n is the number of nodes in the complete graph. For any fixed κ, we present an approximation algorithm that produces κ clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any ρ ≥ 1, there is no polynomial time approximation algorithm that can provide a performance guarantee of ρ even when the number of clusters is fixed at 3. Other results obtained include a polynomial time algorithm for the problem when the underlying graph is a tree with edge weights.
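The objective studied in this abstract is easy to state in code. The sketch below evaluates the sum of cluster diameters for a given partition and, for tiny instances only, finds the best partition by exhaustive search. It is not one of the paper's approximation algorithms; the function names, the parameter `k` for the number of clusters, and the four-point example are assumptions for illustration.

```python
from itertools import product


def sum_of_diameters(dist, clusters):
    """Sum over clusters of the largest pairwise distance inside each cluster."""
    total = 0
    for members in clusters:
        total += max((dist[a][b] for a in members for b in members), default=0)
    return total


def best_partition(dist, k):
    """Exhaustive search over all labelings; exponential, so only for tiny n.
    Empty clusters are allowed and contribute a diameter of 0."""
    n = len(dist)
    best = None
    for labels in product(range(k), repeat=n):
        clusters = [[i for i in range(n) if labels[i] == j] for j in range(k)]
        cost = sum_of_diameters(dist, clusters)
        if best is None or cost < best[0]:
            best = (cost, clusters)
    return best


# Four nodes on a line at positions 0, 1, 5 and 6 (distances obey the triangle inequality).
positions = [0, 1, 5, 6]
dist = [[abs(p - q) for q in positions] for p in positions]
optimum = best_partition(dist, 2)
```

The exhaustive search visits every labeling of the nodes, which is why approximation algorithms of the kind described in the abstract matter for anything beyond toy inputs.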
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances. PMID:24701148
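Of the components named in this abstract, the 2-opt search is the simplest to show in isolation. The sketch below applies 2-opt to a plain symmetric Euclidean tour and deliberately ignores the cluster-contiguity and cluster-order constraints of the ordered clustered problem, so it is an illustration of the move, not the paper's hybrid algorithm. The function names and the four-point instance are assumptions.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def two_opt(points, tour):
    """Repeatedly reverse a segment whenever doing so shortens the tour.
    Position 0 (the starting vertex) is never moved."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(points, candidate) < tour_length(points, tour) - 1e-12:
                    tour = candidate
                    improved = True
    return tour


# Corners of a unit square visited in a self-crossing order.
points = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]
better = two_opt(points, start)
```

In a constrained variant, a reversal would only be accepted if the resulting tour still visits each cluster contiguously and in the prescribed order, for example by restricting `i` and `j` to positions inside the same cluster.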
A novel spatial clustering algorithm based on Delaunay triangulation
NASA Astrophysics Data System (ADS)
Yang, Xiankun; Cui, Weihong
2008-12-01
Exploratory data analysis is increasingly necessary as larger spatial data sets are managed in electro-magnetic media. Spatial clustering is one of the most important spatial data mining techniques. So far, many spatial clustering algorithms have been proposed. In this paper we propose a robust spatial clustering algorithm named SCABDT (Spatial Clustering Algorithm Based on Delaunay Triangulation). SCABDT demonstrates important advantages over previous work. First, it discovers clusters of arbitrary shape. Second, executing SCABDT requires no a priori knowledge of the nature of the distribution. Third, experiments show that, like DBSCAN, SCABDT does not require much CPU processing time. Finally, it handles outliers efficiently.
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension.
Zhu, Zheng; Ochoa, Andrew J; Katzgraber, Helmut G
2015-08-14
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine. PMID:26317743
A space-time cluster algorithm for stochastic processes.
Gulbahce, N.
2003-01-01
We introduce a space-time cluster algorithm that generates histories of stochastic processes. Michael Zimmer introduced a space-time MC algorithm for stochastic classical dynamics and applied it to simulate the Ising model with Glauber dynamics. Following his steps, we extended Brower and Tamayo's embedded φ⁴ dynamics to space and time. We believe our algorithm can be applied to more general stochastic systems. Why space-time? To be able to study nonequilibrium systems, we need to know the probability of the 'history' of a nonequilibrium state. Histories are the entire space-time configurations. Cluster algorithms, first introduced by Swendsen and Wang (SW), are useful for overcoming critical slowing down. Brower and Tamayo mapped continuous field variables to Ising spins, and grew and flipped SW clusters to gain speed. Our algorithm is an extended version of theirs to space and time.
Ultrafast clustering algorithms for metagenomic sequence analysis
Fu, Limin; Niu, Beifang; Wu, Sitao; Wooley, John
2012-01-01
The rapid advances of high-throughput sequencing technologies have dramatically promoted metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters. PMID:22772836
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Clustering of Hadronic Showers with a Structural Algorithm
Charles, M.J.; /SLAC
2005-12-13
The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.
Shao, Jianyin; Tanner, Stephen W; Thompson, Nephi; Cheatham, Thomas E
2007-11-01
Molecular dynamics simulation methods produce trajectories of atomic positions (and optionally velocities and energies) as a function of time and provide a representation of the sampling of a given molecule's energetically accessible conformational ensemble. As simulations on the 10-100 ns time scale become routine, with sampled configurations stored on the picosecond time scale, such trajectories contain large amounts of data. Data-mining techniques, like clustering, provide one means to group and make sense of the information in the trajectory. In this work, several clustering algorithms were implemented, compared, and utilized to understand MD trajectory data. The development of the algorithms into a freely available C code library, and their application to a simple test example of random (or systematically placed) points in a 2D plane (where the pairwise metric is the distance between points) provide a means to understand the relative performance. Eleven different clustering algorithms were developed, ranging from top-down splitting (hierarchical) and bottom-up aggregating (including single-linkage edge joining, centroid-linkage, average-linkage, complete-linkage, centripetal, and centripetal-complete) to various refinement (means, Bayesian, and self-organizing maps) and tree (COBWEB) algorithms. Systematic testing in the context of MD simulation of various DNA systems (including DNA single strands and the interaction of a minor groove binding drug DB226 with a DNA hairpin) allows a more direct assessment of the relative merits of the distinct clustering algorithms. Additionally, means to assess the relative performance and differences between the algorithms, to dynamically select the initial cluster count, and to achieve faster data mining by "sieved clustering" were evaluated. 
Overall, it was found that there is no one perfect "one size fits all" algorithm for clustering MD trajectories and that the results strongly depend on the choice of atoms for the pairwise comparison. Some algorithms tend to produce homogeneously sized clusters, whereas others have a tendency to produce singleton clusters. Issues related to the choice of a pairwise metric, clustering metrics, which atom selection is used for the comparison, and the relative performance are discussed. Overall, the best performance was observed with the average-linkage, means, and SOM algorithms. If the cluster count is not known in advance, the hierarchical or average-linkage clustering algorithms are recommended. Although these algorithms perform well, it is important to be aware of the limitations or weaknesses of each algorithm, specifically the high sensitivity to outliers with hierarchical, the tendency to generate homogeneously sized clusters with means, and the tendency to produce small or singleton clusters with average-linkage. PMID:26636222
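The abstract uses points in a 2D plane, with Euclidean distance as the pairwise metric, as its simple test example. A minimal bottom-up average-linkage sketch on such points is given below. It is not the authors' C library; the function name `average_linkage`, the naive recomputation of all pair distances at every merge, and the six-point data set are assumptions for illustration.

```python
import math


def average_linkage(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two clusters whose members
    have the smallest mean pairwise distance, until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(avg_dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so index i is unaffected
    return clusters


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = average_linkage(points, 2)
```

For trajectory data the points would be replaced by frames and `math.dist` by a structural metric on the selected atoms, which is the choice the abstract identifies as having the strongest influence on the result.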
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
MRI brain tumor segmentation based on improved fuzzy c-means method
NASA Astrophysics Data System (ADS)
Deng, Wankai; Xiao, Wei; Pan, Chao; Liu, Jianguo
2009-10-01
This paper focuses on image segmentation, which is one of the key problems in medical image processing. A new medical image segmentation method is proposed based on the fuzzy c-means algorithm and spatial information. First, we classify the image into the region of interest and the background using the fuzzy c-means algorithm. Then we use the information of the tissues' gradient and the intensity inhomogeneities of regions to improve the quality of segmentation. The sum of the mean variance in the region and the reciprocal of the mean gradient along the edge of the region is chosen as the objective function. The minimum of this sum gives the optimum result. The results show that the clustering segmentation algorithm is effective.
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm based on the paper "Clustering by fast search and find of density peaks". A distance threshold is introduced for the purpose of economizing memory. In order to reduce the probability that two points share the same density value, similarity is utilized to define the proximity measure. We have tested the modified algorithm on a large data set, several small data sets and shape data sets. It turns out that the proposed algorithm can obtain acceptable results and can be applied more widely.
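The two quantities at the heart of the density-peaks method that this paper builds on can be sketched briefly. The code below computes the baseline cutoff density and the distance to the nearest denser point; it does not implement the memory-saving threshold or the similarity-based proximity measure proposed in the paper. The function name, the cutoff `d_c`, and the toy data are assumptions for illustration.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta): rho[i] counts neighbours closer than d_c, and
    delta[i] is the distance to the nearest point of strictly higher density
    (or the largest distance from i when no denser point exists)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


# Two square groups, each with one point in the middle.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta = density_peaks(points, 0.9)
gamma = [r * dl for r, dl in zip(rho, delta)]
peaks = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

Points with both a large `rho` and a large `delta` are taken as cluster centers. Because the count-based `rho` is an integer, many points share the same density value in this example, which is the tie problem the abstract sets out to reduce.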
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
K-Distributions: A New Algorithm for Clustering Categorical Data
NASA Astrophysics Data System (ADS)
Cai, Zhihua; Wang, Dianhong; Jiang, Liangxiao
Clustering is one of the most important tasks in data mining. The K-means algorithm is the most popular one for achieving this task because of its efficiency. However, it works only on numeric values, although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm was presented to extend the K-means algorithm to categorical domains. Unfortunately, it suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the well-known 36 UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.
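The K-modes baseline mentioned above rests on two ingredients, a simple matching dissimilarity and an attribute-wise mode, which can be sketched as follows. This illustrates K-modes only, not the K-distributions algorithm, whose details the abstract does not give; all function names and the toy records are assumptions.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching distance: number of attributes on which two records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_mode(records):
    """Attribute-wise most frequent value among the records of one cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign_to_modes(records, modes):
    """Index of the closest mode for each record (ties go to the lowest index)."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
        for r in records
    ]


data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "small", "square"),
]
modes = [cluster_mode(data[:3]), cluster_mode(data[3:])]
labels = assign_to_modes(data, modes)
```

A full K-modes run would alternate the assignment and mode-update steps until the labels stop changing, in the same way K-means alternates assignment and mean updates.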
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression. PMID:22042156
Diluvian Clustering: A Fast, Effective Algorithm for Clustering Compositional and Other Data.
Ritchie, Nicholas W M
2015-10-01
Diluvian Clustering is an unsupervised grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm is notable for its ability to identify clusters that are either compact or diffuse and clusters that have either a large number or a small number of members. Diluvian Clustering is fundamentally different from most algorithms previously applied to cluster compositional data in that its implementation does not depend upon a metric. The algorithm reduces in two dimensions to a case for which there is an intuitive, real-world parallel. Furthermore, the algorithm has few tunable parameters and these parameters have intuitive interpretations. By eliminating the dependence on an explicit metric, it is possible to derive reasonable clusters with disparate variances like those in real-world compositional data sets. The algorithm is computationally efficient. While the worst case scales as O(N²), most cases are closer to O(N), where N is the number of discrete data points. On a mid-range 2014-vintage computer, a typical 20,000 particle, 30 element data set can be clustered in a fraction of a second. PMID:26299780
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets and on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Such a study should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while being considerably faster. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
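The general idea of fitting k-means on a sample and then labelling the full dataset can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract's rule for choosing the sample size (via width and confidence level) is not reproduced, and `sampled_kmeans`, `sample_size`, and the synthetic data are hypothetical.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm with initial centers drawn from the data."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centers on a random sample, then label every point by nearest center."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans(sample, k, rng)
    return centers, [nearest(p, centers) for p in points]


# Two well-separated synthetic blobs of 200 points each (assumed test data).
blob_a = [(i * 0.01, (i % 7) * 0.01) for i in range(200)]
blob_b = [(100.0 + i * 0.01, (i % 7) * 0.01) for i in range(200)]
data = blob_a + blob_b
centers, labels = sampled_kmeans(data, k=2, sample_size=100, seed=0)
```

Only the 100 sampled points take part in the iterative fitting; the remaining points are touched once, in the final nearest-center pass, which is where the runtime saving comes from.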
Note on Ultrametric Hierarchical Clustering Algorithms.
ERIC Educational Resources Information Center
Batagelj, Vladimir
1981-01-01
Milligan presented the conditions that are required for a hierarchical clustering strategy to be monotonic, based on a formula by Lance and Williams. The statement of the conditions is improved and shown to provide necessary and sufficient conditions. (Author/GK)
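The Lance and Williams formula referred to above updates the distance from a cluster k to the merger of clusters i and j. A small sketch follows, using the standard textbook coefficients for three linkage strategies; the function and constant names are assumptions, and the note's improved monotonicity conditions are not restated here.

```python
def lance_williams(d_ki, d_kj, d_ij, alpha_i, alpha_j, beta, gamma):
    """Distance from cluster k to the merger of clusters i and j.

    d(k, i+j) = alpha_i*d(k,i) + alpha_j*d(k,j) + beta*d(i,j)
                + gamma*|d(k,i) - d(k,j)|
    """
    return alpha_i * d_ki + alpha_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


# Standard coefficient choices (textbook values).
SINGLE = dict(alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=-0.5)    # reduces to min
COMPLETE = dict(alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=0.5)   # reduces to max
MEDIAN = dict(alpha_i=0.5, alpha_j=0.5, beta=-0.25, gamma=0.0)   # on squared distances

single = lance_williams(3.0, 5.0, 2.0, **SINGLE)
complete = lance_williams(3.0, 5.0, 2.0, **COMPLETE)
# Three mutually equidistant points: i and j merge at height 1.0, yet the
# median update places k at 0.75 from the new cluster, i.e. a reversal.
reversal = lance_williams(1.0, 1.0, 1.0, **MEDIAN)
```

The last value shows why monotonicity is a real constraint on the coefficients: single and complete linkage never produce a merge height below the previous one, whereas the median strategy can.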
Cluster algorithms with emphasis on quantum spin systems
Gubernatis, J.E.; Kawashima, Naoki
1995-10-06
The purpose of this lecture is to discuss in detail the generalized approach of Kawashima and Gubernatis for the construction of cluster algorithms. We first present a brief refresher on the Monte Carlo method, describe the Swendsen-Wang algorithm, show how this algorithm follows from the Fortuin-Kasteleyn transformation, and reinterpret this transformation in a form which is the basis of the generalized approach. We then derive the essential equations of the generalized approach. This derivation is remarkably simple if done from the viewpoint of probability theory, and the essential assumptions will be clearly stated. These assumptions are implicit in all useful cluster algorithms of which we are aware. They lead to a quite different perspective on cluster algorithms than found in the seminal works and in Ising model applications. Next, we illustrate how the generalized approach leads to a cluster algorithm for world-line quantum Monte Carlo simulations of Heisenberg models with S = 1/2. More succinctly, we also discuss the generalization of the Fortuin-Kasteleyn transformation to higher spin models and illustrate the essential steps for an S = 1 Heisenberg model. Finally, we summarize how to go beyond S = 1 to a general spin, XYZ model.
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
NCUBE - A clustering algorithm based on a discretized data space
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
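The idea of imposing a gridwork on the data space can be illustrated with a generic grid-based clustering sketch: points are binned into cells, and face-adjacent occupied cells are joined into one cluster. This is not the NCUBE implementation, whose details the abstract does not give; `grid_clusters`, `cell_size`, and the sample points are assumptions.

```python
from collections import deque


def grid_clusters(points, cell_size):
    """Label points by connected groups of occupied grid cells (face-adjacent)."""
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]
    occupied = set(cell_of)
    label_of_cell = {}
    next_label = 0
    for start in sorted(occupied):
        if start in label_of_cell:
            continue
        # Breadth-first flood fill over neighbouring occupied cells.
        label_of_cell[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in occupied and nb not in label_of_cell:
                        label_of_cell[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return [label_of_cell[c] for c in cell_of]


pts = [(0.2, 0.3), (1.1, 0.4), (1.9, 1.2), (7.5, 7.5), (8.2, 7.9)]
labels = grid_clusters(pts, cell_size=1.0)
```

Because each cluster is a union of axis-aligned cells, its boundary is piecewise linear, and an L-shaped group such as the first three points is kept together even though no single hyperplane would isolate such shapes in general.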
Particle flow reconstruction based on the directed tree clustering algorithm
Chakraborty, D.; Lima, J. G. R.; McIntosh, R.; Zutshi, V.
2006-10-27
We present the status of particle flow algorithm development at Northern Illinois University. A key element in our approach is the calorimeter-based directed tree clustering algorithm. We have attempted to identify and tackle the essential challenges and analyze the effect of several different approaches to the reconstruction of jet energies and the Z-boson mass. A number of possibilities have been studied, such as analog vs. digital energy measurement, hit density-based clustering and the use of single or multiple energy thresholds. We plan to use this PFA-based reconstruction to compare some of the proposed detector technologies and geometries.
ORCA: The Overdense Red-sequence Cluster Algorithm
NASA Astrophysics Data System (ADS)
Murphy, D. N. A.; Geach, J. E.; Bower, R. G.
2012-03-01
We present a new cluster-detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (i) the similarity in colour of cluster galaxies (the 'red sequence'); and (ii) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space; (ii) a Voronoi diagram identifies regions of high surface density; and (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper, we present the algorithm and demonstrate it on two data sets. The first is a 7-deg² sample of the deep Sloan Digital Sky Survey (SDSS) equatorial stripe (Stripe 82), from which we detect 97 clusters with z ≤ 0.6. Benefitting from deeper data, we are 100 per cent complete in the maxBCG optically selected cluster catalogue (based on shallower single-epoch SDSS data) and find an additional 78 previously unidentified clusters. The second data set is a mock Medium Deep Survey Pan-STARRS catalogue, based on the Λ cold dark matter (ΛCDM) model and a semi-analytic galaxy formation recipe. Knowledge of galaxy-halo memberships in the mock catalogue allows for the quantification of algorithm performance. We detect 305 mock clusters in haloes with mass >10^13 h^-1 M⊙ at z ≲ 0.6 and determine a spurious detection rate of <1 per cent, consistent with tests on the Stripe 82 catalogue. The detector performs well in the recovery of model ΛCDM clusters. At the median redshift of the catalogue, the algorithm achieves >75 per cent completeness down to halo masses of 10^13.4 h^-1 M⊙ and recovers >75 per cent of the total stellar mass of clusters in haloes down to 10^13.8 h^-1 M⊙.
A companion paper presents the complete cluster catalogue over the full 270-deg² Stripe 82 catalogue.
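The Friends-of-Friends grouping named as the third step can be sketched generically with a union-find structure. This is a minimal sketch of the technique only, not the ORCA code: the photometric filtering and Voronoi steps are omitted, and `friends_of_friends`, `linking_length`, and the toy positions are assumptions.

```python
import math


def friends_of_friends(points, linking_length):
    """Group points so that any pair within linking_length shares a group,
    directly or through a chain of such pairs."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


galaxies = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.2), (9.0, 0.0)]
groups = friends_of_friends(galaxies, linking_length=0.6)
```

The first and third positions are farther apart than the linking length but end up in the same group through their common neighbour, which is the defining property of the technique. The all-pairs loop is quadratic; a spatial index would be the usual choice for survey-sized catalogues.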
Fuzzy ellipsoidal shell clustering algorithm and detection of elliptical shapes
NASA Astrophysics Data System (ADS)
Dave, Rajesh N.; Patel, Kalpesh J.
1991-02-01
The Fuzzy c-Ellipsoidal Shell (FCES) algorithm, which utilizes hyper-ellipsoidal shells as cluster prototypes, is proposed. FCES is a generalization of the Fuzzy Shell Clustering (FSC) algorithm. The generalization is achieved by allowing the distances to be measured through a norm-inducing matrix that is symmetric positive definite. In the case of fixed known norms, the extension of FSC to FCES is straightforward. Two different strategies are recommended when the norm is unknown. The first strategy uses a non-linear least-squares fit approach with fuzzy memberships as weights. The second approach considers the norm-inducing matrix as a variable of optimization, thus making FCES an adaptive-norm type algorithm. An adaptive norm theorem is presented. The results of the first approach are used to detect ellipses having unequal sizes and orientations in two-dimensional data sets. The non-linear equations of the FCES algorithm are more complex than those of the FSC algorithm. Numerical issues related to both the FCES algorithm and the FSC algorithm are discussed.
An algorithmic approach to mining unknown clusters in training data
NASA Astrophysics Data System (ADS)
Lynch, Robert S., Jr.; Willett, Peter K.
2006-04-01
In this paper, unsupervised learning is utilized to develop a method for mining unknown clusters in training data. The approach is based on the Bayesian Data Reduction Algorithm (BDRA), which has recently been developed into a patented system called the Data Extraction and Mining Software Tool (DEMIST). In the BDRA, the modeling assumption is that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach to selecting and discretizing the relevant features of each class for best performance. The primary metric for selecting and discretizing all relevant features contained in each class is an analytic formula for the probability of error conditioned on the training data. Thus, the primary contribution of this work is to demonstrate an algorithmic approach to finding multiple unknown clusters in training data, which represents an extension to the original data clustering algorithm. To illustrate performance, results are demonstrated using simulated data that contains multiple clusters. In general, the results of this work will demonstrate an effective method for finding multiple clusters in data mining applications.
Adaptive clustering algorithm for community detection in complex networks.
Ye, Zhenqing; Hu, Songnian; Yu, Jun
2008-10-01
Community structure is common in various real-world networks; methods or algorithms for detecting such communities in complex networks have attracted great attention in recent years. We introduced a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior where vertices always travel toward their preferable neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process where vertices constantly regroup. In addition, we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm) through intensive evaluation. The applications in three real-world networks demonstrate the superiority of our algorithm to find communities that are parallel with the appropriate organization in reality. PMID:18999501
High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
Coupled cluster algorithms for networks of shared memory parallel processors
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.
2007-05-01
As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS (General Atomic and Molecular Electronic Structure System) program suite [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363] and the Distributed Data Interface (DDI) [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]; however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm are presented on several large-scale clusters of SMPs.
A novel hierarchical clustering algorithm for gene sequences
2012-01-01
Background: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into feature vectors which contain the occurrence, location and order relation of k-tuples in the DNA sequence. Afterwards, a hierarchical procedure is applied to cluster the DNA sequences based on the feature vectors. Results: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. This method is also compared with BlastClust, CD-HIT-EST and some others. The experimental results show our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationship among the sequences. Conclusions: We introduced a novel clustering algorithm which is based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationship among the sequences. PMID:22823405
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers ~2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the ΛCDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is ≈90% complete and 95% pure above M_200 = 1 × 10^14 h^-1 M⊙ and within 0.03 ≤ z ≤ 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 ≤ z ≤ 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the galaxy line-of-sight velocity dispersion or the richness of the cluster.
However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of M_200 as L_r. The final C4 catalog will contain ≈2500 clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering
Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Background: The gravitation field algorithm (GFA) is a new optimization algorithm based on an imitation of natural phenomena. GFA performs well in searching for both the global minimum and multiple minima in computational biology. However, GFA needs to be improved for greater efficiency and modified for application to some discrete data problems in systems biology. Method: An improved GFA called IGFA is proposed in this paper. Two parts were improved in IGFA. The first is the rule of random division, a reasonable strategy that shortens running time. The other is the rotation factor, which can improve the accuracy of IGFA. To apply IGFA to hierarchical clustering, the initial part and the movement operator were modified. Results: Two kinds of experiments were used to test IGFA, and IGFA was applied to hierarchical clustering. The global minimum experiment was run with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). The multi-minima experiment was run with IGFA and GFA. The results of the two experiments were compared with each other and demonstrated the efficiency of IGFA: IGFA is better than GFA in both accuracy and running time. For hierarchical clustering, IGFA is used to optimize the smallest distance between gene pairs, and the results were compared with GA, SA, single-linkage clustering, and UPGMA. The efficiency of IGFA is demonstrated. PMID:23173043
A comparison of clustering algorithms in article recommendation system
NASA Astrophysics Data System (ADS)
Tantanasiriwong, Supaporn
2012-01-01
A recommendation system is a tool that can recommend to researchers resources suited to their research interests by using content-based filtering. In this paper, clustering, as a form of unsupervised learning, is introduced for grouping objects based on their selected features and similarities. Publication information from the Science Citation Index (SCI) is used as the dataset for clustering, with feature extraction performed as dimensionality reduction of these articles, comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Means to determine the best algorithm. In the experiment, the selected database consists of 2625 documents extracted from the SCI corpus from 2001 to 2009. Clustering into ranks of 50, 100, 200, and 250 is considered, and the F-measure is used to evaluate the three algorithms. The results show that the LDA technique gives an accuracy of up to 95.5%, the highest among the clustering techniques compared.
A comparison of clustering algorithms in article recommendation system
NASA Astrophysics Data System (ADS)
Tantanasiriwong, Supaporn
2011-12-01
A recommendation system is a tool that can recommend to researchers resources suited to their research interests by using content-based filtering. In this paper, clustering, as a form of unsupervised learning, is introduced for grouping objects based on their selected features and similarities. Publication information from the Science Citation Index (SCI) is used as the dataset for clustering, with feature extraction performed as dimensionality reduction of these articles, comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Means to determine the best algorithm. In the experiment, the selected database consists of 2625 documents extracted from the SCI corpus from 2001 to 2009. Clustering into ranks of 50, 100, 200, and 250 is considered, and the F-measure is used to evaluate the three algorithms. The results show that the LDA technique gives an accuracy of up to 95.5%, the highest among the clustering techniques compared.
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters. PMID:26327507
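To make the optimization target concrete, the Lennard-Jones energy of a cluster, one of the potential functions listed above, can be evaluated as in the following sketch. It shows only the objective function in reduced units; the artificial bee colony search itself and the ABCluster program's interface are not reproduced, and `lennard_jones_energy` is a hypothetical name.

```python
import math
from itertools import combinations


def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total pairwise Lennard-Jones energy of a cluster.

    E = sum over pairs of 4*epsilon*[(sigma/r)**12 - (sigma/r)**6]
    """
    total = 0.0
    for a, b in combinations(coords, 2):
        r = math.dist(a, b)
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


r_min = 2 ** (1 / 6)  # pair separation at which the pair energy is -epsilon
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
triangle = dimer + [(r_min / 2, r_min * math.sqrt(3) / 2, 0.0)]
e_dimer = lennard_jones_energy(dimer)
e_triangle = lennard_jones_energy(triangle)
```

A global optimizer such as the one in the abstract would search over the coordinates to minimize this function. For two and three atoms the minima are the dimer and the equilateral triangle at the optimal pair separation, but the number of local minima grows rapidly with cluster size.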
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011-01-01
Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section.
Conclusions: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager. PMID:22070249
Synchronous Firefly Algorithm for Cluster Head Selection in WSN.
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
A Wireless Sensor Network (WSN) consists of small, low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to a sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain, leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating the cluster head role among nodes with energy above a set threshold. CH selection in WSN is NP-hard, as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, the synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows that the proposed technique performs well compared to LEACH and energy-efficient hierarchical clustering (EEHC). Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
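The probabilistic rotation used by the LEACH baseline can be illustrated with its commonly cited election threshold. This sketch covers only that baseline rule, assuming the standard formulation T = p / (1 - p (r mod 1/p)); the energy condition mentioned in the abstract and the synchronous firefly algorithm itself are not modelled, and `leach_threshold` and its parameters are hypothetical names.

```python
def leach_threshold(p, round_number, eligible=True):
    """Election threshold T(n) in the standard LEACH formulation.

    p            -- desired fraction of nodes acting as cluster head per round
    round_number -- current round r (0-based)
    eligible     -- False if the node already served as head in this epoch
    """
    if not eligible:
        return 0.0
    epoch_length = round(1.0 / p)
    return p / (1.0 - p * (round_number % epoch_length))


# A node elects itself when a uniform random draw in [0, 1) falls below T(n).
thresholds = [leach_threshold(0.1, r) for r in range(10)]
```

With p = 0.1 the threshold rises from 0.1 in the first round of an epoch to 1 in the last, so every node that has not yet served is forced to take a turn before the epoch restarts.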
Advanced defect detection algorithm using clustering in ultrasonic NDE
NASA Astrophysics Data System (ADS)
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed, and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing with a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class is then labelled as 'legitimate reflector' or 'artefact' based on this observation, and the expected probability of detection (PoD) and probability of false alarm (PFA) are determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP, and the subsequent application of the proposed clustering algorithm provides an additional reduction in PFA while maintaining PoD for both samples compared with SSP results alone.
An Improved Distance Matrix Computation Algorithm for Multicore Clusters
Al-Neama, Mohammed W.; Reda, Naglaa M.; Ghaleb, Fayed F. M.
2014-01-01
The distance matrix has diverse uses in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The explosive growth of biological sequence databases leads to an urgent need for accelerating these computations. The DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) as a recent approach for vectorizing distance matrix computation. It showed efficient performance in both sequential and parallel computing. However, the multicore cluster systems that are now available, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as a highly efficient parallel vectorized algorithm for computing the distance matrix, addressed to multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives. It deduces an efficient approach to partitioning and scheduling computations that is convenient for this type of architecture. Implementations exploit the potential of both the MPI and OpenMP libraries. Experimental results show that the proposed method achieves around a 3-fold speedup over SSE2. Further, it also achieves speedups of more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779
Comparison of cluster expansion fitting algorithms for interactions at surfaces
NASA Astrophysics Data System (ADS)
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms against synthetic data: the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves the lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
A GA-based clustering algorithm for large data sets with mixed numeric and categorical values
NASA Astrophysics Data System (ADS)
Li, Jie; Gao, Xinbo; Jiao, Licheng
2003-09-01
In the field of data mining, it is often necessary to perform cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data rather than mixed data sets. To address this, this paper presents a novel clustering algorithm for such mixed data sets by modifying the common cost function, the trace of the within-cluster dispersion matrix. The genetic algorithm (GA) is used to optimize the new cost function to obtain a valid clustering result. Experimental results illustrate that the new GA-based clustering algorithm is feasible for large data sets with mixed numeric and categorical values.
Mammographic images segmentation based on chaotic map clustering algorithm
2014-01-01
Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammogram. Methods The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the distance between the associated feature space points, is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and reproduces the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective at identifying larger mass lesions. Conclusions We can summarize our analysis by asserting that, due to the particularities of mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. Rather, the joint use of this method along with other segmentation techniques could increase segmentation performance and provide extra information for subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
Dynamically Incremental K-means++ Clustering Algorithm Based on Fuzzy Rough Set Theory
NASA Astrophysics Data System (ADS)
Li, Wei; Wang, Rujing; Jia, Xiufang; Jiang, Qing
Because the classic K-means++ clustering algorithm handles only static data, this paper presents a dynamically incremental K-means++ clustering algorithm (DK-Means++) based on fuzzy rough set theory. First, in DK-Means++, the similarity formula is improved with weights computed from the importance of the attributes, which are reduced on the basis of rough fuzzy set theory. Second, new data need only be matched to the granules already clustered by the K-means++ algorithm; only rarely are new data clustered by the classic K-means++ algorithm over the global data. In this way, re-clustering all the data each time the dynamic data set changes is avoided, so clustering efficiency is improved. Our experiments show that the DK-Means++ algorithm can objectively and efficiently handle the clustering of dynamically incremental data.
Gravitation field algorithm and its application in gene cluster
2010-01-01
Background Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the well-known astronomical theory of planetary formation, the Solar Nebular Disk Model (SNDM). GFA simulates the gravitation field and outperforms GA and SA on some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters a dataset from the Gene Expression Omnibus well. Conclusions The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA. PMID:20854683
Sweeney, Timothy E.; Chen, Albert C.; Gevaert, Olivier
2015-01-01
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of ‘dark art’, with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R. PMID:26581809
Texture Detect on Rotary-Veneer Surface Based on Semi-Fuzzy Clustering Algorithm
NASA Astrophysics Data System (ADS)
Cheng, Wei; Liang, Ping; Cao, Suqun
Because the texture of rotary veneer can interfere with defect detection, this paper presents a modified semi-fuzzy clustering (SFC) algorithm. The SFC algorithm combines the Fisher discrimination method with fuzzy theory using a fuzzy scatter matrix. The final clustering results are obtained by iteratively optimizing the fuzzy Fisher criterion function. The SFC algorithm is robust and capable of producing well-separated clustering results. It can accurately detect the texture and defects on the rotary-veneer surface.
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for detecting microcalcification clusters is proposed. First, we briefly analyze mammographic images with microcalcification lesions to confirm that these lesions have much greater gray values than normal regions. After summarizing the specific features of microcalcification clusters in mammographic screening, we focus on the preprocessing step, which includes eliminating the background, enhancing the image and eliminating the pectoral muscle. In detail, the Chan-Vese model is used to eliminate the background. We then combine a morphology method with an edge detection method. After the AND operation and Sobel filter, we apply the Hough transform; the results show improved elimination of the pectoral muscle, whose gray level is close to that of microcalcifications. Additionally, the enhancement step is achieved by morphology. We put effort into mammographic image preprocessing to achieve lower computational complexity. As is well known, robust analysis of mammograms is difficult because of the low contrast between normal and lesion tissues and the considerable noise in such images. After this series of preprocessing steps, a method based on blob detection is applied to microcalcification clusters according to their specific features. The proposed algorithm employs the Laplace operator to improve the Difference of Gaussians (DoG) function for low-contrast images. A preliminary evaluation of the proposed method is performed on a known public database, MIAS, rather than on synthetic images. The comparison experiments and Cohen's kappa coefficients all demonstrate that our proposed approach can potentially obtain better microcalcification cluster detection results in terms of accuracy, sensitivity and specificity.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
A decentralized fuzzy C-means-based energy-efficient routing protocol for wireless sensor networks.
Alia, Osama Moh'd
2014-01-01
Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060
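The initial allocation step relies on fuzzy C-means, whose two alternating updates can be sketched as below. This is the textbook FCM update (memberships from relative distances, then centers as membership-weighted means), not the DCFP protocol itself. The function names, the fuzzifier default `m=2.0` and the 2-D node coordinates are assumptions chosen for illustration.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[k][i], the membership of point k in cluster i (standard FCM update)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with a center: give it full membership there
            # (shared equally if it coincides with several centers).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        )
    return memberships

def fcm_centers(points, memberships, n_clusters, m=2.0):
    """Return the new centers as means of the points weighted by u ** m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            tuple(sum(w * x[d] for w, x in zip(weights, points)) / total
                  for d in range(dim))
        )
    return centers

if __name__ == "__main__":
    nodes = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]  # hypothetical sensor positions
    centers = [(0.0, 0.0), (10.0, 0.0)]
    for _ in range(10):  # a fixed iteration count keeps the sketch short
        u = fcm_memberships(nodes, centers)
        centers = fcm_centers(nodes, u, n_clusters=2)
    print(centers)
```

In a clustering protocol, each node would then typically be assigned to the cluster in which its membership is largest.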
A Fast Clustering Algorithm for Data with a Few Labeled Instances
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper addresses two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than the semidefinite programming model used by most existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. PMID:25861252
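The definitions of diameter, split and RSD given above can be made concrete with a small sketch. It only evaluates the RSD of a given labeling under Euclidean distance; it is not the paper's clustering or metric learning algorithm, and the function names and the choice of Euclidean distance are assumptions for illustration.

```python
import math
from itertools import combinations

def max_diameter(points, labels):
    """Largest distance between two instances that share a cluster."""
    pairs = combinations(range(len(points)), 2)
    return max(
        (math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]),
        default=0.0,
    )

def min_split(points, labels):
    """Smallest distance between two instances in different clusters."""
    pairs = combinations(range(len(points)), 2)
    return min(
        (math.dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]),
        default=math.inf,
    )

def rsd(points, labels):
    """Ratio of the minimum split to the maximum diameter."""
    diameter = max_diameter(points, labels)
    return math.inf if diameter == 0.0 else min_split(points, labels) / diameter

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
    print(rsd(pts, [0, 0, 1, 1]))  # well-separated labeling, RSD greater than one
    print(rsd(pts, [0, 1, 0, 1]))  # interleaved labeling, RSD less than one
```

An RSD above one means every cluster is tighter than its gap to any other cluster, which is the regime in which the abstract states that the algorithm returns optimal solutions.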
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that for the film geometry, too, a substantial reduction of the statistical error can be achieved. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L(0) of the film and study the scaling of this temperature with L(0). In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model. PMID:25768461
jClustering, an open framework for the development of 4D clustering algorithms.
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
Textural defect detect using a revised ant colony clustering algorithm
NASA Astrophysics Data System (ADS)
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for textural defect detection. Our efforts focus mainly on the definition of a local irregularity measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the local textural inconsistency of each pixel against its mini-environment. In our revised ACCA, the behavior of each ant is divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the next actions of the ants are chosen independently of each other in a stochastic way according to some evaluated heuristic knowledge. The independence of the ants implies that the algorithm has an inherently parallel computation architecture. We apply the proposed method to some typical textural images with defects. From the series of pheromone distribution maps (PDM), it can be clearly seen that the pheromone distribution gradually approaches the textural defects. After some post-processing, the final distribution of pheromone shows the shape and area of the defects well.
GX-Means: A model-based divide and merge algorithm for geospatial image clustering
Vatsavai, Raju; Symons, Christopher T; Chandola, Varun; Jun, Goo
2011-01-01
One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.
Mekhmoukh, Abdenour; Mokrani, Karim
2015-11-01
In this paper, a new image segmentation method based on Particle Swarm Optimization (PSO) and outlier rejection combined with level set is proposed. A traditional approach to the segmentation of Magnetic Resonance (MR) images is the Fuzzy C-Means (FCM) clustering algorithm. The membership function of this conventional algorithm is sensitive to outliers and does not integrate the spatial information in the image. The algorithm is very sensitive to noise and inhomogeneities in the image; moreover, it depends on the initialization of the cluster centers. To improve outlier rejection and to reduce the noise sensitivity of the conventional FCM clustering algorithm, a novel extended FCM algorithm for image segmentation is presented. In general, the initial cluster centers in the FCM algorithm are chosen randomly; with the help of the PSO algorithm, the cluster centers are chosen optimally. Our algorithm also takes the spatial neighborhood information into consideration. This a priori information is used in the cost function to be optimized. For MR images, the resulting fuzzy clustering is used to set the initial level set contour. The results confirm the effectiveness of the proposed algorithm. PMID:26299609
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
To address the security problems of the hierarchical P2P network (HPN), this paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we use a reputation mechanism to ensure the security of transactions and use clusters to manage the reputation mechanism. In order to improve security, reduce the network overhead incurred by reputation management and enhance cluster stability, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of a node. Simulation results show that the proposed algorithm improves security, reduces network overhead, and enhances cluster stability.
On the impact of dissimilarity measure in k-modes clustering algorithm.
Ng, Michael K; Li, Mark Junjie; Huang, Joshua Zhexue; He, Zengyou
2007-03-01
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in [4], [12] which allows the use of the k-modes paradigm to obtain a cluster with strong intrasimilarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and the convergence of the algorithm under the optimization framework. PMID:17224620
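As a point of reference for the abstract above, the following is a minimal sketch of the classical k-modes iteration with the plain simple matching dissimilarity. It does not implement the modified dissimilarity measure the paper analyzes, and the function names (simple_matching, cluster_mode, k_modes) are illustrative assumptions rather than anything taken from the paper.

```python
from collections import Counter


def simple_matching(a, b):
    """Count the attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Attribute-wise most frequent category of a non-empty cluster."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objects))


def k_modes(data, modes, max_iter=100):
    """Alternate assignment and mode update until the modes stop changing."""
    modes = [tuple(m) for m in modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        # Update step: recompute each mode; an empty cluster keeps its old mode.
        new_modes = []
        for j, old in enumerate(modes):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new_modes.append(cluster_mode(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
labels, modes = k_modes(data, modes=[("a", "x"), ("b", "z")])
```

The initial modes are passed in explicitly so that the run is deterministic; in practice they are usually drawn from the data.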
Global optimization of clusters of rigid molecules using the artificial bee colony algorithm.
Zhang, Jun; Dolg, Michael
2016-01-20
The global optimization of molecular clusters is an important topic encountered in many fields of chemistry. In our previous work (Phys. Chem. Chem. Phys., 2015, 17, 24173), we successfully applied the recently introduced artificial bee colony (ABC) algorithm to the global optimization of atomic clusters and introduced the corresponding software "ABCluster". In the present work, ABCluster was extended to the optimization of clusters of rigid molecules. Here "rigid" means that all internal degrees of freedom of the constituent molecules are frozen. The algorithm was benchmarked by TIP4P water clusters (H2O)N (N ≤ 20), for which all global minima were successfully located. It was further applied to various clusters of different chemical nature: 10 microhydration clusters, 4 methanol microsolvation clusters, 4 nonpolar clusters and 2 ion-aromatic clusters. In all the cases we obtained results consistent with previous experimental or theoretical studies. PMID:26738568
Quantum clustering algorithm and its application in warning forecast of tourism emergency
NASA Astrophysics Data System (ADS)
Wang, Ruijie; Du, Junping; Zuo, Min; Tu, Xuyan
2007-12-01
In this paper, we combine quantum computation with a clustering algorithm for data mining. The algorithm supposes that a potential field exists around the clustering centers in Hilbert space, and the potential energy function is obtained from the Schrödinger equation. This function serves as the rule for assigning elements to clusters. Finally, simulation experiments validate the algorithm's validity and feasibility when applied to data mining of tourism emergencies.
NASA Technical Reports Server (NTRS)
Lambeck, P. F.; Rice, D. P.
1976-01-01
Signature extension is intended to increase the space-time range over which a set of training statistics can be used to classify data without significant loss of recognition accuracy. A first cluster matching algorithm MASC (Multiplicative and Additive Signature Correction) was developed at the Environmental Research Institute of Michigan to test the concept of using associations between training and recognition area cluster statistics to define an average signature transformation. A more recent signature extension module CROP-A (Cluster Regression Ordered on Principal Axis) has shown evidence of making significant associations between training and recognition area cluster statistics, with the clusters to be matched being selected automatically by the algorithm.
A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking
Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng
2014-01-01
The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823
Parallelization of the Wolff single-cluster algorithm.
Kaupuzs, J; Rimšāns, J; Melnik, R V N
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models, and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of applications in which a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we have reached a speedup of about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimation, a speedup of about three times on four processors is reachable for the O(n) models with n ≥ 2. Furthermore, the developed OpenMP code allows us to simulate larger lattices because of the greater operative (shared) memory available. PMID:20365669
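For orientation, here is a minimal serial sketch of one Wolff single-cluster update for the 3D Ising model with coupling J = 1 and periodic boundaries. It is not the OpenMP implementation described above; the function name wolff_step and the dictionary-based lattice are assumptions made only for illustration.

```python
import math
import random


def wolff_step(spins, L, beta, rng=random):
    """Grow and flip one Wolff cluster on an L*L*L periodic lattice.

    spins maps (x, y, z) -> +1 or -1 and is modified in place.
    Returns the number of spins in the flipped cluster.
    """
    # Probability of adding an aligned neighbour to the cluster (J = 1).
    p_add = 1.0 - math.exp(-2.0 * beta)
    seed = (rng.randrange(L), rng.randrange(L), rng.randrange(L))
    s0 = spins[seed]
    spins[seed] = -s0  # flipping on insertion doubles as the visited mark
    stack = [seed]
    size = 1
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while stack:
        x, y, z = stack.pop()
        for dx, dy, dz in steps:
            n = ((x + dx) % L, (y + dy) % L, (z + dz) % L)
            if spins[n] == s0 and rng.random() < p_add:
                spins[n] = -s0
                stack.append(n)
                size += 1
    return size


L = 3
spins = {(x, y, z): 1 for x in range(L) for y in range(L) for z in range(L)}
size = wolff_step(spins, L, beta=50.0, rng=random.Random(0))
```

At very low temperature (large beta) the bond probability is effectively one, so a fully aligned lattice is flipped as a single cluster, which makes the example easy to check by hand.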
NASA Astrophysics Data System (ADS)
Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf
2014-12-01
In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adaptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k = 2, 3, … clusters. Therefore, it can also be used for the estimation of the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry fits the faults better. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
C-element: A New Clustering Algorithm to Find High Quality Functional Modules in PPI Networks
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms that focus on finding functional modules try either to find clique-like subnetworks or to grow clusters starting from high-degree vertices as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue-specific networks are used. PMID:24039752
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms rely largely on network communication protocols that connect and require multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Cluster algorithms in the O(4) φ⁴ theory in four dimensions
Frick, C.; Jansen, K.; Seuferling, P. (Höchstleistungsrechenzentrum c/o Kernforschungsanlage Jülich G.m.b.H., P.O. Box 1913, D-5170 Jülich, Federal Republic of Germany; Deutsches Elektronen-Synchrotron DESY, Hamburg, D-2000 Hamburg, Federal Republic of Germany)
1989-12-11
We study the surface and the reflection cluster algorithms, which have been developed for systems with global continuous symmetries by Niedermayer and Wolff, respectively, in the O(4) spin model in four dimensions. The surface algorithm appears not to be substantially better than local algorithms for this model. We show that the reflection algorithm has big advantages: It fights critical slowing down and due to the use of improved estimators it gives a variance reduction.
Clustering dynamic textures with the hierarchical EM algorithm for modeling video.
Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B
2013-07-01
Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition. PMID:23681990
A density-based algorithm for discovering clusters in large spatial databases with noise
Ester, M.; Kriegel, H.P.; Sander, J.; Xu Xiaowei
1996-12-31
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
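The density-based notion of clusters can be illustrated with a short sketch. The version below is a brute-force O(n²) rendering of the DBSCAN idea; it has none of the spatial indexing that gives the published algorithm its efficiency, and it exposes both a neighbourhood radius (eps) and a density threshold (min_pts) as explicit parameters. All names are illustrative assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this example the two dense squares form two clusters and the isolated point is labelled as noise.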
An adaptive spatial clustering algorithm based on the minimum spanning tree-like
NASA Astrophysics Data System (ADS)
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover potential rules and outliers in spatial data. Most existing spatial clustering methods cannot deal with uneven data density and usually require predefined parameters that are hard to justify. To overcome these limitations, we first propose the concept of the edge variation factor, based upon the definition of distance variation among the entities in the spatial neighborhood. Then, an approach is presented to construct the minimum spanning tree-like (MST-L). Further, an adaptive MST-L based spatial clustering algorithm (AMSTLSC) is developed in this paper. The spatial clustering algorithm requires only the threshold of the edge variation factor as an input parameter, which is easily set with little prior information. Through this parameter, a series of MST-L can be automatically generated from the high-density region to the low-density one, where each MST-L represents a cluster. As a result, the algorithm proposed in this paper can adapt to changes of local density among spatial points. This property is called adaptiveness. Finally, two tests are implemented to demonstrate that the AMSTLSC algorithm is very robust, is suitable for finding clusters with different shapes, and, in particular, has good adaptiveness. A comparative test further indicates that the AMSTLSC algorithm performs better than the classic DBSCAN algorithm.
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
K-Means clustering-based recommendation algorithms have been proposed with the claim of increasing the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors creates flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
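The group-to-flash rule quoted above (groups within 330 ms and 5.5 km of each other for LIS) can be sketched as single-linkage clustering with a time and a distance threshold. This is a simplified illustration, not the operational LIS/OTD code: it assumes groups are given as (time_ms, x_km, y_km) tuples in a flat local coordinate system, and the function name cluster_flashes is hypothetical.

```python
import math


def cluster_flashes(groups, max_dt_ms=330.0, max_dist_km=5.5):
    """Assign a flash id to each group using time and distance thresholds.

    groups: list of (time_ms, x_km, y_km); planar coordinates are an
    assumption of this sketch. Returns flash ids in input order.
    """
    parent = list(range(len(groups)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    order = sorted(range(len(groups)), key=lambda i: groups[i][0])
    for pos, a in enumerate(order):
        ta, xa, ya = groups[a]
        for b in order[pos + 1:]:
            tb, xb, yb = groups[b]
            if tb - ta > max_dt_ms:
                break  # later groups are even further away in time
            if math.hypot(xb - xa, yb - ya) <= max_dist_km:
                parent[find(a)] = find(b)

    # Renumber the roots as consecutive flash ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(groups))]


groups = [(0, 0, 0), (100, 1, 0), (200, 2, 0), (1000, 0, 0), (50, 100, 100)]
flash_ids = cluster_flashes(groups)
```

Passing max_dist_km=16.5 would correspond to the OTD distance threshold stated in the abstract.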
Doostparast Torshizi, Abolfazl; Fazel Zarandi, Mohammad Hossein
2015-09-01
This paper considers microarray gene expression data clustering using a novel two stage meta-heuristic algorithm based on the concept of α-planes in general type-2 fuzzy sets. The main aim of this research is to present a powerful data clustering approach capable of dealing with highly uncertain environments. In this regard, first, a new objective function using α-planes for general type-2 fuzzy c-means clustering algorithm is represented. Then, based on the philosophy of the meta-heuristic optimization framework 'Simulated Annealing', a two stage optimization algorithm is proposed. The first stage of the proposed approach is devoted to the annealing process accompanied by its proposed perturbation mechanisms. After termination of the first stage, its output is inserted to the second stage where it is checked with other possible local optima through a heuristic algorithm. The output of this stage is then re-entered to the first stage until no better solution is obtained. The proposed approach has been evaluated using several synthesized datasets and three microarray gene expression datasets. Extensive experiments demonstrate the capabilities of the proposed approach compared with some of the state-of-the-art techniques in the literature. PMID:25035233
A novel artificial bee colony based clustering algorithm for categorical data.
Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong
2013-01-01
This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected; then, features are extracted from the ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three performance indicators (sensitivity, specificity, and accuracy) are used to assess the effect of the IEMMC method for ECG arrhythmias. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm performs better not only in clustering result but also in global search ability and convergence ability, which supports its effectiveness for the detection of ECG arrhythmias. PMID:23690875
NASA Astrophysics Data System (ADS)
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including the over-simplicity of computing models and underutilized spatial information. In recent years, some studies have tried to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted in which cluster centers and the results of the iterated conditional model algorithm are considered as feasible solutions and objective functions, respectively, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the application of ABC-cluster and ABC-MRF-cluster methods could help to obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when used for a higher power of spectral discrimination, whereas the ABC-MRF-cluster method can provide better results when used for an adjusted Rand index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
NASA Astrophysics Data System (ADS)
Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen
2014-08-01
The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm that can predict the information of unlabeled nodes from a few labeled nodes. It is a community detection method in the field of complex networks. The algorithm is easy to implement, has low complexity, performs remarkably well, and is widely applied in various fields. However, the randomness of the label propagation leads to poor robustness, and the classification result is unstable. This paper proposes an LPA based on the edge clustering coefficient. A node in the network updates its label from the neighbor node whose edge clustering coefficient is the highest rather than from a random neighbor node, which effectively restrains the random spread of labels. The experimental results show that the LPA based on the edge clustering coefficient improves the stability and accuracy of the algorithm.
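A rough sketch of the update rule described above follows. The abstract does not define the edge clustering coefficient, so this sketch assumes one common definition, (number of common neighbours + 1) / min(k_u - 1, k_v - 1), taken as infinite when the denominator is zero. The asynchronous update order and the tie-breaking by smallest node id are also assumptions of the sketch, as are the function names.

```python
import math


def edge_clustering_coefficient(adj, u, v):
    """Assumed definition: (common neighbours + 1) / min(k_u - 1, k_v - 1)."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return math.inf if denom == 0 else (common + 1) / denom


def lpa_ecc(adj, max_iter=100):
    """Label propagation where each node copies the label of the neighbour
    joined to it by the edge with the highest clustering coefficient.

    adj maps node -> set of neighbour nodes (undirected graph).
    """
    labels = {u: u for u in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for u in sorted(adj):
            if not adj[u]:
                continue  # isolated node keeps its own label
            # sorted() makes ties resolve to the smallest neighbour id.
            best = max(sorted(adj[u]),
                       key=lambda v: edge_clustering_coefficient(adj, u, v))
            if labels[u] != labels[best]:
                labels[u] = labels[best]
                changed = True
        if not changed:
            break
    return labels


# Two triangles joined by a single bridge edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = lpa_ecc(adj)
```

Because the bridge edge has the lowest coefficient, labels do not leak across it, and the two triangles end up as separate communities.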
Datta, Susmita; Datta, Somnath
2006-01-01
Background A cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While past successes of such analyses have often been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus the Pearson's correlation coefficient as a measure of dissimilarity), often times such groupings could be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species. Results In this paper, we introduce two performance measures for evaluating the results of a clustering algorithm in its ability to produce biologically meaningful clusters. The first measure is a biological homogeneity index (BHI). As the name suggests, it is a measure of how biologically homogeneous the clusters are. This can be used to quantify the performance of a given clustering algorithm such as UPGMA in grouping genes for a particular data set and also for comparing the performance of a number of competing clustering algorithms applied to the same data set. The second performance measure is called a biological stability index (BSI). 
For a given clustering algorithm and an expression data set, it measures the consistency of the clustering algorithm's ability to produce biologically meaningful clusters when applied repeatedly to similar data sets. A good clustering algorithm should have high BHI and moderate to high BSI. We evaluated the performance of ten well known clustering algorithms on two gene expression data sets and identified the optimal algorithm in each case. The first data set deals with SAGE profiles of differentially expressed tags between normal and ductal carcinoma in situ samples of breast cancer patients. The second data set contains the expression profiles over time of positively expressed genes (ORF's) during sporulation of budding yeast. Two separate choices of the functional classes were used for this data set and the results were compared for consistency. Conclusion Functional information of annotated genes available from various GO databases mined using ontology tools can be used to systematically judge the results of an unsupervised clustering algorithm as applied to a gene expression data set in clustering genes. This information could be used to select the right algorithm from a class of clustering algorithms for the given data set. PMID:16945146
A Speed-Up Hierarchical Compact Clustering Algorithm for Dynamic Document Collections
NASA Astrophysics Data System (ADS)
Gil-García, Reynaldo; Pons-Porrata, Aurora
In this paper, a speed-up version of the Dynamic Hierarchical Compact (DHC) algorithm is presented. Our approach profits from the cluster hierarchy already built to reduce the number of calculated similarities. The experimental results on several benchmark text collections show that the proposed method is significantly faster than DHC while achieving approximately the same clustering quality.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
A fast general-purpose clustering algorithm based on FPGAs for high-throughput data processing
NASA Astrophysics Data System (ADS)
Annovi, A.; Beretta, M.
2010-05-01
We present a fast general-purpose algorithm for high-throughput clustering of data "with a two-dimensional organization". The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes the algorithm especially well suited for problems where the data have high density, e.g. in the case of tracking devices working under high-luminosity conditions such as those of the LHC or super-LHC. The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has a much broader field of application. In fact, its core does not specifically rely on the kind of data or detector it is working for, while the second step can and should be tailored for a given application. For example, in the case of spatial measurement with silicon pixel detectors, the second step performs a center-of-charge calculation. Applications can thus be foreseen for other detectors and other scientific fields, ranging from HEP calorimeters to medical imaging. An additional advantage of this two-step approach is that the typical clustering-related calculations (second step) are separated from the combinatorial complications of clustering. This separation simplifies the design of the second step and enables it to perform sophisticated calculations, achieving offline quality in online applications. The algorithm is general purpose in the sense that only minimal assumptions on the kind of clustering to be performed are made.
NASA Astrophysics Data System (ADS)
Huang, Mingqiang; Hao, Liqing; Guo, Xiaoyong; Hu, Changjin; Gu, Xuejun; Zhao, Weixiong; Wang, Zhenya; Fang, Li; Zhang, Weijun
2013-01-01
Experiments on the formation of secondary organic aerosol (SOA) from photooxidation of 1,3,5-trimethylbenzene in a CH3ONO/NO/air mixture were carried out in a laboratory chamber. The size and chemical composition of the resultant individual particles were measured in real time by an aerosol laser time-of-flight mass spectrometer (ALTOFMS) recently designed in our group. We also developed a Fuzzy C-Means (FCM) algorithm to classify the mass spectra of large numbers of SOA particles. The study started with mixed particles generated from standard benzaldehyde, phenol, benzoic acid, and nitrobenzene solutions to test the feasibility of applying the FCM. The FCM was then used to extract potential aerosol classes in the chamber experiments. The results demonstrate that FCM allowed a clear identification of ten distinct chemical particle classes in this study, namely, 3,5-dimethylbenzoic acid, 3,5-dimethylbenzaldehyde, 2,4,6-trimethyl-5-nitrophenol, 2-methyl-4-oxo-2-pentenal, 2,4,6-trimethylphenol, 3,5-dimethyl-2-furanone, glyoxal, and high-molecular-weight (HMW) components. Compared to offline methods such as gas chromatography-mass spectrometry (GC-MS) measurement, the real-time ALTOFMS detection approach coupled with the FCM data processing algorithm can successfully perform cluster analysis of SOA and provide more information on the products. Thus ALTOFMS is a useful tool to reveal the formation and transformation processes of SOA particles in smog chambers.
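The abstract does not describe how the FCM step was configured, so the sketch below shows only the standard fuzzy c-means membership update for a single data point, not the authors' implementation. The function name, the fuzziness index m = 2, and the toy two-dimensional points are assumptions chosen for illustration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i the distance to center i.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center gets full membership there
    # (shared equally if several centers coincide with it).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Hypothetical example: the point is twice as far from the second center as from the first.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships always sum to 1, which is what lets a particle spectrum belong partly to several chemical classes rather than being forced into a single one.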
A vector reconstruction based clustering algorithm particularly for large-scale text collection.
Liu, Ming; Wu, Chong; Chen, Lei
2015-03-01
With the rapid evolution of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories can help users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts due to its unsupervised nature. Unfortunately, most traditional clustering algorithms perform poorly on large-scale text collections, mainly because of the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in the cluster's representative vector. The algorithm alternately repeats two sub-processes until it converges. One is the partial tuning sub-process, where each feature's weight is fine-tuned by an iterative process similar to the self-organizing-mapping (SOM) algorithm. To accelerate clustering, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other is the overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features that are useless for representing the cluster are removed from the cluster's representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm performs well on both small-scale and large-scale text collections. PMID:25539500
An improved clustering algorithm of tunnel monitoring data for cloud computing.
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, the traditional clustering algorithm cannot handle the mass data of the tunnels. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It uses MapReduce within cloud computing to process the data. It not only can deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
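The abstract does not describe the job layout, so the following is a minimal single-process sketch of how one k-means iteration is commonly split into a map phase and a reduce phase. It is not the authors' implementation, it omits their dissimilarity-based cleaning step, and the function names and toy points are hypothetical.

```python
from collections import defaultdict
import math

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield idx, p

def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # centroids with no points are left unchanged
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return new_centroids

# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
centroids = reduce_phase(map_phase(points, centroids), centroids)
```

In a real MapReduce deployment the map calls would run in parallel over data partitions and the reducers would be keyed by centroid index; the loop above only mimics that data flow for one iteration.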
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
A Wireless Sensor Network (WSN) is a network formed with a maximum number of sensor nodes positioned in an application environment to monitor the physical entities in a target area, for example temperature, water level, and pressure monitoring, health care, and various military applications. Sensor nodes are mostly equipped with self-supported battery power through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of Wireless Sensor Networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done using the Particle Swarm Optimization (PSO) algorithm with the aim of minimizing the power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption. PMID:26881273
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
NASA Astrophysics Data System (ADS)
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and maximum participation cluster mean value employment in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 with histologically or Fibroscan verified Chronic Liver Disease). An experienced Radiologist performed the SWE measurement, placing a region of interest (ROI) on each subject's right liver lobe and providing a SWE image for each patient. Subsequently, Fuzzy C-Means clustering was performed on every SWE image using 5 clusters. The mean stiffness value and number of pixels of each cluster were calculated. The mean stiffness value of the cluster with the maximum number of pixels was then fed as input to ROC analysis. The selected mean stiffness value feature yielded an Area Under the Curve (AUC) of 0.8633 with an optimum cut-off value of 7.5 kPa, with sensitivity and specificity values of 0.8438 and 0.875 and a balanced accuracy of 0.8594. The examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy values of 0.8125 with a 7.1 kPa cut-off value. A promising new automatic algorithm was implemented with more objective criteria for defining optimum elasticity cut-off values for discriminating fibrosis stages in SWE. More subjects are needed to determine whether this algorithm is an objective tool that outperforms manual ROI selection.
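As a quick arithmetic check on the reported figures, balanced accuracy is the mean of sensitivity and specificity. The short sketch below recomputes it from the values quoted in the abstract; the function and variable names are hypothetical.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy: the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2

# Values quoted in the abstract for the automatic FCM-based cut-off (7.5 kPa).
automatic = balanced_accuracy(0.8438, 0.875)

# Values quoted for the examiner's classification (7.1 kPa cut-off).
examiner = balanced_accuracy(0.8125, 0.8125)
```

Both results agree with the abstract: about 0.8594 for the automatic method and 0.8125 for the examiner.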
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program was developed to test the algorithm. The algorithm was tested only on image data of size 8x8 with different numbers of blobs in them. The algorithm works very well in detecting and identifying image clusters.
An approximation polynomial-time algorithm for a sequence bi-clustering problem
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
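The problem described above can be written compactly as follows. This is a sketch in assumed notation, since the abstract gives no symbols: y_1, ..., y_N is the input sequence, M = {n_1 < ... < n_m} is the index set of the cluster whose center is optimized, and T_min and T_max are the prescribed bounds on the gap between successive indices. Whether the size of M is fixed in advance is not stated in the abstract and is left open here.

```latex
% Assumed notation, for illustration only (none of these symbols appear in the abstract).
\[
  \min_{\mathcal{M}} \;
  \sum_{j \in \mathcal{M}} \bigl\| y_j - \bar{y}(\mathcal{M}) \bigr\|^2
  \; + \sum_{j \notin \mathcal{M}} \| y_j \|^2 ,
  \qquad
  \bar{y}(\mathcal{M}) = \frac{1}{|\mathcal{M}|} \sum_{j \in \mathcal{M}} y_j ,
\]
\[
  \text{subject to} \quad
  T_{\min} \le n_{i+1} - n_i \le T_{\max}, \qquad i = 1, \dots, m-1 .
\]
```

The first sum measures scatter around the free center (the cluster mean), while the second sum is the scatter of the remaining vectors around the center fixed at the origin.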
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of methods for clustering data streams. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a processing time fast enough for real-time applications on IoT devices. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
Cluster-Based Solidification and Growth Algorithm for Decagonal Quasicrystals
NASA Astrophysics Data System (ADS)
Kuczera, P.; Steurer, W.
2015-08-01
A novel approach is used for the simulation of decagonal quasicrystal (DQC) solidification and growth. It is based on the observation that in well-ordered DQCs the atoms are largely arranged along quasiperiodically spaced planes parallel to the tenfold axis, running throughout the whole structure in five different directions. The structures themselves can be described as quasiperiodic arrangements of decagonal columnar clusters (cluster covering) that partially overlap in a systematic way. Based on these findings, we define a cluster interaction model within the mean field approximation, with effectively asymmetric interactions ranging beyond the nearest neighbors. In our Monte Carlo simulations, this leads to a long-range ordered quasiperiodic ground state. Indications of two finite-temperature unlocking phase transitions are observed, and are related to the two fundamental length scales that are characteristic for the system.
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent laws and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. Methods of this kind have two inadequacies: one is that the input matrices they use cannot provide sufficient structural information for community detection, and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information about the network. In addition, the new algorithm can automatically obtain the optimal community number from the local maximum potential nodes. Experimental results showed that the new algorithm gave excellent performance on artificial networks and real-world networks and outperformed other community detection methods. PMID:25147846
Neural-network-assisted genetic algorithm applied to silicon clusters
Marim, L.R.; Lemes, M.R.; Pino, A. Jr. dal
2003-03-01
Recently, a new optimization procedure that combines the power of artificial neural-networks with the versatility of the genetic algorithm (GA) was introduced. This method, called neural-network-assisted genetic algorithm (NAGA), uses a neural network to restrict the search space and it is expected to speed up the solution of global optimization problems if some previous information is available. In this paper, we have tested NAGA to determine the ground-state geometry of Si_n (10 ≤ n ≤ 15) according to a tight-binding total-energy method. Our results indicate that NAGA was able to find the desired global minimum of the potential energy for all the test cases and it was at least ten times faster than pure genetic algorithm.
Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data.
Maji, Pradipta
2011-02-01
One of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with sample categories. In this regard, a new clustering algorithm, termed as fuzzy-rough supervised attribute clustering (FRSAC), is proposed to find such groups of genes. The proposed algorithm is based on the theory of fuzzy-rough sets, which directly incorporates the information of sample categories into the gene clustering process. A new quantitative measure is introduced based on fuzzy-rough sets that incorporates the information of sample categories to measure the similarity among genes. The proposed algorithm is based on measuring the similarity between genes using the new quantitative measure, whereby redundancy among the genes is removed. The clusters are refined incrementally based on sample categories. The effectiveness of the proposed FRSAC algorithm, along with a comparison with existing supervised and unsupervised gene selection and clustering algorithms, is demonstrated on six cancer and two arthritis data sets based on the class separability index and predictive accuracy of the naive Bayes' classifier, the K-nearest neighbor rule, and the support vector machine. PMID:20542768
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics.
Oyana, Tonny J
2010-01-01
The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines. PMID:20689710
SAKM: self-adaptive kernel machine. A kernel-based algorithm for online clustering.
Amadou Boubacar, Habiboulaye; Lecoeuche, Stéphane; Maouche, Salah
2008-11-01
This paper presents a new online clustering algorithm called SAKM (Self-Adaptive Kernel Machine) which is developed to learn continuously evolving clusters from non-stationary data. Based on SVM and kernel methods, the SAKM algorithm uses a fast adaptive learning procedure to take into account variations over time. Dedicated to online clustering in a multi-class environment, the algorithm designs an unsupervised neural architecture with self-adaptive abilities. Based on a specific kernel-induced similarity measure, the SAKM learning procedures consist of four main stages: Creation, Adaptation, Fusion and Elimination. In addition to these properties, the SAKM algorithm is attractive to be computationally efficient in online learning of real-drifting targets. After a theoretical study of the error convergence bound of the SAKM local learning, a comparison with NORMA and ALMA algorithms is made. In the end, some experiments conducted on simulation data, UCI benchmarks and real data are given to illustrate the capacities of the SAKM algorithm for online clustering in non-stationary and multi-class environment. PMID:18835695
Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm
Darkins, Robert; Cooke, Emma J.; Ghahramani, Zoubin; Kirk, Paul D. W.; Wild, David L.; Savage, Richard S.
2013-01-01
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL. https://sites.google.com/site/randomisedbhc/. PMID:23565168
Algorithm for the detection of fine clustered calcifications on film mammograms.
Fam, B W; Olson, S L; Winter, P F; Scholz, F J
1988-11-01
An algorithmic process for the detection and marking of clustered calcifications in digitized film-screen mammograms has been applied to mammograms from 50 clinical cases sampled at two digitization levels, in both the craniocaudal and mediolateral views. In all but one case the detector accurately located suggestive clusters found by radiologists in normal screening. In five cases additional clusters were also found by the detector. The detector has a negligible false-positive rate for the detection of clustered calcifications, although it is sensitive to clusters of emulsion defects displayed as artifactual calcification densities in the original film. The detector is flexible in structure and is easily adapted to various calcification/cluster criteria. The detector shows considerable promise when applied to clinical examples but will require refinement before formal testing. PMID:3174981
[The multi-spectra classification algorithm based on K-means clustering and spectral angle cosine].
Wei, Jun-xia; Xiangli, Bin; Gao, Xiao-hui; Duan, Xiao-feng
2011-05-01
Classification and de-aliasing methods for multi-spectral and hyper-spectral data have been widely studied in recent years, and both the K-means clustering algorithm and the spectral similarity algorithm are familiar classification methods. The present paper improves the K-means clustering algorithm by using a spectral similarity matching algorithm to obtain a new spectral classification algorithm. The two spectra with the farthest distance between them are first chosen as reference spectra. The Euclidean distance method or the spectral angle cosine method is then used to classify the data cube on the basis of the two reference spectra, and the spectra that belong to the two reference spectra are deleted. The remaining data cube is classified again according to a third spectrum, namely the one at the farthest distance or the biggest angle from the two reference spectra. A multi-spectral data cube was used in the experimental test. The results of K-means clustering classification by ENVI, compared with simulation results of the improved K-means algorithm and the spectral angle cosine method, demonstrated that the latter two classify two air bubbles explicitly and effectively, and that the improved K-means algorithm classifies backgrounds better; in particular, the Euclidean distance method can classify the backgrounds integrally. PMID:21800600
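The spectral angle cosine used as a similarity measure above can be sketched in a few lines. This is a generic illustration rather than the authors' code: a spectrum is assigned to the reference spectrum with which it has the largest cosine (smallest angle). The function names and the three-band toy spectra are hypothetical.

```python
import math

def spectral_angle_cosine(a, b):
    """Cosine of the angle between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(spectrum, references):
    """Return the index of the reference spectrum with the largest cosine."""
    return max(range(len(references)),
               key=lambda i: spectral_angle_cosine(spectrum, references[i]))

# Hypothetical three-band reference spectra and one pixel spectrum to label.
references = [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]
label = classify((0.1, 0.9, 1.1), references)

# Scaling a spectrum does not change the angle, so the measure ignores overall brightness.
same_shape = spectral_angle_cosine((1.0, 2.0, 3.0), (2.0, 4.0, 6.0))
```

Because the cosine depends only on the shape of a spectrum and not on its magnitude, it behaves differently from the Euclidean distance, which may explain why the two variants in the abstract classify backgrounds differently.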
An efficient clustering algorithm for partitioning Y-short tandem repeats data
2012-01-01
Background Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. PMID:23039132
Clustering WHO-ART terms using semantic distance and machine learning algorithms.
Iavindrasana, Jimison; Bousquet, Cedric; Degoulet, Patrice; Jaulent, Marie-Christine
2006-01-01
WHO-ART was developed by the WHO collaborating centre for international drug monitoring in order to code adverse drug reactions. We assume that computation of semantic distance between WHO-ART terms may be an efficient way to group related medical conditions in the WHO database in order to improve signal detection. Our objective was to develop a method for clustering WHO-ART terms according to the proximity of their meanings. Our material comprises 758 WHO-ART terms. A formal definition was acquired for each term as a list of elementary concepts belonging to SNOMED international axes and characterized by modifier terms in some cases. Clustering was implemented as a terminology service on a J2EE server. Two different unsupervised machine learning algorithms (K-Means, Pvclust) clustered WHO-ART terms according to a semantic distance operator previously described. Pvclust grouped 51% of WHO-ART terms. K-Means grouped 100% of WHO-ART terms, but 25% of the clusters were heterogeneous with k = 180 and 6% of the clusters were heterogeneous with k = 32. Clustering algorithms associated with semantic distance could suggest potential groupings of WHO-ART terms that need validation according to the user's requirements. PMID:17238365
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but which often lends ideas to clustering algorithms, are discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
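The winner-only update that this abstract identifies as a weakness of LVQ and SHCM can be sketched in a few lines. This is a generic illustration, not the report's code: the function name, the fixed learning rate and the toy prototypes are assumptions.

```python
def winner_take_all_step(prototypes, x, rate):
    """Move only the prototype closest to x toward x; return its index."""
    def sq_dist(v):
        return sum((vi - xi) ** 2 for vi, xi in zip(v, x))

    win = min(range(len(prototypes)), key=lambda k: sq_dist(prototypes[k]))
    prototypes[win] = [vi + rate * (xi - vi)
                       for vi, xi in zip(prototypes[win], x)]
    return win

# Hypothetical setup: the second prototype starts far from the data.
prototypes = [[0.0, 0.0], [10.0, 10.0]]
win = winner_take_all_step(prototypes, [1.0, 1.0], rate=0.5)
```

Because the second prototype never wins for inputs near the origin, it never moves, which is one way to see the dependence on initialization. GLVQ and FLVQ, by contrast, may update every prototype for each input.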
Invaded cluster algorithm for a tricritical point in a diluted Potts model
NASA Astrophysics Data System (ADS)
Balog, I.; Uzelac, K.
2007-07-01
The invaded cluster approach is extended to the two-dimensional Potts model with annealed vacancies by using the random-cluster representation. Geometrical arguments are used to propose the algorithm which converges to the tricritical point in the two-dimensional parameter space spanned by temperature and the chemical potential of the vacancies. The tricritical point is identified as a simultaneous onset of the percolation of a Fortuin-Kasteleyn cluster and of a percolation of the “geometrical disorder cluster.” The location of the tricritical point and the concentration of vacancies for q=1,2,3 are found to be in good agreement with the best known results. Scaling properties of the percolating scaling cluster and related critical exponents are also presented.
A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images
NASA Astrophysics Data System (ADS)
Nascimento, Susana; Casca, Sérgio; Mirkin, Boris
2015-12-01
In this paper a novel clustering algorithm is proposed as a version of the seeded region growing (SRG) approach for the automatic recognition of coastal upwelling from sea surface temperature (SST) images. The new algorithm, one seed expanding cluster (SEC), takes advantage of the concept of approximate clustering due to Mirkin (1996, 2013) to derive a homogeneity criterion in the format of a product rather than the conventional difference between a pixel value and the mean of values over the region of interest. It involves a boundary-oriented pixel labeling so that the cluster growing is performed by expanding its boundary iteratively. The starting point is a cluster consisting of just one seed, the pixel with the coldest temperature. The baseline version of the SEC algorithm uses Otsu's thresholding method to fine-tune the homogeneity threshold. Unfortunately, this method does not always lead to a satisfactory solution. Therefore, we introduce a self-tuning version of the algorithm in which the homogeneity threshold is locally derived from the approximation criterion over a window around the pixel under consideration. The window serves as a boundary regularizer. These two unsupervised versions of the algorithm have been applied to a set of 28 SST images of the western coast of mainland Portugal, and compared against a supervised version fine-tuned by maximizing the F-measure with respect to manually labeled ground-truth maps. The areas built by the unsupervised versions of the SEC algorithm are significantly coincident over the ground-truth regions in the cases at which the upwelling areas consist of a single continuous fragment of the SST map.
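For orientation, the conventional seeded region growing that SEC modifies can be sketched as follows. This sketch uses the plain difference between a pixel value and the region mean, not the product-form criterion derived in the paper, and the function name, the fixed threshold and the toy temperature grid are assumptions.

```python
from collections import deque

def grow_from_coldest(image, threshold):
    """Grow a 4-connected region outward from the coldest pixel.

    A neighbouring pixel joins when its value is within `threshold`
    of the current region mean (the conventional difference criterion).
    """
    rows, cols = len(image), len(image[0])
    seed = min(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda p: image[p[0]][p[1]])
    region = {seed}
    total = image[seed[0]][seed[1]]
    frontier = deque([seed])  # boundary pixels still to be expanded
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - total / len(region)) <= threshold:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    frontier.append((nr, nc))
    return region

# Hypothetical 3x3 SST grid in degrees Celsius; cold water in the top-left.
sst = [[12.0, 12.5, 18.0],
       [12.2, 13.0, 18.5],
       [17.5, 18.0, 19.0]]
region = grow_from_coldest(sst, threshold=1.0)
```

The fixed threshold here stands in for the value that the baseline SEC version tunes with Otsu's method and that the self-tuning version derives locally over a window.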
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at approximately 1 kHz success rate, demonstrates the correct implementation of the algorithm.
A Clustering Algorithm for Liver Lesion Segmentation of Diffusion-Weighted MR Images
Jha, Abhinav K.; Rodríguez, Jeffrey J.; Stephen, Renu M.; Stopeck, Alison T.
2010-01-01
In diffusion-weighted magnetic resonance imaging, accurate segmentation of liver lesions in the diffusion-weighted images is required for computation of the apparent diffusion coefficient (ADC) of the lesion, the parameter that serves as an indicator of lesion response to therapy. However, the segmentation problem is challenging due to low SNR, fuzzy boundaries and speckle and motion artifacts. We propose a clustering algorithm that incorporates spatial information and a geometric constraint to solve this issue. We show that our algorithm provides improved accuracy compared to existing segmentation algorithms. PMID:21151837
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms. PMID:22163905
Solving the depth of the repeated texture areas based on the clustering algorithm
NASA Astrophysics Data System (ADS)
Xiong, Zhang; Zhang, Jun; Tian, Jinwen
2015-12-01
Reconstructing a 3D scene in monocular stereo vision requires the depth of the scene points in the image. However, mismatches are inevitable in image matching, and they are especially numerous when the images contain many areas of repeated texture. At present, the multiple-baseline stereo imaging algorithm is commonly used to eliminate matching errors in repeated texture areas. This algorithm can eliminate the ambiguity caused by common repeated textures, but it places restrictions on the baseline and is slow. In this paper, we put forward an algorithm for calculating the depth of matching points in repeated texture areas based on a clustering algorithm. First, we preprocess the images with a Gaussian filter. Second, we segment the repeated texture regions in the images into image blocks using a superpixel-based spectral clustering segmentation algorithm and tag the image blocks. Then, we match the two images and solve for the depth. Finally, the depth of each image block is taken as the median of the depth values of all computed points in the block, which gives the depth of the repeated texture areas. Experiments on a large number of images show that the algorithm performs very well in calculating the depth of repeated texture areas.
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
A multilevel gamma-clustering layout algorithm for visualization of biological networks.
Hruz, Tomas; Wyss, Markus; Lucas, Christoph; Laule, Oliver; von Rohr, Peter; Zimmermann, Philip; Bleuler, Stefan
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks
Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. However, these systems face several inherent issues, such as data sparsity and cold start problems, caused by having few ratings relative to the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolving these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and a sparsest subgraph finding algorithm is applied to the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster
NASA Astrophysics Data System (ADS)
Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi
2007-12-01
This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, by analyzing the structure of BMIs, the existence of typical difficult structures is confirmed. Then, to improve the performance of the algorithm, and based on the results of the problem structure analysis and on the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, we propose two types of evaluation methods for GA individuals, based on LMI calculations that take the characteristic properties of BMIs further into account. In addition, to reduce computation time, we propose a parallelization of the RCGA algorithm using the Master-Worker paradigm with cluster computing.
MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms
Melnykov, Volodymyr; Chen, Wei-Chen; Maitra, Ranjan
2012-01-01
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
Kim, Chang Sik; Bae, Cheol Soo; Tcha, Hong Joon
2008-01-01
Background Previous studies of genome-wide expression patterns show that a certain percentage of genes are cell cycle regulated. The expression data has been analyzed in a number of different ways to identify cell cycle dependent genes. In this study, we pose the hypothesis that cell cycle dependent genes can be considered as oscillating systems with a rhythm, i.e. systems producing response signals with period and frequency. Therefore, we are motivated to apply the theory of multivariate phase synchronization for clustering cell cycle specific genome-wide expression data. Results We propose a strategy to find groups of genes according to the specific biological process by analyzing cell cycle specific gene expression data. To evaluate the proposed method, we use the modified Kuramoto model, which is a phase governing equation that provides the long-term dynamics of globally coupled oscillators. With this equation, we simulate two groups of expression signals, and the simulated signals from each group share their own common rhythm. Then, the simulated expression data are mixed with randomly generated expression data to be used as the input data set to the algorithm. Using these simulated expression data, it is shown that the algorithm is able to identify expression signals that are involved in the same oscillating process. We also evaluate the method with yeast cell cycle expression data. It is shown that the output clusters of the proposed algorithm include genes that are closely associated with each other by sharing significant Gene Ontology terms of biological process and/or having relatively many known biological interactions. Therefore, the evaluation analysis indicates that the method is able to identify expression signals according to the specific biological process. 
Our evaluation analysis also indicates that some portion of output by the proposed algorithm is not obtainable by the traditional clustering algorithm with Euclidean distance or linear correlation. Conclusion Based on the evaluation experiments, we draw the conclusion as follows: 1) Based on the theory of multivariate phase synchronization, it is feasible to find groups of genes, which have relevant biological interactions and/or significantly shared GO slim terms of biological process, using cell cycle specific gene expression signals. 2) Among all the output clusters by the proposed algorithm, the cluster with relatively large size has a tendency to include more known interactions than the one with relatively small size. 3) It is feasible to understand the cell cycle specific gene expression patterns as the phenomenon of collective synchronization. 4) The proposed algorithm is able to find prominent groups of genes, which are not obtainable by traditional clustering algorithm. PMID:18221564
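The idea of globally coupled oscillators settling into a common rhythm can be illustrated with the standard Kuramoto model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i). The sketch below is not the modified model used in the paper; the Euler integration, the parameter values and the function names are assumptions chosen for illustration.

```python
import math

def kuramoto_step(phases, freqs, coupling, dt):
    """One explicit Euler step of the standard Kuramoto model."""
    n = len(phases)
    new_phases = []
    for i in range(n):
        pull = sum(math.sin(phases[j] - phases[i]) for j in range(n)) / n
        new_phases.append(phases[i] + dt * (freqs[i] + coupling * pull))
    return new_phases

def order_parameter(phases):
    """Synchronization level r in [0, 1]; r near 1 means phase-locked."""
    n = len(phases)
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im)

# Hypothetical group of four oscillators sharing one natural frequency.
phases = [0.0, 1.0, 2.0, 3.0]
freqs = [1.0, 1.0, 1.0, 1.0]
r_start = order_parameter(phases)
for _ in range(2000):
    phases = kuramoto_step(phases, freqs, coupling=2.0, dt=0.01)
r_end = order_parameter(phases)
```

Starting from spread-out phases, the order parameter rises toward 1 as the oscillators lock together, which is the kind of shared rhythm the clustering method looks for in expression signals.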
NASA Astrophysics Data System (ADS)
Rahmah, Nadia; Sukaesih Sitanggang, Imas
2016-01-01
In this work we determine the optimal epsilon value for the DBSCAN algorithm when clustering data on peatland hotspots in Sumatra. DBSCAN is a basic algorithm for density-based clustering of data that contain noise and outliers. Using this method, we found that the area with the highest density of peatland hotspots in Sumatra in 2013 lies in cluster 1, in Riau Province, with 2112 hotspots.
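To show why the epsilon value matters, here is a minimal pure-Python sketch of DBSCAN. It is a generic textbook version, not the authors' code; the function name, the `min_pts` setting and the toy coordinates are assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:     # j is a core point: keep expanding
                queue.extend(more)
    return labels

# Hypothetical hotspot coordinates: two dense groups and one isolated point.
hotspots = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(hotspots, eps=1.5, min_pts=3)
too_small = dbscan(hotspots, eps=0.5, min_pts=3)
```

With `eps=1.5` the two dense groups are recovered and the isolated point is marked as noise; with `eps=0.5` every point is labelled noise, which is why the choice of epsilon has to be tuned to the data.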
The RedGOLD cluster detection algorithm and its cluster candidate catalogue for the CFHT-LS W1
NASA Astrophysics Data System (ADS)
Licitra, Rossella; Mei, Simona; Raichoor, Anand; Erben, Thomas; Hildebrandt, Hendrik
2016-01-01
We present RedGOLD (Red-sequence Galaxy Overdensity cLuster Detector), a new optical/NIR galaxy cluster detection algorithm, and apply it to the CFHT-LS W1 field. RedGOLD searches for red-sequence galaxy overdensities while minimizing contamination from dusty star-forming galaxies. It imposes a Navarro-Frenk-White profile and calculates cluster detection significance and richness. We optimize these latter two parameters using both simulations and X-ray-detected cluster catalogues, and obtain a catalogue ~80 per cent pure up to z ~ 1, and ~100 per cent (~70 per cent) complete at z ≤ 0.6 (z ≲ 1) for galaxy clusters with M ≳ 10^14 M⊙ at the CFHT-LS Wide depth. In the CFHT-LS W1, we detect 11 cluster candidates per deg^2 out to z ~ 1.1. When we optimize both completeness and purity, RedGOLD obtains a cluster catalogue with higher completeness and purity than other public catalogues, obtained using CFHT-LS W1 observations, for M ≳ 10^14 M⊙. We use X-ray-detected cluster samples to extend the study of the X-ray temperature-optical richness relation to a lower mass threshold, and find a mass scatter at fixed richness of σ_lnM|λ = 0.39 ± 0.07 and σ_lnM|λ = 0.30 ± 0.13 for the Gozaliasl et al. and Mehrtens et al. samples. When considering similar mass ranges as previous work, we recover a smaller scatter in mass at fixed richness. We recover 93 per cent of the redMaPPer detections, and find that its richness estimates are on average ~40-50 per cent larger than ours at z > 0.3. RedGOLD recovers X-ray cluster spectroscopic redshifts at better than 5 per cent up to z ~ 1, and the centres within a few tens of arcseconds.
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
NASA Astrophysics Data System (ADS)
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.
Uchiyama, Ikuo
2006-01-01
Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods. PMID:16436801
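The UPGMA step named in the abstract above can be illustrated with a minimal sketch. It assumes a symmetric distance matrix as input (the abstract speaks of similarity data, which would need converting first) and leaves out the domain fusion/fission detection and the paralog-aware tree splitting that the abstract describes. The function name `upgma` and the nested-tuple tree format are choices made for this illustration only.

```python
def upgma(names, matrix):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Returns the tree as nested tuples of the input names.
    """
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): matrix[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # UPGMA update: size-weighted mean of the two old distances.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

With four hypothetical genes where A and B are close and C and D are close, the sketch joins each close pair first and then merges the two pairs at the root.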
Ising embedding for cluster algorithms in finite-temperature SU(2) gauge theory
NASA Astrophysics Data System (ADS)
Kerler, Werner; Metz, Thomas
1994-12-01
Extending cluster algorithms to continuous gauge theories is highly desirable. So far, in the very special case of N_τ = 1 in SU(2) lattice gauge theory at finite temperature, an embedding of an Ising model with variable couplings has been successful. We get an improvement in this case by using a different flipping rule for the cluster spins. Looking for a generalization to the case N_τ > 1, we find that, by appropriate gauge transformations, one can trade field effects for frustration, but not get a net improvement.
KD-tree based clustering algorithm for fast face recognition on large-scale data
NASA Astrophysics Data System (ADS)
Wang, Yuanyuan; Lin, Yaping; Yang, Junfeng
2015-07-01
This paper proposes an acceleration method for large-scale face recognition systems. When dealing with a large-scale database, face recognition is time-consuming. To tackle this problem, we employ the k-means clustering algorithm to partition the face data. Specifically, the data in each cluster are stored in a kd-tree, and face feature matching is conducted with a kd-tree based nearest neighbor search. Experiments on CAS-PEAL and a self-collected database show the effectiveness of the proposed method.
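The two-stage lookup described above (k-means partitioning, then a kd-tree search inside one cluster) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the feature vectors are toy tuples, and the names `build_index` and `match` are hypothetical. Searching only the cluster with the nearest centroid is an assumption about how the two stages are combined, and it trades exactness for speed.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assign(points, centroids)

def build_kdtree(items, depth=0):
    """items: list of (vector, identity). Node = (item, left, right) or None."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid],
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Exact nearest neighbour inside one tree; best = (sq_distance, item)."""
    if node is None:
        return best
    item, left, right = node
    d = sq_dist(item[0], query)
    if best is None or d < best[0]:
        best = (d, item)
    diff = query[depth % len(query)] - item[0][depth % len(query)]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    if diff * diff < best[0]:  # the far side can still hold a closer point
        best = nearest(far, query, depth + 1, best)
    return best

def build_index(vectors, ids, k):
    centroids, labels = kmeans(vectors, k)
    buckets = [[] for _ in range(k)]
    for v, i, l in zip(vectors, ids, labels):
        buckets[l].append((v, i))
    return centroids, [build_kdtree(b) for b in buckets]

def match(index, query):
    """Search only the kd-tree of the cluster whose centroid is closest."""
    centroids, trees = index
    c = min(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    best = nearest(trees[c], query)
    return None if best is None else best[1][1]
```

A query then costs one pass over the k centroids plus one tree search, instead of a scan over the whole gallery.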
Fast randomized Hough transformation track initiation algorithm based on multi-scale clustering
NASA Astrophysics Data System (ADS)
Wan, Minjie; Gu, Guohua; Chen, Qian; Qian, Weixian; Wang, Pengcheng
2015-10-01
A fast randomized Hough transformation track initiation algorithm based on multi-scale clustering is proposed to overcome problems in traditional infrared search and track (IRST) systems, in which a two-dimensional track association algorithm based on bearing-only information cannot provide movement information about the initial target or select the correlation threshold automatically. Movements of all the targets are presumed to be uniform rectilinear motion throughout this new algorithm. Concepts of space random sampling, a parameter space dynamic linking table, and convergent mapping of image to parameter space are developed on the basis of the fast randomized Hough transformation. Because peak detection built on a threshold method suffers from peak value clustering, accuracy can only be ensured when the parameter space has an obvious peak. A multi-scale idea is therefore added to the above-mentioned algorithm. First, a primary association is conducted to select several alternative tracks using a low threshold. Then, the alternative tracks are processed by multi-scale clustering methods, through which the accurate number and parameters of the tracks are determined automatically by transforming scale parameters. The first three frames are processed by this algorithm in order to get the first three targets of the track, and then two slightly different gate radii are worked out, the mean of which is used as the global correlation threshold. Moreover, a new model for curvilinear equation correction is applied to the track initiation algorithm to solve the problem of shape distortion when a three-dimensional space curve is mapped to a two-dimensional bearing-only space. Models built for sideways-flying, launch, and landing scenarios are simulated, and the results demonstrate the effectiveness, accuracy, and adaptive correlation threshold selection of the proposed approach.
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets it is essential to re-design the data mining algorithm for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm using the data structures hash tree, trie, and hash table trie (i.e. a trie with a hash technique). We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, which has not yet been given attention. Experiments carried out on both real-life and synthetic datasets show that the hash table trie performs far better than the trie and the hash tree in terms of execution time. Moreover, the hash tree performs worst.
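As a rough single-machine illustration of the hash table trie idea mentioned above, the sketch below stores the candidate itemsets of each Apriori pass in a trie whose child links are Python dicts (hash tables). It does not use MapReduce or Hadoop, and the node layout and function names are assumptions made for this example rather than the structures used in the paper.

```python
from itertools import combinations

def make_trie(candidates):
    """Insert sorted candidate itemsets; each node keeps its children in a dict."""
    root = {"children": {}, "count": 0}
    for cand in candidates:
        node = root
        for item in cand:
            node = node["children"].setdefault(item, {"children": {}, "count": 0})
    return root

def count_support(node, transaction, start, depth, k):
    """Follow every k-subset of the sorted transaction that exists in the trie."""
    if depth == k:
        node["count"] += 1
        return
    for i in range(start, len(transaction)):
        child = node["children"].get(transaction[i])  # O(1) hash lookup
        if child is not None:
            count_support(child, transaction, i + 1, depth + 1, k)

def collect(node, prefix, k, min_count, out):
    """Gather the depth-k itemsets whose support reaches min_count."""
    if len(prefix) == k:
        if node["count"] >= min_count:
            out[prefix] = node["count"]
        return
    for item, child in node["children"].items():
        collect(child, prefix + (item,), k, min_count, out)

def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    transactions = [tuple(sorted(set(t))) for t in transactions]
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while candidates:
        trie = make_trie(candidates)
        for t in transactions:
            count_support(trie, t, 0, 0, k)
        level = {}
        collect(trie, (), k, min_count, level)
        frequent.update(level)
        # Join frequent k-itemsets that share their first k-1 items, then
        # prune candidates that have an infrequent k-subset.
        candidates = []
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(sub in level for sub in combinations(cand, k)):
                    candidates.append(cand)
        k += 1
    return frequent
```

In a MapReduce setting the support counting over transactions is the part that would be distributed; the sketch keeps it in a single loop for clarity.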
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, their performance has been limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. In the first step, to classify such varied and complex user activities, we use an efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space using an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Utilizing unsupervised learning to cluster data in the Bayesian data reduction algorithm
NASA Astrophysics Data System (ADS)
Lynch, Robert S., Jr.; Willett, Peter K.
2005-03-01
In this paper, unsupervised learning is utilized to illustrate the ability of the Bayesian Data Reduction Algorithm (BDRA) to cluster unlabeled training data. The BDRA is based on the assumption that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach (similar to a backward sequential feature search) for reducing irrelevant features from the training data of each class. Notice that reducing irrelevant features is synonymous here with selecting those features that provide best classification performance; the metric for making data reducing decisions is an analytic formula for the probability of error conditioned on the training data. The contribution of this work is to demonstrate how clustering performance varies depending on the method utilized for unsupervised training. To illustrate performance, results are demonstrated using simulated data. In general, the results of this work have implications for finding clusters in data mining applications.
NASA Astrophysics Data System (ADS)
Inclan, Eric; Geohegan, David; Yoon, Mina
2015-03-01
Nanostructured TiO2 materials have interesting properties that are highly relevant to energy and device applications. However, precise control of their morphologies and characterization are still a grand challenge in the field. Using a hybrid optimization algorithm we theoretically explored configuration spaces of energetically metastable TiO2 nanostructures. Our approach is to minimize the total energy of TiO2 clusters in order to identify the structural characteristics and energy landscape of plausible (TiO2)n (n = 1-100). The hybrid algorithm includes a modified differential evolution algorithm, a permutation operator to perform global optimization on a set of randomly generated structures, and then structure refinement using a BFGS Quasi-Newton algorithm. The results were compared against known physical structures and numerical results in the literature as well as our experimentally synthesized structures. Although the global minimum became more computationally expensive to locate with increasing number of TiO2 units, the optimizer successfully identified numerous plausible structures along a range of energies close to the global minimum energy structure for all clusters in the given range. This work is supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division.
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
Zhang, Yipu; Wang, Ping
2015-01-01
The new high-throughput technique ChIP-seq, which couples chromatin immunoprecipitation experiments with high-throughput sequencing technologies, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which is normally large in scale. In order to improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging substrings mining strategy: it finds the enriched substrings and then searches the neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif does not follow the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. The complete detection of the real binding motifs and the processing of full-size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is advantageous for (l, d) motif finding in ChIP-seq data; it also demonstrates better performance than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. The techniques cover four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low cluster stability may lead to rapid failure of clusters, high energy consumption for reclustering, and a decrease in overall network stability. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms use only limited features of the nodes. Thus, they determine a node's competency less accurately and lead to incorrect selection of cluster heads. The new weight-based algorithm presented in this paper not only determines a node's weight using its own features, but also considers the direct effect of the features of adjacent nodes. It determines the weight of virtual links between nodes and the effect of these weights on a node's final weight. With this strategy, the highest weight is assigned to the best candidates for cluster head and the accuracy of node selection increases. The performance of the new algorithm is analyzed using computer simulation. The results show that the produced clusters have a longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
A contour-line color layer separation algorithm based on fuzzy clustering and region growing
NASA Astrophysics Data System (ADS)
Liu, Tiange; Miao, Qiguang; Xu, Pengfei; Tong, Yubing; Song, Jianfeng; Xia, Ge; Yang, Yun; Zhai, Xiaojie
2016-03-01
The color layers of contour-lines separated from scanned topographic maps are the basis of contour-line extraction, but it is difficult to separate them well due to color aliasing and mixed color problems. This paper focuses on contour-line color layer separation and presents a novel approach based on fuzzy clustering and Single-prototype Region Growing for Contour-line Layer (SRGCL). The purpose of this paper is to provide a solution for processing scanned topographic maps on which contour-lines are abundant and densely distributed. In hilly areas and mountainous regions, for example, contour-lines occupy the largest proportion of linear features and contour-line separation is the most difficult task. The proposed approach includes the following steps. First, line features are extracted from the map to reduce the interference from area features in fuzzy clustering. Second, a fuzzy clustering algorithm is employed to obtain the membership matrix of the pixels in the line map. Third, based on the membership matrix, we obtain the most-similar prototype and the second-most-similar prototype of each pixel as the indicators of the pixel in SRGCL. The spatial relationship and the fuzzy similarity of color features are used in SRGCL to overcome the inaccurate classification of ambiguous pixels. Focusing the procedure on a single contour-line layer improves the accuracy of the SRGCL contour-line segmentation result relative to general segmentation methods. We verified the algorithm on several USGS historical maps. The experimental results show that our algorithm produces contour-line color layers with good continuity and little noise, which verifies its improvement in contour-line color layer separation relative to two general segmentation methods.
Guo, Xiao-yong; Fang, Li; Zhao, Wen-wu; Gu, Xue-jun; Zheng, Hai-yang; Zhang, Wei-jun
2008-08-01
An aerosol time-of-flight laser mass spectrometer (ATOFLMS) for on-line measurement of the size and composition of single particles has been designed in our lab. Each particle's aerodynamic diameter is determined by measuring the delay time between two continuous-wave lasers. A Nd:YAG laser desorbs and ionizes molecules from the particle, and the time-of-flight mass spectrometer collects a mass spectrum of the generated ions, from which the composition of the single particle is obtained. The ATOFLMS generates a large amount of data during operation. How to process these data and extract valuable information is one of the key problems for the ATOFLMS. In this paper, fuzzy clustering is used to classify a large number of mass spectra of indoor air acquired by an ATOFLMS. Each revised spectrum is converted to a normalized 300-point vector, each point representing one mass unit. The positive ion mass spectrum of a single particle is thus described as a 300-dimensional data vector using the ion masses as dimensions and the ion signal peak areas as values. The data vectors of all measured particles are written into a classification matrix, with each spectrum stored as one row. The fuzzy c-means algorithm is an iterative method that starts the calculation with random class centers to find a substructure in the data. The procedure works in such a way that, in the end, similar objects (particle spectra) have a minimum distance between their corresponding data vectors, on the one hand, and to the center of a cluster, on the other hand. The aim of the iteration is therefore to find local minima in the N-dimensional space, where N is the number of evaluated peak masses. The particle data used in this study were collected over a period of one day in Hefei. During the campaign, inorganic salts, mineral particles, and carbonaceous particles, with varying degrees of secondary components, were identified. The particle size results show that the aerosol is predominantly in the form of fine particles, and particles with a diameter larger than 1 μm are scarce. Particles with a diameter of less than 1 μm make up 95% of the total, and these particles are mainly distributed in the 0.4-0.8 μm range. PMID:18975786
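The fuzzy c-means iteration described above (random starting class centers, then alternating membership and center updates until a local minimum is reached) can be sketched in a few lines. This is a generic textbook formulation in pure Python, not the code used in the study. The fuzziness exponent `m = 2`, the stopping tolerance, and the short toy vectors standing in for the normalized 300-point spectra are all assumptions.

```python
import random

def fuzzy_c_means(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, u) where u[i][k] is the membership of row i in class k."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, c)]  # random starting centers
    n, dim = len(data), len(data[0])
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: closer centers receive a larger share.
        for i, x in enumerate(data):
            # Distances are clipped away from zero to avoid division by zero.
            d = [max(sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5, 1e-12)
                 for v in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
        # Center update: mean of the data weighted by membership ** m.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * data[i][j] for i in range(n)) / total
                                for j in range(dim)])
        shift = max(abs(a - b)
                    for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:  # centers stopped moving: a local minimum was reached
            break
    return centers, u
```

Each row of `u` sums to one, so a particle spectrum can belong partly to several classes; taking the largest membership per row gives a hard class label when one is needed.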
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
Detection and clustering of features in aerial images by neuron network-based algorithm
NASA Astrophysics Data System (ADS)
Vozenilek, Vit
2015-12-01
The paper presents an algorithm for the detection and clustering of features in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general feature analysis and its use for clustering and backward projection of clusters onto the aerial image. The basis of the algorithm is the calculation of the total error of the network and a change of the network weights to minimize that error. A classic bipolar sigmoid was used as the activation function of the neurons, and the basic backpropagation method was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, a web application was built (ASP.NET on the Microsoft .NET platform). The main findings include that man-made objects in aerial images can be successfully identified by detecting shapes and anomalies. It was also found that an appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
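The error-driven weight update mentioned above can be illustrated for the simplest case, a single neuron with a bipolar sigmoid activation. The sketch is an assumption-laden toy: the learning rate, the epoch count, and the function names are invented for this example, and a full backpropagation network would propagate the same kind of delta through hidden layers.

```python
import math

def bipolar_sigmoid(x):
    """Bipolar sigmoid with output in (-1, 1); identical to tanh(x / 2)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(y):
    """Derivative written in terms of the output y = f(x): (1 - y^2) / 2."""
    return 0.5 * (1.0 - y * y)

def train_neuron(samples, lr=0.5, epochs=200):
    """Online gradient descent on the squared error of one neuron.

    samples: list of (inputs, target) with targets in [-1, 1].
    Returns (weights, bias, total error recorded for each epoch).
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    history = []
    for _ in range(epochs):
        total_error = 0.0
        for x, target in samples:
            y = bipolar_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = target - y
            total_error += 0.5 * err * err
            # Delta rule: error times the slope of the activation.
            delta = err * bipolar_sigmoid_deriv(y)
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
        history.append(total_error)
    return w, b, history
```

Tracking the total error per epoch mirrors the description in the abstract: the weights are changed only in the direction that reduces it.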
Detecting low-frequency functional connectivity in fMRI using unsupervised clustering algorithms
NASA Astrophysics Data System (ADS)
Lange, Oliver; Meyer-Bäse, Anke; Wismüller, Axel
2006-04-01
Recent research in functional magnetic resonance imaging (fMRI) revealed slowly varying temporally correlated fluctuations between functionally related areas. These low-frequency oscillations of less than 0.08 Hz appear to be a property of symmetric cortices, and they are known to be present in the motor cortex among others. These low-frequency data are difficult to detect and quantify in fMRI. Traditionally, user-based regions of interest (ROI) or "seed clusters" have been the primary analysis method. We propose in this paper to employ unsupervised clustering algorithms with arbitrary distance measures to detect resting-state functional connectivity. There are two main benefits of using unsupervised algorithms instead of traditional techniques: (1) the scan time is reduced by finding the activation data set directly, and (2) the whole data set is considered and not a relative correlation map. The achieved results are evaluated for different distance metrics. The Euclidean metric implemented by the standard unsupervised clustering approaches is compared with a more general topographic mapping of proximities based on the correlation and the prediction error between time courses. Thus, we are able to detect functional connectivity based on model-free analysis methods implementing arbitrary distance metrics.
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions or familiarity with the behavior of the Tethered Satellite System.
Crowded Cluster Cores: An Algorithm for Deblending in Dark Energy Survey Images
NASA Astrophysics Data System (ADS)
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-12-01
Deep optical images are often crowded with overlapping objects. This is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. This paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
NASA Astrophysics Data System (ADS)
de Abreu e Silva, Elcio Sabato; Duarte, Hélio Anderson; Belchior, Jadson Cláudio
2006-04-01
The present work proposes the application of a genetic algorithm (GA) for determining global minima to be used as seeds for a higher-level ab initio analysis such as density functional theory (DFT). Water clusters ((H2O)n, 2 ≤ n ≤ 13) are used as a test case, and four empirical potentials (TIP3P, TIP4P, TIP5P and ST2) were considered as initial guesses for the GA calculations. Two types of analysis were performed for the corresponding structures and energies, namely with rigid (DFT_RM) and non-rigid (DFT_NRM) molecules. For the DFT analysis, the PBE exchange-correlation functional and the large basis set A-PVTZ have been used. All structures and their respective energies calculated through the GA method, DFT_RM and DFT_NRM are compared and discussed. The proposed methodology proved to be very efficient for obtaining quasi-accurate global minima at the level of ab initio calculations, and the data are discussed in the light of previously published results with particular attention to (H2O)n (2 ≤ n ≤ 13) clusters. The results suggest that the stabilization energy error for the empirical potentials used is additive with respect to the cluster size, roughly 0.5 kcal mol^-1 per water molecule after ZPE correction. Finally, the approach of using GA/empirical potential structures as the starting point for ab initio optimization methods proved to be a computationally manageable strategy for exploring the potential energy surface of large systems at the quantum level. In conclusion, this work proposes an alternative approach to accurately study properties of larger systems in a very efficient manner.
NASA Astrophysics Data System (ADS)
Bansod, Babankumar S.; Pandey, O. P.
2013-05-01
Within-field variability is a well-known phenomenon and its study is at the centre of precision agriculture (PA). In this paper, site-specific spatial variability (SSSV) of apparent Electrical Conductivity (ECa) and crop yield apart from pH, moisture, temperature and di-electric constant information was analyzed to construct spatial distribution maps. Principal component analysis (PCA) and fuzzy c-means (FCM) clustering algorithm were then performed to delineate management zones (MZs). Various performance indices such as Normalized Classification Entropy (NCE) and Fuzzy Performance Index (FPI) were calculated to determine the clustering performance. The geo-referenced sensor data was analyzed for within-field classification. Results revealed that the variables could be aggregated into MZs that characterize spatial variability in soil chemical properties and crop productivity. The resulting classified MZs showed favorable agreement between ECa and crop yield variability pattern. This enables reduction in number of soil analysis needed to create application maps for certain cultivation operations.
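The two indices named in the abstract above can be illustrated on a fuzzy membership matrix. The sketch below uses the commonly cited definitions, FPI = 1 - (c·F - 1)/(c - 1) with F the partition coefficient, and NCE = H / log(c) with H the mean membership entropy. Whether the paper uses exactly these forms is an assumption, and the function names and toy matrices are hypothetical.

```python
import math

def fuzzy_performance_index(u):
    """FPI = 1 - (c*F - 1)/(c - 1); u is a list of per-sample membership rows."""
    n = len(u)
    c = len(u[0])
    # Partition coefficient F: mean of squared memberships per sample.
    f = sum(m * m for row in u for m in row) / n
    return 1.0 - (c * f - 1.0) / (c - 1.0)

def normalized_classification_entropy(u):
    """NCE = H / log(c), with H the mean entropy of the membership rows."""
    n = len(u)
    c = len(u[0])
    # Terms with zero membership contribute nothing to the entropy.
    h = -sum(m * math.log(m) for row in u for m in row if m > 0.0) / n
    return h / math.log(c)

# Hypothetical membership matrices for two samples and two zones.
crisp = [[1.0, 0.0], [0.0, 1.0]]   # every sample belongs fully to one zone
vague = [[0.5, 0.5], [0.5, 0.5]]   # memberships are maximally shared

for name, u in (("crisp", crisp), ("vague", vague)):
    print(name, fuzzy_performance_index(u), normalized_classification_entropy(u))
```

Under these definitions both indices are near 0 for a crisp partition and near 1 for a maximally fuzzy one, so lower values indicate better-separated management zones.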
A Comparative Study of Self-organizing Clustering Algorithms Dignet and ART2.
Thomopoulos, Stelios C.A.; Wann, Chin Der
1997-06-01
A comparative study of two self-organizing clustering neural network algorithms, Dignet and ART2, has been conducted. The differences in architecture and learning procedures between the two models are compared. Comparative computer simulations on data clustering and signal detection problems with Gaussian noise were used for investigating the performance of Dignet and "fast learning" ART2. The study shows that Dignet, with a simple architecture and straightforward dynamics, is more flexible with the choice of different metrics for the measure of similarity. The system parameters in Dignet can be analytically determined from a self-adjusting process; moreover, the initial threshold value used in Dignet is directly determined from a lower-bound of the desirable operational signal-to-noise ratio. Simulations show that Dignet generally exhibits faster learning and better clustering performance on statistical pattern recognition problems. A simplified ART2 model (SART2) is derived by adopting the structural concepts from Dignet. SART2 exhibits faster learning and eliminates a "false conviction" problem that exists in the "fast learning" ART2. The comparative study is benchmarked against statistical data clustering and signal detection problems. Copyright 1997 Elsevier Science Ltd. PMID:12662867
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix
NASA Technical Reports Server (NTRS)
Rogers, James L.; Korte, John J.; Bilardo, Vincent J.
2006-01-01
Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
NASA Astrophysics Data System (ADS)
McCaffrey, James D.; Dierking, Howard
This study investigates the use of a biologically inspired meta-heuristic algorithm to extract rule sets from clustered categorical data. A computer program which implemented the algorithm was executed against six benchmark data sets and successfully discovered the underlying generation rules in all cases. Compared to existing approaches, the simulated bee colony (SBC) algorithm used in this study has the advantage of allowing full customization of the characteristics of the extracted rule set, and allowing arbitrarily large data sets to be analyzed. The primary disadvantages of the SBC algorithm for rule set extraction are that the approach requires a relatively large number of input parameters, and that the approach does not guarantee convergence to an optimal solution. The results demonstrate that an SBC algorithm for rule set extraction of clustered categorical data is feasible, and suggest that the approach may have the ability to outperform existing algorithms in certain scenarios.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
NASA Astrophysics Data System (ADS)
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei
2011-04-01
An improved method for detecting cloud that combines k-means clustering with the multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data are initially categorized into two major types by the k-means method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. A multi-spectral threshold detection is then applied to the first class to eliminate interference such as smoke and snow. The method is tested with MODIS data acquired at different times under different underlying surface conditions. Visual inspection of the results showed that the algorithm can effectively detect smaller areas of cloud pixels and exclude the interference of the underlying surface, which provides a good foundation for the subsequent fire detection approach. PMID:21714260
Multispectral image classification of MRI data using an empirically-derived clustering algorithm
Horn, K.M.; Osbourn, G.C.; Bouchard, A.M.; Sanders, J.A.
1998-08-01
Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically-derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T2 1st- and 2nd-echo and T1 (with and without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real space by color-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three-dimensional data set of MRI scans of a human brain tumor.
A Computational Algorithm for Functional Clustering of Proteome Dynamics During Development
Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling
2014-01-01
Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031
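The abstract above fits temporal expression patterns with Legendre orthogonal polynomials. As a rough illustration only, the sketch below evaluates a Legendre basis with Bonnet's recurrence and uses it to compute a mean expression curve from a coefficient vector. The function names, the rescaling of time to [-1, 1], and the example coefficients are assumptions, not the authors' implementation.

```python
def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values

def scale_time(t, t_min, t_max):
    """Map a time point in [t_min, t_max] onto [-1, 1], where the basis is orthogonal."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

def expression_curve(t, coeffs, t_min, t_max):
    """Mean expression at time t for one mixture component (hypothetical coefficients)."""
    x = scale_time(t, t_min, t_max)
    basis = legendre_basis(x, len(coeffs) - 1)
    return sum(c * p for c, p in zip(coeffs, basis))

# Hypothetical component: expression sampled over days 0..10, order-2 fit.
coeffs = [1.0, 2.0, -0.5]
curve = [expression_curve(t, coeffs, 0.0, 10.0) for t in range(11)]
```

In a mixture-model setting each cluster would carry its own coefficient vector, so proteins are grouped by the shape of their fitted curves and not by raw abundance at individual time points.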
Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering
NASA Astrophysics Data System (ADS)
Chah, E.; Hok, V.; Della-Chiesa, A.; Miller, J. J. H.; O'Mara, S. M.; Reilly, R. B.
2011-02-01
This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance with mean sorting accuracy of 73% and sorting error of 10% compared to PCA which combined with k-means had a sorting accuracy of 58% and sorting error of 10%. A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. Voice classification, another application that plays an important role in grouping unlabelled voice samples, has however not been widely studied. Lately, voice classification has been found useful in phone monitoring, in classifying speakers' gender, ethnicity and emotional states, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time warp transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network's running and the degree of candidate nodes' effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
NASA Astrophysics Data System (ADS)
Schuetter, Jared Michael
Excavating cairns in southern Arabia is a way for anthropologists to understand which factors led ancient settlers to transition from a pastoral lifestyle and tribal narrative to the formation of states that exist today. Locating these monuments has traditionally been done in the field, relying on eyewitness reports and costly searches through the arid landscape. In this thesis, an algorithm for automatically detecting cairns in satellite imagery is presented. The algorithm uses a set of filters in a window based approach to eliminate background pixels and other objects that do not look like cairns. The resulting set of detected objects constitutes fewer than 0.001% of the pixels in the satellite image, and contains the objects that look the most like cairns in imagery. When a training set of cairns is available, a further reduction of this set of objects can take place, along with a likelihood-based ranking system. To aid in cairn detection, the satellite image is also clustered to determine land-form classes that tend to be consistent with the presence of cairns. Due to the large number of pixels in the image, a subsample spectral clustering algorithm called "Multiple Sample Data Spectroscopic clustering" is used. This multiple sample clustering procedure is motivated by perturbation studies on single sample spectral algorithms. The studies, presented in this thesis, show that sampling variability in the single sample approach can cause an unsatisfactory level of instability in clustering results. The multiple sample data spectroscopic clustering algorithm is intended to stabilize this perturbation by combining information from different samples. While sampling variability is still present, the use of multiple samples mitigates its effect on cluster results. Finally, a step-through of the cairn detection algorithm and satellite image clustering are given for an image in the Hadramawt region of Yemen. 
The top ranked detected objects are presented, and a discussion of parameter selection and future work follows.
Are judgments a form of data clustering? Reexamining contrast effects with the k-means algorithm.
Boillaud, Eric; Molina, Guylaine
2015-04-01
A number of theories have been proposed to explain in precise mathematical terms how statistical parameters and sequential properties of stimulus distributions affect category ratings. Various contextual factors such as the mean, the midrange, and the median of the stimuli; the stimulus range; the percentile rank of each stimulus; and the order of appearance have been assumed to influence judgmental contrast. A data clustering reinterpretation of judgmental relativity is offered wherein the influence of the initial choice of centroids on judgmental contrast involves 2 combined frequency and consistency tendencies. Accounts of the k-means algorithm are provided, showing good agreement with effects observed on multiple distribution shapes and with a variety of interaction effects relating to the number of stimuli, the number of response categories, and the method of skewing. Experiment 1 demonstrates that centroid initialization accounts for contrast effects obtained with stretched distributions. Experiment 2 demonstrates that the iterative convergence inherent to the k-means algorithm accounts for the contrast reduction observed across repeated blocks of trials. The concept of within-cluster variance minimization is discussed, as is the applicability of a backward k-means calculation method for inferring, from empirical data, the values of the centroids that would serve as a representation of the judgmental context. PMID:25706770
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm
de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
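The property highlighted above, that mean shift needs no preset number of clusters, can be seen in a minimal one-dimensional sketch with a flat kernel. This is not the MSGIP implementation: the bandwidth here is fixed by hand instead of being estimated heuristically, and the function names and toy values are hypothetical.

```python
def mean_shift_1d(points, bandwidth, tol=1e-6, max_iter=100):
    """Shift each point to the mean of its neighbours until it stops moving."""
    modes = []
    for start in points:
        x = float(start)
        for _ in range(max_iter):
            # Flat kernel: every point within the bandwidth counts equally.
            window = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes

def group_modes(modes, bandwidth):
    """Label points by mode; modes closer than bandwidth/2 share a cluster."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Hypothetical per-window feature values (for example, a composition score).
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = group_modes(mean_shift_1d(values, bandwidth=1.0), bandwidth=1.0)
```

The number of clusters falls out of the number of distinct modes found, which is why the bandwidth, and not a cluster count, is the parameter that has to be chosen or estimated.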
Contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
NASA Astrophysics Data System (ADS)
Theiler, James P.; Gisler, Galen
1997-10-01
The recent and continuing construction of multi- and hyper-spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart on-board hardware is required, as well as sophisticated earth-bound processing. A segmented image is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach, a variant of the standard k-means algorithm, which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly 'orthogonal', which means that we can make considerable improvements in the one with minimal loss in the other.
A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
Theiler, J.; Gisler, G.
1997-07-01
The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
Analysis Clustering of Electricity Usage Profile Using K-Means Algorithm
NASA Astrophysics Data System (ADS)
Amri, Yasirli; Lailatul Fadhilah, Amanda; Fatmawati; Setiani, Novi; Rani, Septia
2016-01-01
Electricity is one of the most important needs for human life in many sectors. Demand for electricity will increase in line with population and economic growth. Adjusting the amount of electricity produced in a specified time is important because storing electricity is expensive. To handle this problem, we need knowledge about the electricity usage patterns of clients. These patterns can be obtained by using clustering techniques. In this paper, clustering is used to find similar electricity usage patterns in a specified time. We use the K-Means algorithm to cluster a dataset of electricity consumption from 370 clients collected over a year. As a result of this study, we obtained an interesting pattern: one large group of clients consumes its lowest electric load in the spring season, whereas in another group the lowest electricity consumption occurs in the winter season. From this result, an electricity provider can plan production for a specified season based on the pattern of the electricity usage profile.
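The clustering step described above can be sketched with a plain Lloyd-style k-means over per-client usage profiles. The four-value seasonal profiles, the choice of initial centroids and the function names below are hypothetical and only illustrate the mechanics; they are not the study's data or code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(profiles, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical mean load per season: [winter, spring, summer, autumn].
profiles = [
    [9.0, 2.0, 6.0, 5.0],
    [8.0, 3.0, 7.0, 5.0],
    [2.0, 9.0, 6.0, 5.0],
    [3.0, 8.0, 7.0, 6.0],
]
labels, centroids = kmeans(profiles, centroids=[profiles[0], profiles[2]])
```

Reading the season with the smallest value in each final centroid gives the kind of statement made in the abstract, namely which group has its lowest load in spring and which in winter.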
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Xue, Zhong; Shen, Dinggang; Li, Hai; Wong, Stephen
2010-01-01
The traditional fuzzy clustering algorithm and its extensions have been successfully applied in medical image segmentation. However, because of the variability of tissues and anatomical structures, the clustering results might be biased by the tissue population and intensity differences. For example, clustering-based algorithms tend to over-segment white matter tissues of MR brain images. To solve this problem, we introduce a tissue probability map constrained clustering algorithm and apply it to serial MR brain image segmentation, i.e., a series of 3-D MR brain images of the same subject at different time points. Using the new serial image segmentation algorithm within the CLASSIC framework, which iteratively segments the images and estimates the longitudinal deformations, we improved both accuracy and robustness for serial image computing, and at the same time produced longitudinally consistent segmentation and stable measures. In the algorithm, the tissue probability maps consist of both the population-based and subject-specific segmentation priors. An experimental study using both simulated longitudinal MR brain data and the Alzheimer's Disease Neuroimaging Initiative (ADNI) data confirmed that more accurate and robust segmentation results can be obtained by using both priors. The proposed algorithm can be applied in longitudinal follow-up studies of MR brain imaging with subtle morphological changes for neurological disorders. PMID:26566399
`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of science for data mining using computational approaches. Clustering and molecular phylogeny are key areas in bioinformatics that help in the study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. However, most of these methods depend on pre-alignment of sequences and become computationally intensive as the size of the data increases, and hence demand alternative efficient approaches. `Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports the application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of IATDs is proposed, and the distance matrix thus obtained is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
A novel non-overlapping bi-clustering algorithm for network generation using living cell array data
Yang, E.; Foteinou, P.T.; King, K.R.; Yarmush, M.L.
2011-01-01
Motivation The living cell array quantifies the contribution of activated transcription factors upon the expression levels of their target genes. The direct manipulation of the regulatory mechanisms offers enormous possibilities for deciphering the machinery that activates and controls gene expression. We propose a novel bi-clustering algorithm for generating non-overlapping clusters of reporter genes and conditions and demonstrate how this information can be interpreted in order to assist in the construction of transcription factor interaction networks. PMID:17827207
2016-01-01
The early diagnosis of breast cancer is an important step in the fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between two categorical feature values. In this paper, we use a probabilistic distance measure to compute the distance between a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data.
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. In order to increase the accuracy of the model, the following steps are carried out. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The crucial reason for this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually arise when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data set, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
Clustering by fuzzy neural gas and evaluation of fuzzy clusters.
Geweniger, Tina; Fischer, Lydia; Kaden, Marika; Lange, Mandy; Villmann, Thomas
2013-01-01
We consider some modifications of the neural gas algorithm. First, fuzzy assignments as known from fuzzy c-means and neighborhood cooperativeness as known from self-organizing maps and neural gas are combined to obtain a basic Fuzzy Neural Gas. Further, a kernel variant and a simulated annealing approach are derived. Finally, we introduce a fuzzy extension of the ConnIndex to obtain an evaluation measure for clusterings based on fuzzy vector quantization. PMID:24396342
2011-01-01
Background With the completion of the international HapMap project, many studies have been conducted to investigate the association between complex diseases and haplotype variants. Such haplotype-based association studies, however, often face two difficulties; one is the large number of haplotype configurations in the chromosome region under study, and the other is the ambiguity in haplotype phase when only genotype data are observed. The latter complexity may be handled based on an EM algorithm with family data incorporated, whereas the former can be more problematic, especially when haplotypes of rare frequencies are involved. Here based on family data we propose to cluster long haplotypes of linked SNPs in a biological sense, so that the number of haplotypes can be reduced and the power of statistical tests of association can be increased. Results In this paper we employ family genotype data and combine a clustering scheme with a likelihood ratio statistic to test the association between quantitative phenotypes and haplotype variants. Haplotypes are first grouped based on their evolutionary closeness to establish a set containing core haplotypes. Then, we construct for each family the transmission and non-transmission phase in terms of these core haplotypes, taking into account simultaneously the phase ambiguity as weights. The likelihood ratio test (LRT) is next conducted with these weighted and clustered haplotypes to test for association with disease. This combination of evolution-guided haplotype clustering and weighted assignment in LRT is able, via its core-coding system, to incorporate into analysis both haplotype phase ambiguity and transmission uncertainty. Simulation studies show that this proposed procedure is more informative and powerful than three family-based association tests, FAMHAP, FBAT, and an LRT with a group consisting exclusively of rare haplotypes. 
Conclusions The proposed procedure takes into account the uncertainty in phase determination and in transmission, utilizes the evolutionary information contained in haplotypes, reduces the dimension in haplotype space and the degrees of freedom in tests, and performs better in association studies. This evolution-guided clustering procedure is particularly useful for long haplotypes containing linked SNPs, and is applicable to other haplotype-based association tests. This procedure is now implemented in R and is free for download. PMID:21592403
NASA Astrophysics Data System (ADS)
Bagheripour, Parisa; Asoodeh, Mojtaba
2013-12-01
Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and has a great control on assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost-, time-, and labor-intensive. Therefore, the mission of finding an accurate, fast and cheap way of determining porosity is unavoidable. On the other hand, conventional well log data, available in almost all wells, contain invaluable implicit information about the porosity. Therefore, an intelligent system can explicate this information. Fuzzy logic is a powerful tool for handling geoscience problems that are associated with uncertainty. However, determination of the best fuzzy formulation is still an issue. This study proposes an improved strategy, called the hybrid genetic algorithm-pattern search (GA-PS) technique, as an alternative to the widely used subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. The hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions), which consequently results in the best fuzzy formulation. Results indicate that the GA-PS technique manipulates both the mean and the variance of Gaussian membership functions, in contrast to SC, which only controls the mean of Gaussian membership functions. A comparison between the hybrid GA-PS technique and the SC method confirmed the superiority of the GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.
Cancer Subtype Discovery and Biomarker Identification via a New Robust Network Clustering Algorithm
Wu, Meng-Yun; Dai, Dao-Qing; Zhang, Xiao-Fei; Zhu, Yuan
2013-01-01
In cancer biology, it is very important to understand the phenotypic changes of the patients and discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem based on gene expression profiles which may contain outliers due to either chemical or electrical reasons. These undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways, and are related to only a few interdependent biomarkers. This motivates a need for robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures and identifying cancer related biomarkers. This study proposes a penalized model-based Student’s t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account and having robustness against outliers. Meanwhile, biomarker identification and network reconstruction are achieved by imposing an adaptive penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm utilizing the graphical lasso. Here, a network-based gene selection criterion that identifies biomarkers not as individual genes but as subnetworks is applied. This allows us to implicate low discriminative biomarkers which play a central role in the subnetwork by interconnecting many differentially expressed genes, or have cluster-specific underlying network structures. Experimental results on simulated datasets and one available cancer dataset attest to the effectiveness and robustness of PMT-UC in cancer subtype discovery. Moreover, PMT-UC has the ability to select cancer related biomarkers which have been verified in biochemical or biomedical research and to learn biologically significant correlations among genes. PMID:23799085
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
NASA Astrophysics Data System (ADS)
Yue, Guo; Liang-Liang, Wang; Xu-Dong, Ju; Ling-Hui, Wu; Qing-Lei, Xiu; Hai-Xia, Wang; Ming-Yi, Dong; Jing-Ran, Hu; Wei-Dong, Li; Wei-Guo, Li; Huai-Min, Liu; Qun, Ou-Yang; Xiao-Yan, Shen; Ye, Yuan; Yao, Zhang
2016-01-01
Considering the effects of aging on the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker, the Cylindrical-GEM Inner Tracker (CGEM-IT), is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package for the CGEM-IT with a simplified digitization model, and describes the development of software for cluster reconstruction and track fitting, using a track fitting algorithm based on the Kalman filter method. Preliminary results for the reconstruction algorithms which are obtained using a Monte Carlo sample of single muon events in the CGEM-IT, show that the CGEM-IT has comparable momentum resolution and transverse vertex resolution to the IDC, and a better z-direction resolution than the IDC. Supported by National Key Basic Research Program of China (2015CB856700), National Natural Science Foundation of China (11205184, 11205182) and Joint Funds of National Natural Science Foundation of China (U1232201)
Collaborative fuzzy clustering from multiple weighted views.
Jiang, Yizhang; Chung, Fu-Lai; Wang, Shitong; Deng, Zhaohong; Wang, Jun; Qian, Pengjiang
2015-04-01
Clustering with multiview data is becoming a hot topic in data mining, pattern recognition, and machine learning. In order to realize an effective multiview clustering, two issues must be addressed, namely, how to combine the clustering result from each view and how to identify the importance of each view. In this paper, based on a newly proposed objective function which explicitly incorporates two penalty terms, a basic multiview fuzzy clustering algorithm, called collaborative fuzzy c-means (Co-FCM), is firstly proposed. It is then extended into its weighted view version, called weighted view collaborative fuzzy c-means (WV-Co-FCM), by identifying the importance of each view. The WV-Co-FCM algorithm indeed tackles the above two issues simultaneously. Its relationship with the latest multiview fuzzy clustering algorithm Collaborative Fuzzy K-Means (Co-FKM) is also revealed. Extensive experimental results on various multiview datasets indicate that the proposed WV-Co-FCM algorithm outperforms or is at least comparable to the existing state-of-the-art multitask and multiview clustering algorithms and the importance of different views of the datasets can be effectively identified. PMID:25069132
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Structural damage detection by fuzzy clustering
NASA Astrophysics Data System (ADS)
da Silva, Samuel; Dias Júnior, Milton; Lopes Junior, Vicente; Brennan, Michael J.
2008-10-01
The development of strategies for structural health monitoring (SHM) has become increasingly important because of the necessity of preventing undesirable damage. This paper describes an approach to this problem using vibration data. It involves a three-stage process: reduction of the time-series data using principal component analysis (PCA), the development of a data-based auto-regressive moving average (ARMA) model using data from an undamaged structure, and the classification of whether or not the structure is damaged using a fuzzy clustering approach. The approach is applied to data from a benchmark structure from Los Alamos National Laboratory, USA. Two fuzzy clustering algorithms are compared: fuzzy c-means (FCM) and Gustafson-Kessel (GK) algorithms. It is shown that while both fuzzy clustering algorithms are effective, the GK algorithm marginally outperforms the FCM algorithm.
Wang, Xueyi
2011-01-01
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces. PMID:22247818
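The two-stage idea in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`build_index`, `knn_query`), the basic Lloyd-style k-means, and the choice of cluster count are all assumptions made for illustration. The pruning rule relies only on the triangle inequality: for a query q, a training point p, and the center c of p's cluster, d(q, p) >= |d(q, c) - d(p, c)|, so p can be skipped whenever that lower bound is no smaller than the current k-th best distance.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd-style k-means (illustrative, not tuned)."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[j].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def build_index(points, n_clusters, seed=0):
    """Buildup stage: cluster the training set and store d(p, center)."""
    centers = kmeans(points, n_clusters, seed=seed)
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
        clusters[j].append((dist(p, centers[j]), p))
    return centers, clusters


def knn_query(index, q, k):
    """Searching stage: visit clusters nearest-first and prune with the
    triangle inequality. Returns (sorted [(distance, point)], number of
    point-to-query distance computations)."""
    centers, clusters = index
    best = []  # max-heap emulated with negated distances: (-d, point)
    n_computed = 0
    order = sorted(range(len(centers)), key=lambda j: dist(q, centers[j]))
    for j in order:
        d_qc = dist(q, centers[j])
        for d_pc, p in clusters[j]:
            # Lower bound on d(q, p); skip p if it cannot beat the k-th best.
            if len(best) == k and abs(d_qc - d_pc) >= -best[0][0]:
                continue
            d = dist(q, p)
            n_computed += 1
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), n_computed
```

Because the bound is a true lower bound, the search stays exact: the returned distances match a brute-force scan, while `n_computed` reports how many distance evaluations were actually needed.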
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we present our GSAM algorithm, based on a DFT exploration of the PES, to find the low-lying isomers of such compounds. This algorithm includes the generation of an initial set of structures from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the physically unreasonable configurations has been implemented to reduce the computational cost of this algorithm. Structural properties of Ga_{n}As_{m} clusters will be presented as an illustration of the method.
NASA Astrophysics Data System (ADS)
Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin
2011-04-01
To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
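The row-wise decomposition described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the interpolation is plain inverse distance weighting with a power parameter, and a thread pool stands in for the master/slave message passing on a Linux cluster (it shows the scatter/gather structure, not the reported efficiency).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def idw(samples, x, y, power=2.0):
    """Inverse distance weighting at (x, y).

    samples is a list of (sx, sy, value) tuples. If the query coincides
    with a sample, that sample's value is returned directly.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def interpolate_row(samples, row_y, xs, power=2.0):
    """Serial work unit: interpolate one grid row."""
    return [idw(samples, x, row_y, power) for x in xs]


def interpolate_grid(samples, xs, ys, power=2.0):
    """Serial reference: process the rows one after another."""
    return [interpolate_row(samples, y, xs, power) for y in ys]


def interpolate_grid_parallel(samples, xs, ys, power=2.0, workers=4):
    """'Master' hands rows to workers and gathers results in row order."""
    row_fn = partial(interpolate_row, samples, xs=xs, power=power)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_fn, ys))
```

Each row depends only on the shared sample set, which is why broadcasting the samples once and assigning rows independently is a natural fit for this algorithm.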
Relational data clustering with incomplete data
NASA Astrophysics Data System (ADS)
Hathaway, Richard J.; Overstreet, Dessa D.; Murphy, Thomas E.; Bezdek, James C.
2001-03-01
We consider the problem of clustering a set of objects which are represented by relational data in the form of a dissimilarity matrix which has missing values. Three methods are developed to estimate the missing values, all based on simple triangle inequality-based approximation schemes. With few exceptions, any relational clustering algorithm can then be applied to the completed data matrix to obtain good clusters. We illustrate our approach by clustering incomplete data built from several data sets. The primary clustering method chosen for our numerical experiments is the non-Euclidean relational fuzzy c-means algorithm. Our examples show that satisfactory clusters can still be obtained even when roughly half of the distance values are missing before completion.
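The abstract does not spell out the three estimation schemes, so the following Python sketch shows only one plausible triangle-inequality-based completion, under assumptions of my own: missing entries are marked with `None`, and each is replaced by the midpoint between the tightest lower bound max_k |d_ik - d_kj| and the tightest upper bound min_k (d_ik + d_kj) over all third objects k whose dissimilarities to both i and j are known. The function name is hypothetical.

```python
def complete_dissimilarities(D):
    """Fill missing entries (None) of a square dissimilarity matrix.

    For a missing D[i][j], every object k with known D[i][k] and D[k][j]
    gives the triangle-inequality bounds
        |D[i][k] - D[k][j]| <= d_ij <= D[i][k] + D[k][j].
    The estimate is the midpoint of the tightest bounds (an illustrative
    choice). Entries with no usable k are left as None.
    """
    n = len(D)
    out = [row[:] for row in D]
    for i in range(n):
        for j in range(n):
            if D[i][j] is not None:
                continue
            lowers, uppers = [], []
            for k in range(n):
                if k in (i, j) or D[i][k] is None or D[k][j] is None:
                    continue
                lowers.append(abs(D[i][k] - D[k][j]))
                uppers.append(D[i][k] + D[k][j])
            if uppers:
                out[i][j] = 0.5 * (max(lowers) + min(uppers))
    return out
```

The completed matrix can then be passed to any relational clustering routine; estimates are computed from the original known entries only, so the result does not depend on the order in which gaps are filled.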
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1993-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.
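The contrast between the probabilistic constraint of FCM and possibilistic memberships can be made concrete with a small Python sketch. The FCM update below is the standard one for a single data point; the possibilistic form, with a per-cluster scale parameter `eta`, follows the commonly cited formulation of possibilistic c-means and is an assumption here, since the abstract gives no equations. Function names are hypothetical.

```python
def fcm_memberships(distances, m=2.0):
    """FCM memberships of one point, given its (positive) distances to
    each prototype. By construction the memberships sum to one."""
    exp = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exp for d_k in distances)
        for d_i in distances
    ]


def pcm_memberships(distances, etas, m=2.0):
    """Possibilistic memberships: each cluster is judged independently,
    so a point far from every prototype gets low values everywhere.
    etas are assumed per-cluster scale parameters (squared-distance units).
    """
    exp = 1.0 / (m - 1.0)
    return [
        1.0 / (1.0 + (d * d / eta) ** exp)
        for d, eta in zip(distances, etas)
    ]
```

For a noise point equally far from two prototypes, say at distance 10 from each, `fcm_memberships([10.0, 10.0])` is forced to split the membership evenly between the clusters, whereas `pcm_memberships([10.0, 10.0], [1.0, 1.0])` assigns a small value to both, which matches the "degree of belonging" reading discussed in the abstract.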
Blood vessel extraction and optic disc removal using curvelet transform and kernel fuzzy c-means.
Kar, Sudeshna Sil; Maity, Santi P
2016-03-01
This paper proposes an automatic blood vessel extraction method for retinal images using matched filtering in an integrated system design platform that involves the curvelet transform and kernel-based fuzzy c-means. Since the curvelet transform represents lines, edges and curvatures very well and in compact form (with fewer coefficients) compared to other multi-resolution techniques, this paper uses the curvelet transform for enhancement of the retinal vasculature. Matched filtering is then used to intensify the blood vessels' response, which is further employed by the kernel-based fuzzy c-means algorithm that extracts the vessel silhouette from the background through non-linear mapping. For pathological images, in addition to matched filtering, a Laplacian of Gaussian filter is also employed to distinguish step- and ramp-like signals from the vessel structure. To test the efficacy of the proposed method, the algorithm has also been applied to images in the presence of additive white Gaussian noise, where the curvelet transform has been used for image denoising. Performance is evaluated on the publicly available DRIVE, STARE and DIARETDB1 databases and is compared with a large number of existing blood vessel extraction methodologies. Simulation results demonstrate that the proposed method is highly efficient in detecting long and thick as well as short and thin vessels, with an average accuracy of 96.16% for the DRIVE and 97.35% for the STARE database, whereas the existing methods fail to extract the tiny and thin vessels. PMID:26848729
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.
NASA Astrophysics Data System (ADS)
Xiao, Yong Liang
Molecular packing, clustering, and docking computations have been performed by empirical intermolecular energy minimization methods. The main focus of this study is finding a robust global search algorithm to solve intermolecular interaction problems, especially to apply an efficient algorithm to large-scale complex molecular systems such as drug-DNA binding or site selectivity which has increasing importance in drug design and drug discovery. Molecular packing in benzene, naphthalene, and anthracene crystals is analyzed in terms of molecular dimer interaction. Intermolecular energies of the gas dimer molecules are calculated for various intermolecular distances and orientations using empirical potential energy functions. The gas dimers are compared to pairs of molecules extracted from the observed crystal structures. Net atomic charges are obtained by the potential-derived method from 6-31G and 6-31G^{**} level ab initio wavefunctions. A new approach using a genetic algorithm is applied to predict structures of benzene, naphthalene, and anthracene molecular clusters. The computer program GAME (genetic algorithm for minimization of energy) has been developed to obtain the global energy minimum of clusters of dimer, trimer, and tetramer molecules. This test model has been further developed to applications of molecular docking. Docking calculations of deoxyguanosine molecules to actinomycin D were performed successfully to identify the binding sites of the drug molecule, which was revealed by actinomycin D-deoxyguanosine complex from the solved x-ray crystal structure. The comparison between the evolutionary computing method and conventional local optimization methods concluded that genetic algorithms are very competitive when it comes to complex, large-scale optimization. Full power of genetic algorithms can be unveiled in computer-assisted drug design only when the difficulties of including optimized molecular conformation in the algorithm are overcome. 
These problems have been analyzed and some alternative solutions are formulated with the techniques being discussed in detail.
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes, Sylvain V.
2006-03-08
Centrosomes are small organelles that organize the mitotic spindle during cell division and are also involved in cell shape and polarity. Within epithelial tumors, such as breast cancer, and some hematological tumors, centrosome abnormalities (CA) are common, occur early in disease etiology, and correlate with chromosomal instability and disease stage. In situ quantification of CA by optical microscopy is hampered by overlap and clustering of these organelles, which appear as focal structures. CA has been frequently associated with Tp53 status in premalignant lesions and tumors. Here we describe an approach to accurately quantify centrosomes in tissue sections and tumors. Considering proliferation and baseline amplification rate, the resulting population-based ratio of centrosomes per nucleus allows the approximation of the proportion of cells with CA. Using this technique we show that 20-30 percent of cells have amplified centrosomes in Tp53 null mammary tumors. Combining fluorescence detection, deconvolution microscopy and a mathematical algorithm applied to a maximum intensity projection, we show that this approach is superior to traditional investigator-based visual analysis or threshold-based techniques.
NASA Astrophysics Data System (ADS)
Chang, Seongmin; Baek, Sungmin; Kim, Ki-Ook; Cho, Maenghyo
2015-06-01
A system identification method has been proposed to validate finite element models of complex structures using measured modal data. Finite element method is used for the system identification as well as the structural analysis. In perturbation methods, the perturbed system is expressed as a combination of the baseline structure and the related perturbations. The changes in dynamic responses are applied to determine the structural modifications so that the equilibrium may be satisfied in the perturbed system. In practical applications, the dynamic measurements are carried out on a limited number of accessible nodes and associated degrees of freedom. The equilibrium equation is, in principle, expressed in terms of the measured (master, primary) and unmeasured (slave, secondary) degrees of freedom. Only the specified degrees of freedom are included in the equation formulation for identification and the unspecified degrees of freedom are eliminated through the iterative improved reduction scheme. A large number of system parameters are included as the unknown variables in the system identification of large-scaled structures. The identification problem with large number of system parameters requires a large amount of computation time and resources. In the present study, a hierarchical clustering algorithm is applied to reduce the number of system parameters effectively. Numerical examples demonstrate that the proposed method greatly improves the accuracy and efficiency in the inverse problem of identification.
Clustering gene expression data using a diffraction-inspired framework
2012-01-01
Background Recent developments in microarray technology have allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction-based clustering algorithm, however, is designed to overcome this necessity for user-defined parameters, as it is able to automatically search the data for any underlying structure. Methods The diffraction-based clustering algorithm presented in this paper is tested using five well-known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those results obtained from conventional algorithms such as the k-means, fuzzy c-means, self-organising map, hierarchical clustering algorithm, Gaussian mixture model and density-based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results The diffraction-based clustering algorithm is shown to be independent of the number of clusters as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction-based clustering algorithm performs significantly better on the real biological datasets compared to the other existing algorithms. Conclusion The results of the diffraction-based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data. PMID:23164195
NASA Astrophysics Data System (ADS)
Chehdi, Kacem; Taher, Akar; Cariou, Claude
2015-11-01
A stable and unsupervised version of the fuzzy C-means algorithm, named FCM-optimized (FCMO), is presented. The originality of the proposed algorithm stems from (1) the introduction of an adaptive incremental procedure to initialize class centers, which makes the algorithm stable and deterministic; therefore, the classification results do not vary from one run to another and (2) the use of an unsupervised evaluation criterion to estimate the optimal number of classes. The validation of FCMO with regard to stability, reliability in class number estimation, and classification efficiency is shown through experimental results on synthetic monocomponent and real multicomponent images.
Rough-fuzzy clustering for grouping functionally similar genes from microarray data.
Maji, Pradipta; Paul, Sushmita
2013-01-01
Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in gene clustering problem. In this regard, a gene clustering algorithm, termed as robust rough-fuzzy c-means, is proposed judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environment. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near optimum solutions and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets. PMID:22848138
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
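The membership rule described in this abstract (each observation belongs to every cluster, to a degree set by its distance to each center) can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means membership update with Euclidean distance and fuzziness index m = 2; the function name `fcm_memberships` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster center.

    Assumes the standard FCM update u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)),
    where d_i is the Euclidean distance to center i and m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Toy example: one observation between two hypothetical cluster centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(u)  # memberships sum to 1; the nearer center gets the larger value
```

Because the memberships always sum to one, an observation near a diffuse boundary receives comparable values for the neighbouring clusters, which is the kind of case the abstract flags as a likely area of misclassification.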
NASA Technical Reports Server (NTRS)
1975-01-01
The implementation of the algorithms used in the flight program to approximate elementary functions and mathematical procedures was checked. This was done by verifying that at least one, and in most cases, more than one function computed through the use of the algorithms was calculated properly. The following algorithms were checked: sine-cosine, arctangent, natural logarithm, square root, inverse square root, as well as the vector dot and cross products.
An Algorithm for Testing Unidimensionality and Clustering Items in Rasch Measurement
ERIC Educational Resources Information Center
Debelak, Rudolf; Arendasy, Martin
2012-01-01
A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X.
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
Automatic segmentation of corpus callosum using Gaussian mixture modeling and Fuzzy C means methods.
İçer, Semra
2013-10-01
This paper presents a comparative study of the success and performance of the Gaussian mixture modeling and Fuzzy C means methods in determining the volume and cross-sectional areas of the corpus callosum (CC) using simulated and real MR brain images. The Gaussian mixture model (GMM) uses a weighted sum of Gaussian distributions and applies statistical decision procedures to define image classes. In the Fuzzy C means (FCM) method, the image classes are represented by membership functions whose fuzziness expresses the distance from the cluster centers. In this study, automatic segmentation of the midsagittal section of the CC was achieved from simulated and real brain images. The volume of the CC was obtained from the areas of the sagittal sections. To compare the success of the methods, segmentation accuracy, Jaccard similarity, and segmentation time were calculated. The results show that the GMM method gave slightly more accurate segmentation (midsagittal section segmentation accuracy of 98.3% for GMM and 97.01% for FCM); however, the FCM method was faster than GMM. With this study, an accurate and automatic segmentation system was developed that gives doctors the opportunity for quantitative comparison when planning treatment and diagnosing diseases that affect the size of the CC. The approach can be adapted to segment other regions of the brain and could therefore be put to practical use in the clinic. PMID:23871683
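One of the evaluation measures named in this abstract, Jaccard similarity, can be sketched for binary segmentation masks as follows. This is an illustrative sketch only: the helper name `jaccard` and the tiny masks are hypothetical, and the paper's own implementation and data are not reproduced here.

```python
def jaccard(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two equally sized binary masks.

    Masks are nested lists (rows of 0/1 pixels). Two empty masks are treated
    as identical and score 1.0, which is a convention chosen for this sketch.
    """
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    union = sum(a or b for a, b in zip(flat_a, flat_b))
    return 1.0 if union == 0 else intersection / union

# Hypothetical 2x3 masks: an automatic segmentation and a reference.
automatic = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
print(jaccard(automatic, reference))  # 2 shared pixels out of 4 labelled -> 0.5
```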
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which was analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
Sumithra, Subramaniam; Victoire, T. Aruldoss Albert
2015-01-01
Due to large dimension of clusters and increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) seems to be highly complex and cumbersome. This paper proposes a new method to determine a reasonably better solution of the clustering and routing problem with the highest concern of efficient energy consumption of the sensor nodes for extending network life time. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The obtained route using the proposed method from all the gateways to the base station is comparatively lesser in overall distance with less number of data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing energy consumption of the WSN and the results are compared with the other algorithms reported in the literature. PMID:26516635
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Malek, H.
1978-01-01
A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basis for CLASSY and the actual operation of the algorithm are described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
Li, Weizhong
2011-10-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Li, Weizhong [San Diego Supercomputer Center
2013-01-22
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Erny, Guillaume L; Acunha, Tanize; Simó, Carolina; Cifuentes, Alejandro; Alves, Arminda
2016-01-15
Various algorithms have been developed to improve the quantity and quality of information that can be extracted from complex datasets obtained using hyphenated mass spectrometric techniques. While different approaches are possible, the key step often consists in arranging the data into a large series of profiles known as extracted ion profiles. Those profiles, similar to mono-dimensional separation profiles, are then processed to detect potential chromatographic peaks. This allows extracting from the dataset a large number of peaks that are characteristic of the compounds that have been separated. However, with mass spectrometry (MS) detection, the response is usually a complex signal whose pattern depends on the analyte, the MS instrument and the ionization method. When converted to ionic profiles, a single separated analyte will have multiple images at different m/z ranges. In this manuscript we present a hierarchical agglomerative clustering algorithm to group profiles with very similar features. Each group aims to contain all profiles that are due to the transport and monitoring of a single analyte. Clustering results are then used to generate a two-dimensional representation, called a clusters plot, which allows an in-depth analysis of the MS dataset including the visualization of poorly separated compounds even when their intensities differ by more than two orders of magnitude. The usefulness of this new approach has been validated with data from capillary electrophoresis time-of-flight mass spectrometry hyphenated via electrospray ionization. Using a mixture of 17 low molecular endogenous compounds it was verified that ionic profiles belonging to each compound were correctly clustered even with a very low degree of separation (R below 0.03). The approach was also validated using a urine sample. While only 15 peaks could be distinguished with the total ion profile, 70 clusters were obtained, allowing a much more thorough analysis. In this particular example, the total computing time was less than 10 min. PMID:26711157
El Harchaoui, Nour-Eddine; Ait Kerroum, Mounir; Hammouch, Ahmed; Ouadou, Mohamed; Aboutajdine, Driss
2013-01-01
The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome the problems of uncertain data in complex systems. We used the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters that are generated by PCM and also overcome the weakness of FCM to noise. To validate our approach, we used several validity indexes and we compared them with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were realized on different synthetics data sets and real brain MR images. PMID:24489535
NASA Astrophysics Data System (ADS)
Davis, Jack B. A.; Shayeghi, Armin; Horswell, Sarah L.; Johnston, Roy L.
2015-08-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters.
Electronic supplementary information (ESI) available. See DOI: 10.1039/C5NR03774C
Algorithmic Identification for Wings in Butterfly Diagrams.
NASA Astrophysics Data System (ADS)
Illarionov, E. A.; Sokolov, D. D.
2012-12-01
We investigate to what extent the wings of solar butterfly diagrams can be separated without explicit use of Hale's polarity law or of the location of the solar equator. Two cluster analysis algorithms, namely DBSCAN and C-means, have demonstrated their ability to separate the wings of contemporary butterfly diagrams based only on the sunspot group density in the diagram. Here we generalize the method for continuous tracers, give results concerning the migration velocities, and present clusters for cycles 12-20.
Tsai, Ming-Hui; Huang, Yueh-Min
2014-01-01
Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Typically, their limited power capability, i.e., battery powered, make WSNs encounter the challenge of extension of network lifetime. Many hierarchical protocols show better ability of energy efficiency in the literature. Besides, data reduction based on the correlation of sensed readings can efficiently reduce the amount of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further separate the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) is composed of two procedures: the prediction model construction procedure and the sub-clustering procedure. The energy conservation benefits by the reduced transmissions, which are dependent on the prediction model. Also, the energy can be further conserved because of the representative mechanism of sub-clustering. As presented by simulation results, it shows that 2TC-cor can effectively conserve energy and monitor accurately the environment within an acceptable level. PMID:25412220
Aslan, Mikail; Davis, Jack B A; Johnston, Roy L
2016-03-01
The global optimisation of small bimetallic PdCo binary nanoalloys are systematically investigated using the Birmingham Cluster Genetic Algorithm (BCGA). The effect of size and composition on the structures, stability, magnetic and electronic properties including the binding energies, second finite difference energies and mixing energies of Pd-Co binary nanoalloys are discussed. A detailed analysis of Pd-Co structural motifs and segregation effects is also presented. The maximal mixing energy corresponds to Pd atom compositions for which the number of mixed Pd-Co bonds is maximised. Global minimum clusters are distinguished from transition states by vibrational frequency analysis. HOMO-LUMO gap, electric dipole moment and vibrational frequency analyses are made to enable correlation with future experiments. PMID:26872088
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
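As an illustration of one of the metrics listed in this abstract, the mean silhouette coefficient can be sketched as below. The sketch assumes the usual definition s = (b - a) / max(a, b) with Euclidean distance, and scores singleton clusters as 0 by convention; the function name `mean_silhouette` and the sample points are hypothetical and unrelated to the face databases used in the study.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labelled point set (Euclidean distance).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # dist(p, p) is 0, so summing over the whole cluster is safe.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0.0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Two well-separated pairs of points, labelled sensibly and then badly.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(pts, [0, 0, 1, 1])
bad = mean_silhouette(pts, [0, 1, 0, 1])
print(good, bad)  # the sensible labelling scores close to 1, the bad one below 0
```

A threshold on such a score is one plausible way to read the "acceptance threshold" mentioned in the abstract, although the abstract does not state which threshold values were used.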
NASA Astrophysics Data System (ADS)
Vrtilek, Saeqa Dil; Boroson, Bram S.; Richards, Joseph
2014-06-01
We have explored recent developments in machine learning algorithms, such as diffusion mapping (Richards et al. 2009) which allow us to identify physically similar clusters independent of prior knowledge. We have successfully used this method to separate out different classes of X-ray binaries and of different spectral states within a given system. Beyond the immediate astronomical application, a strength of our approach is to offer new and useful insight into the vast and rapidly growing multi-dimensional data collections in essentially all fields of investigation, not only the astrophysical ones which form our testbed and the immediate focus of our scientific interest.
Chen, Wei-Chen; Ostrouchov, George; Pugmire, Dave; Prabhat; Wehner, Michael
2013-01-01
We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate data set. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data (SPMD) programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger data sets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high resolution climate dataset produced by the community atmosphere model (CAM5). This article has supplementary material online.
Abedini, Mohammad; Moradi, Mohammad H; Hosseinian, S M
2016-03-01
This paper proposes a novel method to address reliability and technical problems of microgrids (MGs) based on designing a number of self-adequate autonomous sub-MGs via adopting MGs clustering thinking. In doing so, a multi-objective optimization problem is developed where power losses reduction, voltage profile improvement and reliability enhancement are considered as the objective functions. To solve the optimization problem a hybrid algorithm, named HS-GA, is provided, based on genetic and harmony search algorithms, and a load flow method is given to model different types of DGs as droop controller. The performance of the proposed method is evaluated in two case studies. The results provide support for the performance of the proposed method. PMID:26767800
NASA Astrophysics Data System (ADS)
Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus
2015-01-14
The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ. PMID:25591387
Shamsi, Hamed; Ozbek, I Yucel
2015-01-01
This work presents a detailed framework to detect the location of heart sound within the respiratory sound based on the temporal fuzzy c-means (TFCM) algorithm. In the proposed method, the respiratory sound is first divided into frames, and for each frame the logarithmic energy features are calculated. Then, these features are used by the TFCM algorithm to classify the respiratory sound as heart sound (HS containing lung sound) or non-HS (only lung sound). The TFCM is a modified version of the fuzzy c-means (FCM) algorithm. While the FCM algorithm uses only the local information about the current frame, the TFCM algorithm uses the temporal information from both the current and the neighboring frames in decision making. To measure the detection performance of the proposed method, several experiments have been conducted on a database of 24 healthy subjects. The experimental results show that the average false-negative rate values are 0.8 ± 1.1 and 1.5 ± 1.4 %, and the normalized area under detection error curves are 0.0145 and 0.0269 for the TFCM method in the low and medium respiratory flow rates, respectively. These average values are significantly lower than those obtained by the FCM algorithm and by the other compared methods in the literature, which demonstrates the efficiency of the proposed TFCM algorithm. On the other hand, the average elapsed time of the TFCM for data with a length of 0.2 ± 0.05 s is 0.2 ± 0.05 s, which is slightly higher than that of the FCM and lower than those of the other compared methods. PMID:25326867
Yilmaz, Nihat; Inan, Onur; Uzer, Mustafa Serter
2014-05-01
The most important factors that prevent pattern recognition from functioning rapidly and effectively are the noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for the diagnosis of heart disease and diabetes. In this method, a new modified K-means algorithm is used in a clustering-based data preparation system to eliminate noisy and inconsistent data, and Support Vector Machines are used for classification. This newly developed approach was tested in the diagnosis of heart diseases and diabetes, which are prevalent within society and figure among the leading causes of death. The data sets used in the diagnosis of these diseases are the Statlog (Heart), the SPECT images and the Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved classification success rates of 97.87 %, 98.18 % and 96.71 % on these data sets, respectively. Classification accuracies for these data sets were obtained using the 10-fold cross-validation method. According to the results, the performance of the proposed method is highly successful compared to other results attained, and seems very promising for pattern recognition applications. PMID:24737307
piClust: a density based piRNA clustering algorithm.
Jung, Inuk; Park, Jong Chan; Kim, Sun
2014-06-01
Piwi-interacting RNAs (piRNAs) are recently discovered, endogenous small non-coding RNAs. piRNAs protect the genome from invasive transposable elements (TE) and sustain the integrity of the genome in germ cell lineages. Small RNA-sequencing data can be used to detect piRNA activations in a cell under a specific condition. However, identification of cell-specific piRNA activations requires sophisticated computational methods. As of now, there is only one computational method, proTRAC, to locate activated piRNAs from the sequencing data. proTRAC detects piRNA clusters based on a probabilistic analysis that assumes a uniform distribution. Unfortunately, we were not able to locate activated piRNAs from our proprietary sequencing data in chicken germ cells using proTRAC. After a careful investigation of the data sets, we found that neither a uniform nor any other statistical distribution may be assumed for detecting piRNA clusters. Furthermore, small RNA-seq data contain many different types of RNAs, which were not carefully taken into account in previous studies. To improve piRNA cluster identification, we developed piClust, which uses a density-based clustering approach without assuming any parametric distribution. Previous studies have shown that piRNAs exhibit a strong tendency to form piRNA clusters in syntenic regions of the genome. Thus, the density-based clustering approach is effective and robust to the existence of non-piRNAs or noise in the data. In experiments with piRNA data from human, mouse, rat and chicken, piClust was able to detect piRNA clusters from total small RNA-seq data from germ cell lines, while proTRAC was not successful. piClust outperformed proTRAC in terms of sensitivity and running time (up to 200-fold). piClust is currently available as a web service at http://epigenomics.snu.ac.kr/piclustweb. PMID:24656595
Yin, Jiandong; Yang, Jiawen; Guo, Qiyong
2014-01-01
During dynamic susceptibility contrast-magnetic resonance imaging (DSC-MRI), it has been demonstrated that the arterial input function (AIF) can be obtained using fuzzy c-means (FCM) and k-means clustering methods. However, due to the dependence on the initial centers of clusters, both clustering methods have poor reproducibility between the calculation and recalculation steps. To address this problem, the present study developed an alternative clustering technique based on the agglomerative hierarchy (AH) method for AIF determination. The performance of AH method was evaluated using simulated data and clinical data based on comparisons with the two previously demonstrated clustering-based methods in terms of the detection accuracy, calculation reproducibility, and computational complexity. The statistical analysis demonstrated that, at the cost of a significantly longer execution time, AH method obtained AIFs more in line with the expected AIF, and it was perfectly reproducible at different time points. In our opinion, the disadvantage of AH method in terms of the execution time can be alleviated by introducing a professional high-performance workstation. The findings of this study support the feasibility of using AH clustering method for detecting the AIF automatically. PMID:24932638
Kandalla, Krishna; Subramoni, Hari; Vishnu, Abhinav; Panda, Dhabaleswar K.
2010-04-01
Modern high performance computing systems are being increasingly deployed in a hierarchical fashion with multi-core computing platforms forming the base of the hierarchy. These systems are usually comprised of multiple racks, with each rack consisting of a finite number of chassis, with each chassis having multiple compute nodes or blades, based on multi-core architectures. The networks are also hierarchical with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We have analyzed the performance characteristics of two collectives, MPI_Gather and MPI_Scatter on such systems and we have proposed topology-aware algorithms for these operations. Our experimental results have shown that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1992-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
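The contrast between the probabilistic (FCM) constraint and a possibilistic membership can be illustrated with a small sketch. It is not taken from the report above: the possibilistic formula used here is the form commonly attributed to Krishnapuram and Keller's later possibilistic c-means work, and the function names, the scale parameters `etas`, and the sample distances are all assumptions made for illustration.

```python
def fcm_memberships(dists, m=2.0):
    """FCM memberships of one point, given its distances to each prototype.

    Memberships are forced to sum to one across classes (the probabilistic
    constraint). Assumes every distance is strictly positive.
    """
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exp for d_j in dists) for d_i in dists]


def pcm_memberships(dists, etas, m=2.0):
    """Possibilistic memberships: each class is scored independently.

    etas[i] is an assumed per-class scale (the squared distance at which
    the membership drops to 0.5).
    """
    exp = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / eta) ** exp) for d, eta in zip(dists, etas)]


# A noise point that is equally far from both prototypes.
outlier = [50.0, 50.0]
etas = [1.0, 1.0]  # hypothetical scales

fcm_outlier = fcm_memberships(outlier)        # forced to share a total of 1
pcm_outlier = pcm_memberships(outlier, etas)  # free to be small for both classes
```

Under the sum-to-one constraint the distant noise point still receives a membership of one half in each class, whereas the possibilistic memberships are both close to zero, which matches the "degree of belonging" reading described in the abstract.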
A pixel-based color image segmentation using support vector machine and fuzzy C-means.
Wang, Xiang-Yang; Zhang, Xian-Jin; Yang, Hong-Ying; Bu, Juan
2012-09-01
Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. In this paper, we present a pixel-based color image segmentation using Support Vector Machine (SVM) and Fuzzy C-Means (FCM). Firstly, the pixel-level color feature and texture feature of the image, which is used as input of the SVM model (classifier), are extracted via the local spatial similarity measure model and Steerable filter. Then, the SVM model (classifier) is trained by using FCM with the extracted pixel-level features. Finally, the color image is segmented with the trained SVM model (classifier). This image segmentation can not only take full advantage of the local information of the color image but also the ability of the SVM classifier. Experimental evidence shows that the proposed method has a very effective computational behavior and effectiveness, and decreases the time and increases the quality of color image segmentation in comparison with the state-of-the-art segmentation methods recently proposed in the literature. PMID:22647833
NASA Astrophysics Data System (ADS)
Sun, Xiaoming; Wang, Guiling
2008-09-01
This study explores the applicability of data-driven clustering analysis in predicting vegetation distribution over two continents where water is an important controlling factor for vegetation growth, South America and Africa, and compares the ability of clustering analysis with that of a physically based dynamic vegetation model to predict vegetation distribution. A clustering analysis algorithm based on the genetic-algorithm-based K-means is tested, with the number of clusters determined a priori according to the primary plant functional types observed to exist in the study domain. The most important variables upon which the clustering analysis is based include available water, its seasonality, and evaporative demand. The dynamic vegetation model used is the Community Land Model version 3 coupled with a Dynamic Global Vegetation Model (CLM3-DGVM) with modifications targeted to address some known biases of the model. Results from both the clustering analysis and the modified CLM3-DGVM are compared against observations derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). Both methods reasonably reproduced the general pattern of dominant plant functional type distribution. There is no clear winner between the two methods, as the DGVM outperforms the clustering analysis approach in some aspects and is outperformed in others. It is therefore suggested that clustering analysis can be a useful tool in biogeography estimation, although it cannot be used in mechanistic studies as the process-based DGVMs are.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2016-03-01
We present new versions of sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. In this update, we add the method of GPU-based cluster-labeling algorithm without the use of conventional iteration (Komura, 2015) to those programs. For high-precision calculations, we also add a random-number generator in the cuRAND library. Moreover, we fix several bugs and remove the extra usage of shared memory in the kernel functions.
Solution of facility location problem in Turkey by using fuzzy C-means method
NASA Astrophysics Data System (ADS)
Kocakaya, Mustafa Nabi; Türkakın, Osman Hürol
2013-10-01
The facility location problem is one of the most frequently encountered problems when deciding where to place facilities such as factories and warehouses. Various techniques have been developed to solve facility location problems, and the fuzzy c-means method is among the most usable of them. In this study, the optimum warehouse location for natural stone mines is found by using the fuzzy c-means method.
NASA Astrophysics Data System (ADS)
Nomura, Yusuke; Sakai, Shiro; Arita, Ryotaro
2014-05-01
We implement a multiorbital cluster dynamical mean-field theory (DMFT) by improving a sample update algorithm in the continuous-time quantum Monte Carlo method based on the interaction expansion. The proposed sampling scheme for the spin-flip and pair-hopping interactions in the two-orbital systems mitigates the sign problem, giving an efficient way to deal with these interactions. In particular, in the single-site DMFT, we see that the negative signs vanish. We apply the method to the two-dimensional two-orbital Hubbard model at half-filling, where we take into account the short-range spatial correlation effects within a four-site cluster. We show that, compared to the single-site DMFT results, the critical interaction value for the metal-insulator transition decreases and that the effects of the spin-flip and pair-hopping terms are less significant in the parameter region we have studied. The present method provides a firm starting point for the study of intersite correlations in multiorbital systems. It also has a wide applicable scope in terms of realistic calculations in conjunction with density functional theory.
Delineation and quantitation of brain lesions by fuzzy clustering in positron emission tomography.
Boudraa, A E; Champier, J; Cinotti, L; Bordet, J C; Lavenne, F; Mallet, J J
1996-01-01
In this study, we investigate the application of the fuzzy clustering to the anatomical localization and quantitation of brain lesions in Positron Emission Tomography (PET) images. The method is based on the Fuzzy C-Means (FCM) algorithm. The algorithm segments the PET image data points into a given number of clusters. Each cluster is an homogeneous region of the brain (e.g. tumor). A feature vector is assigned to a cluster which has the highest membership degree. Having the label affected by the FCM algorithm to a cluster, one may easily compute the corresponding spatial localization, area and perimeter. Studies concerning the evolution of a tumor after different treatments in two patients are presented. PMID:8891420
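The segmentation step described above (fuzzy clustering followed by assignment to the cluster with the highest membership) can be sketched in a few lines. This is a minimal, assumed illustration on scalar intensities, not the authors' implementation; the function names, the toy intensity values, and the initial centers are hypothetical.

```python
def fcm_1d(values, centers, m=2.0, iters=50):
    """Minimal fuzzy c-means on scalar values; returns (centers, memberships)."""
    centers = list(centers)
    exp = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u[k][i] is the degree of point k in cluster i.
        u = []
        for x in values:
            d = [max(abs(x - c), 1e-12) for c in centers]  # avoid division by zero
            u.append([1.0 / sum((di / dj) ** exp for dj in d) for di in d])
        # Prototype update: mean of the values weighted by u ** m.
        centers = [
            sum((u[k][i] ** m) * values[k] for k in range(len(values)))
            / sum(u[k][i] ** m for k in range(len(values)))
            for i in range(len(centers))
        ]
    return centers, u


def harden(u):
    """Assign each point to the cluster with the highest membership degree."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Toy "pixel intensities": a dark background and a bright lesion-like region.
intensities = [10, 12, 11, 13, 80, 82, 79, 85]
centers, u = fcm_1d(intensities, centers=[0.0, 100.0])
labels = harden(u)
lesion_area = labels.count(1)  # number of pixels labelled as the bright cluster
```

Once every pixel carries a hard label, region properties such as area follow from simple counting; perimeter and spatial localization would additionally need the pixel coordinates, which this one-dimensional sketch omits.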
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding the compressive strength of concrete is important for activities such as construction arrangement, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are the most widely used for prediction tasks in which the relationship between the independent variables and the dependent (predicted) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, because clustering allows a more accurate curve fit between the dependent and independent variables. In this work, a cluster regression technique is applied to estimate the compressive strength of concrete, and a novel approach is proposed for predicting it. The objective of this work is to demonstrate that clustering along with regression yields smaller prediction errors when estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. Experiments show that clustering along with regression techniques gives the minimum errors for predicting the compressive strength of concrete; also, the fuzzy C-means clustering algorithm performs better than the K-means algorithm. PMID:25374939
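The two-stage idea (cluster first, then fit one regression per cluster) can be sketched as follows. This is an assumed toy illustration, not the authors' pipeline: it uses a simple one-dimensional k-means as a stand-in for the clustering stage, ordinary least squares for the regression stage, and made-up data with two linear regimes.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def sse(points, a, b):
    """Sum of squared errors of the line y = a*x + b on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


def kmeans_1d(xs, centers, iters=20):
    """Tiny k-means on scalars; returns one cluster label per value."""
    centers = list(centers)
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda i: abs(x - centers[i])) for x in xs]
        for i in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == i]
            if members:
                centers[i] = sum(members) / len(members)
    return labels


# Hypothetical data: two regimes with different linear relationships.
data = [(1, 3), (2, 5), (3, 7), (4, 9), (10, 30), (11, 29), (12, 28), (13, 27)]

# Baseline: a single regression over all samples.
global_sse = sse(data, *fit_line(data))

# Stage 1: cluster on the input variable. Stage 2: one regression per cluster.
labels = kmeans_1d([x for x, _ in data], centers=[0.0, 15.0])
clustered_sse = 0.0
for c in sorted(set(labels)):
    group = [p for p, lab in zip(data, labels) if lab == c]
    clustered_sse += sse(group, *fit_line(group))
```

On this constructed data the per-cluster fits recover each regime exactly while the single global line cannot, which is the effect the abstract attributes to combining clustering with regression; real data would of course show a smaller gap.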
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining
NASA Astrophysics Data System (ADS)
Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.
The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept as close to negligible as possible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
A binned clustering algorithm to detect high-Z material using cosmic muons
NASA Astrophysics Data System (ADS)
Thomay, C.; Velthuis, J. J.; Baesso, P.; Cussans, D.; Morris, P. A. W.; Steer, C.; Burns, J.; Quillin, S.; Stapleton, M.
2013-10-01
We present a novel approach to the detection of special nuclear material using cosmic rays. Muon Scattering Tomography (MST) is a method for using cosmic muons to scan cargo containers and vehicles for special nuclear material. Cosmic muons are abundant, highly penetrating, not harmful for organic tissue, cannot be screened against, and can easily be detected, which makes them highly suited to the use of cargo scanning. Muons undergo multiple Coulomb scattering when passing through material, and the amount of scattering is roughly proportional to the square of the atomic number Z of the material. By reconstructing incoming and outgoing tracks, we can obtain variables to identify high-Z material. In a real life application, this has to happen on a timescale of 1 min and thus with small numbers of muons. We have built a detector system using resistive plate chambers (RPCs): 12 layers of RPCs allow for the readout of 6 x and 6 y positions, by which we can reconstruct incoming and outgoing tracks. In this work we detail the performance of an algorithm by which we separate high-Z targets from low-Z background, both for real data from our prototype setup and for MC simulation of a cargo container-sized setup. (c) British Crown Owned Copyright 2013/AWE
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research works adopt single clustering algorithms to perform tumor clustering from biomolecular data, and these lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. HFCEF-III and HFCEF-IV combine the characteristics of HFCEF-I and HFCEF-II: HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results.
The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches. PMID:24091399
Thimmaiah, Tim; Voje, William E; Carothers, James M
2015-01-01
With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts. PMID:25487092
Jiang, Joe-Air; Chen, Chia-Pang; Chuang, Cheng-Long; Lin, Tzu-Shiang; Tseng, Chwan-Lu; Yang, En-Cheng; Wang, Yung-Chung
2009-01-01
Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy for sensor nodes, it is critical to implement WSNs with energy efficiency designs. Sensing coverage in networks, on the other hand, may degrade gradually over time after WSNs are activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Usually, coverage-controlling strategies present some challenging problems: (1) resolving the conflicts while determining which nodes should be turned off to conserve energy; (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement an energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve the challenging problems. The CoCMA contains two optimization strategies: a MA-based schedule for sensor nodes and a wake-up scheme, which are responsible to prolong the network lifetime while maintaining coverage preservation. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by the redundant nodes. During the network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage hole caused by dead nodes. The performance evaluation of the proposed CoCMA was conducted on a cluster-based WSN (CWSN) under either a random or a uniform deployment of sensor nodes. Simulation results show that the performance yielded by the combination of MA and wake-up scheme is better than that in some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area. PMID:22408561
A new method based on Dempster-Shafer theory and fuzzy c-means for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitivity to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation, especially when only one brain image is available. The method takes spatial neighborhood information into account by fusing the information of pixels with that of their neighbors using Dempster-Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function produced by applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Multiple hypotheses are then generated from the single hypotheses. Finally, we update the objective pixel’s BPA by fusing the BPA of the objective pixel with those of its neighbors to get the final result. Some examples in MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than other methods in motion-blurred MRI segmentation.
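The fusion step relies on Dempster's rule of combination, which can be sketched generically as below. How the paper builds its BPAs and compound hypotheses from FCM memberships is not stated in the abstract, so the tissue labels and mass values here are purely hypothetical; only the combination rule itself is standard.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each BPA maps a hypothesis (a frozenset of labels) to its mass.
    Products of masses go to the intersection of the two hypotheses;
    mass landing on the empty set is conflict and is normalized away.
    """
    combined = {}
    conflict = 0.0
    for b, wb in m1.items():
        for c, wc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wb * wc
            else:
                conflict += wb * wc
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical single hypotheses and one compound hypothesis ("either tissue").
GM, WM = frozenset({"GM"}), frozenset({"WM"})
EITHER = GM | WM

pixel = {GM: 0.6, WM: 0.3, EITHER: 0.1}     # assumed BPA of the objective pixel
neighbor = {GM: 0.7, WM: 0.2, EITHER: 0.1}  # assumed BPA of one neighbor

fused = dempster_combine(pixel, neighbor)
```

When the neighbor agrees with the pixel, the fused mass on the shared hypothesis grows, which is how neighborhood evidence can stabilize the label of a noisy pixel; fusing several neighbors would repeat the call on the running result.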
Polat, Kemal
2012-08-01
In this paper, an attribute weighting method based on cluster centers, which aims to increase the discrimination between classes, is proposed and applied to nonlinearly separable datasets including two medical datasets (the mammographic mass dataset and the BUPA liver disorders dataset) and a 2-D spiral dataset. The goal of this method is to gather the data points close to their cluster centers so as to transform nonlinearly separable datasets into linearly separable ones. As clustering algorithms, k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used. The proposed attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW); they are applied prior to classifier algorithms including the C4.5 decision tree and the adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, the recall, precision value, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results show that subtractive clustering based attribute weighting was the best attribute weighting method with respect to classification performance on the three datasets used. PMID:21611787
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Identifying pathogenic factors and how they spread in the environment has recently become a global demand. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a parasitic disease which can be passed on to humans through the Phlebotomus vector. Studies show that the economic situation, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is used to predict the CL prevalence rate and obtain a risk map, based on the environmental parameters that affect CL and on a Neuro-Fuzzy system. The learning capacity of neural networks on the one hand and the reasoning power of fuzzy systems on the other make Neuro-Fuzzy systems very efficient to use. In this research, in order to predict the CL prevalence rate, an adaptive Neuro-fuzzy inference system was applied in which fuzzy C Means clustering determines the initial membership functions. Owing to the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran have been examined and evaluated. The CL prevalence rate in 2012 was predicted from maps of the effective environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation and elevation. Results indicate that the model with the fuzzy C Means clustering structure yields acceptable RMSE values for both training and checking data, which supports our analyses. Using the proposed data mining technology, the spatial distribution pattern of the disease and the vulnerable areas become identifiable, and the map can be used by public health experts and decision makers as a useful tool in management and optimal decision-making.
NASA Technical Reports Server (NTRS)
Werth, L. F. (Principal Investigator)
1981-01-01
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
A graph-based watershed merging using fuzzy C-means and simulated annealing for image segmentation
NASA Astrophysics Data System (ADS)
Vadiveloo, Mogana; Abdullah, Rosni; Rajeswari, Mandava
2015-12-01
In this paper, we address the over-segmented regions produced by the watershed transform by merging the regions using a global feature. The global feature information is obtained by clustering the image in its feature space using Fuzzy C-Means (FCM) clustering. The over-segmented regions produced by performing watershed on the gradient of the image are then mapped to this global information in the feature space. The global feature information is further optimized using Simulated Annealing (SA). The optimal global feature information is used to derive the similarity criterion for merging the over-segmented watershed regions, which are represented by the region adjacency graph (RAG). The proposed method has been tested on a digital brain phantom simulated dataset to segment the white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) soft tissue regions. The experiments showed that the proposed method performs statistically better than the immersion watershed, with an average of 95.242% of regions merged, and gives an average accuracy improvement of 8.850% in comparison with RAG-based immersion watershed merging using global and local features.
NASA Astrophysics Data System (ADS)
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.
2010-06-01
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Ar_n^{2+}, Kr_n^{2+}, and Xe_n^{2+}, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as a function of cluster size. In Ar_n^{2+}, for example, fissionlike fragmentation is predicted for n = 55 while for n = 43, the predicted outcome is nonfission fragmentation in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].
NASA Astrophysics Data System (ADS)
Heard, Christopher J.; Heiles, Sven; Vajda, Stefan; Johnston, Roy L.
2014-09-01
The novel surface mode of the Birmingham Cluster Genetic Algorithm (S-BCGA) is employed for the global optimisation of noble metal tetramers upon an MgO (100) substrate at the GGA-DFT level of theory. The effect of element identity and alloying in surface-bound neutral subnanometre clusters is determined by energetic comparison between all compositions of PdnAg(4-n) and PdnPt(4-n). While the binding strengths to the surface increase in the order Pt > Pd > Ag, the excess energy profiles suggest a preference for mixed clusters for both cases. The binding of CO is also modelled, showing that the adsorption site can be predicted solely by electrophilicity. Comparison to CO binding on a single metal atom shows a reversal of the 5σ-d activation process for clusters, weakening the cluster-surface interaction on CO adsorption. Charge localisation determines homotop, CO binding and surface site preferences. The electronic behaviour, which is intermediate between molecular and metallic particles, allows for tunable features in the subnanometre size range.
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods
Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.
2010-07-01
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output is already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, x_i ∈ R^D. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P → R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X × X → R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c_1, ..., c_k} such that the clusters are nonoverlapping (c_i ∩ c_j = ∅ for i ≠ j) subsets of the data set (∪_i c_i = X). Hierarchical algorithms produce a series of partitions P = {p_1, ..., p_n'}. For a complete hierarchy, the number of partitions n' = n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j, such as the cluster center or a Gaussian distribution. 
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carries with it a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j , even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. 
Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity, matrices, that is, cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validation methods, and visualization or data reduction techniques such as principal components analysis (PCA), multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
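The formal definitions in the chapter abstract above (a partition P of X into nonoverlapping clusters that together cover X, scored by an objective function f) can be illustrated with a short sketch. The names `is_partition` and `within_cluster_sse` are hypothetical, and the within-cluster sum of squared distances is used as f only as an assumed example (it is the objective usually associated with k-means); the abstract does not prescribe a particular f.

```python
def is_partition(clusters, items):
    """True if the clusters are pairwise disjoint and their union equals items."""
    seen = set()
    for cluster in clusters:
        cluster = set(cluster)
        if seen & cluster:          # overlap: c_i and c_j share an item
            return False
        seen |= cluster
    return seen == set(items)       # coverage: union of all c_i equals X

def within_cluster_sse(clusters, points):
    """Example objective f(P): squared distance of each item to its cluster mean.

    `points` maps an item id to a tuple of D feature values.
    Assumes every cluster is non-empty.
    """
    total = 0.0
    for cluster in clusters:
        members = [points[i] for i in cluster]
        dim = len(members[0])
        center = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        total += sum((p[k] - center[k]) ** 2 for p in members for k in range(dim))
    return total

# Four items in D = 2 dimensions forming two obvious groups.
points = {0: (0.0, 0.0), 1: (0.0, 2.0), 2: (10.0, 0.0), 3: (10.0, 2.0)}
good = [{0, 1}, {2, 3}]   # groups nearby items
bad = [{0, 2}, {1, 3}]    # also a valid partition, but a poor one
```

Both `good` and `bad` are valid partitions; only the objective distinguishes them, which is why the choice of f (and of the cluster model m_j) effectively defines what a cluster is.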
Application of Fuzzy Grade-of-Membership Clustering to Analysis of Remote Sensing Data.
NASA Astrophysics Data System (ADS)
Talbot, Lisa M.; Talbot, Bryan G.; Peterson, Robert E.; Tolley, H. Dennis; Mecham, Harvey D.
1999-01-01
A fuzzy grade-of-membership (GoM) clustering algorithm is applied to analysis of remote sensing data, in particular, the type of data used in climatic classification. The methodology is applied to a cloud product data subset derived from NASA's International Satellite Cloud Climatology Project, which includes remotely sensed global monthly average surface temperature and precipitation data for land and coastal regions for the year 1984. GoM partitions for this case are similar to those of vector quantization and fuzzy c-means clustering algorithms, which is significant given the striking differences between the algorithms. The GoM clustering approach is shown to provide an alternative means of interpreting large heterogeneous datasets for exploratory analysis, which broadens the application base by admitting categorical data.
NASA Astrophysics Data System (ADS)
Bhattacharya, Saswata; Sonin, Benjamin H.; Jumonville, Christopher J.; Ghiringhelli, Luca M.; Marom, Noa
2015-06-01
In order to design clusters with desired properties, we have implemented a suite of genetic algorithms tailored to optimize for low total energy, high vertical electron affinity (VEA), and low vertical ionization potential (VIP). Applied to (TiO2)n clusters, the property-based optimization reveals the underlying structure-property relations and the structural features that may serve as active sites for catalysis. High VEA and low VIP are correlated with the presence of several dangling-O atoms and their proximity, respectively. We show that the electronic properties of (TiO2)n up to n = 20 correlate more strongly with the presence of these structural features than with size.
Basalto, Nicolas; Bellotti, Roberto; De Carlo, Francesco; Facchi, Paolo; Pantaleo, Ester; Pascazio, Saverio
2008-10-01
A clustering algorithm based on the Hausdorff distance is analyzed and compared to the single, complete, and average linkage algorithms. The four clustering procedures are applied to a toy example and to the time series of financial data. The dendrograms are scrutinized and their features compared. The Hausdorff linkage relies on firm mathematical grounds and turns out to be very effective when one has to discriminate among complex structures. PMID:18999498
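To make the comparison in the abstract above concrete, the sketch below computes the distance between two clusters under the single, complete, and Hausdorff criteria. It assumes the standard definition of the Hausdorff distance (the larger of the two directed max-min distances) with a Euclidean point metric; the abstract does not spell out the exact form used by the authors, and the function names are hypothetical.

```python
import math

def single_link(a, b):
    """Smallest pairwise distance between the two clusters."""
    return min(math.dist(x, y) for x in a for y in b)

def complete_link(a, b):
    """Largest pairwise distance between the two clusters."""
    return max(math.dist(x, y) for x in a for y in b)

def hausdorff(a, b):
    """Standard Hausdorff distance between two finite point sets."""
    def directed(p, q):
        # Farthest that any point of p lies from its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

# Two small clusters on a line (illustrative data only).
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(2.0, 0.0), (5.0, 0.0)]
```

For these two sets the three criteria give 1, 5, and 4 respectively: the Hausdorff value lies between the single and complete linkage values because it measures how far the worst-placed point of one cluster is from the other cluster as a whole.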
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. 
When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 for reader 2, respectively, while the DSC between the two readers’ manual segmentation is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ∼5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ∼55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds a great potential to support clinical applications in the future including breast cancer risk assessment.
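The spatial agreement figures quoted above are Dice similarity coefficients, DSC = 2|A ∩ B| / (|A| + |B|), computed between two binary segmentations. The sketch below is illustrative only: the function name `dice` and the flattened 0/1 mask representation are assumptions, and the tiny masks are made-up data, not values from the study.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)

# Flattened masks of eight voxels (1 = fibroglandular tissue), illustrative only.
automated = [1, 1, 1, 1, 0, 0, 0, 0]
manual = [0, 0, 1, 1, 1, 1, 0, 0]
score = dice(automated, manual)
```

Because DSC penalizes every voxel of disagreement, two segmentations can yield highly correlated volumes while still having a moderate DSC, which is consistent with the abstract reporting correlations above 0.9 alongside DSC values near 0.6 to 0.7.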
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r^4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246
A stable and unsupervised Fuzzy C-Means for data classification
NASA Astrophysics Data System (ADS)
Taher, Akar; Chehdi, Kacem; Cariou, Claude
2015-04-01
In this paper, a stable and unsupervised version of the FCM algorithm, named FCMO, is presented. The originality of the proposed FCMO algorithm lies in two points: i) an adaptive incremental technique is used to initialize the class centres, calling the intermediate initializations into question; this technique renders the algorithm stable and deterministic, so the classification results do not vary from one run to another; and ii) unsupervised evaluation criteria are applied to the intermediate classification results to estimate the optimal number of classes, which makes the algorithm unsupervised. The efficiency of this optimized version of FCM is shown through experimental results on its stability and its correct estimation of the number of classes.
Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus; Zheng, Wenwei; Clementi, Cecilia; Prada-Gracia, Diego; Rao, Francesco
2015-01-14
The ligand migration network for O2 diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-05-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. PMID:24677621
Baldauf, Tobias; Smith, Robert E.; Seljak, Uros; Mandelbaum, Rachel
2010-03-15
The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5 h^-1 Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k ≈ 1 h Mpc^-1. 
We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak lensing observations. We apply the method to a range of cosmological models and explicitly show the viability of our new statistic to distinguish between cosmological models.
An improved FCM medical image segmentation algorithm based on MMTD.
Zhou, Ningning; Yang, Tingting; Zhang, Shaobai
2014-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) is one of the popular clustering algorithms for medical image segmentation, but it is highly vulnerable to noise because it does not take spatial information into account. This paper introduces the medium mathematics system, which is employed to process fuzzy information for image segmentation. It establishes the medium similarity measure based on the measure of medium truth degree (MMTD) and uses the correlation of a pixel and its neighbors to define the medium membership function. An improved FCM medical image segmentation algorithm based on MMTD, which takes some spatial features into account, is proposed. The experimental results show that the proposed algorithm is more robust to noise than the standard FCM, with more certainty and less fuzziness, which makes it practicable and effective for medical image segmentation. PMID:24648852
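For orientation, the standard FCM baseline that this record compares against can be sketched as follows. This is a minimal pure-Python sketch of plain FCM on scalar pixel intensities, not the MMTD-based algorithm of the paper; the function name `fcm_1d`, the fuzzifier m = 2, and the toy intensities are illustrative assumptions.

```python
import random

def fcm_1d(values, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means on scalar values.

    Assumption: no spatial or neighborhood term is used, which is exactly
    the weakness of standard FCM that the abstract describes.
    """
    rng = random.Random(seed)
    centers = rng.sample(values, c)              # initial centers drawn from the data
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        u = []
        for x in values:
            d = [abs(x - v) + 1e-12 for v in centers]   # small offset avoids division by zero
            row = [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                   for i in range(c)]
            u.append(row)
        # Center update: mean of the data weighted by u_ik^m
        centers = [
            sum((u[k][i] ** m) * values[k] for k in range(len(values)))
            / sum(u[k][i] ** m for k in range(len(values)))
            for i in range(c)
        ]
    return centers, u

# Hypothetical toy intensities: a dark region and a bright region
pixels = [10, 12, 11, 9, 200, 205, 198, 202]
centers, memberships = fcm_1d(pixels)
```

Each row of `memberships` sums to one, so a pixel is shared between the classes rather than assigned to exactly one; a spatially aware variant would additionally let neighboring pixels influence that row.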
Krejci, Adam; Hupp, Ted R.; Lexa, Matej; Vojtesek, Borivoj; Muller, Petr
2016-01-01
Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on proteins’ surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. Results: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied on a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as on a new, an order of magnitude larger dataset containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. Availability and implementation: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. Contact: muller@mou.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342231
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work…
Muster: Massively Scalable Clustering
Energy Science and Technology Software Center (ESTSC)
2010-05-20
Muster is a framework for scalable cluster analysis. It includes implementations of classic K-Medoids partitioning algorithms, as well as infrastructure for making these algorithms run scalably on very large systems. In particular, Muster contains algorithms such as CAPEK (described in reference 1) that are capable of clustering highly distributed data sets in-place on a hundred thousand or more processes.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
Bio Inspired Swarm Algorithm for Tumor Detection in Digital Mammogram
NASA Astrophysics Data System (ADS)
Dheeba, J.; Selvi, Tamil
Microcalcification clusters in mammograms are a significant early sign of breast cancer. Individual clusters are difficult to detect, and hence an automatic computer-aided mechanism will help the radiologist detect microcalcification clusters easily and efficiently. This paper presents a new classification approach for detection of microcalcification in digital mammograms using a particle swarm optimization (PSO) based clustering technique. The fuzzy C-means clustering technique, well defined for clustering data sets, is used in combination with PSO. We adopt particle swarm optimization to search for the cluster centers in an arbitrary data set automatically. PSO can search for the best solution from the probability option of the Social-only model and Cognition-only model. This method is quite simple and valid, and it can avoid becoming trapped in local minima. The proposed classification approach is applied to a database of 322 dense mammographic images, originating from the MIAS database. Results show that the proposed PSO-FCM approach gives better detection performance compared to conventional approaches.
Clustering recommendations to compute agent reputation
NASA Astrophysics Data System (ADS)
Bedi, Punam; Kaur, Harmeet
2005-03-01
Traditional centralized approaches to security are difficult to apply to multi-agent systems which are used nowadays in e-commerce applications. Developing a notion of trust that is based on the reputation of an agent can provide a softer notion of security that is sufficient for many multi-agent applications. Our paper proposes a mechanism for computing reputation of the trustee agent for use by the trustier agent. The trustier agent computes the reputation based on its own experience as well as the experience the peer agents have with the trustee agents. The trustier agents intentionally interact with the peer agents to get their experience information in the form of recommendations. We have also considered the case of unintentional encounters between the referee agents and the trustee agent, which can be directly between them or indirectly through a set of interacting agents. The clustering is done to filter off the noise in the recommendations in the form of outliers. The trustier agent clusters the recommendations received from referee agents on the basis of the distances between recommendations using the hierarchical agglomerative method. The dendrogram hence obtained is cut at the required similarity level, which restricts the maximum distance between any two recommendations within a cluster. The cluster with the maximum number of elements denotes the views of the majority of recommenders. The center of this cluster represents the reputation of the trustee agent, which can be computed using the c-means algorithm.
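The outlier-filtering step described in this abstract can be illustrated with a short sketch. It assumes scalar recommendations in [0, 1], uses complete linkage so that the cut level bounds the maximum distance between any two recommendations in a cluster, and takes a plain mean as the cluster center in place of the c-means step named in the abstract; the function name and the sample values are hypothetical.

```python
def reputation_from_recommendations(recs, max_dist):
    """Complete-linkage agglomerative clustering, cut at max_dist.

    Returns the mean of the largest cluster and the cluster itself.
    Assumption: a simple mean stands in for the c-means center.
    """
    clusters = [[r] for r in recs]

    def linkage(a, b):
        # Complete linkage: the largest pairwise distance between two clusters
        return max(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:              # equivalent to cutting the dendrogram here
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    majority = max(clusters, key=len)  # views of the majority of recommenders
    return sum(majority) / len(majority), majority

# Hypothetical recommendations; 0.10 plays the role of a noisy outlier
reputation, majority = reputation_from_recommendations(
    [0.80, 0.82, 0.78, 0.10, 0.81], max_dist=0.1)
```

With these values the outlier stays in its own singleton cluster, so it does not influence the reputation computed from the majority cluster.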
Ghorbanzadeh, Leila; Torshabi, Ahmad Esmaili; Nabipour, Jamshid Soltani; Arbatan, Moslem Ahmadi
2016-04-01
In image guided radiotherapy, in order to reach a prescribed uniform dose in dynamic tumors at thorax region while minimizing the amount of additional dose received by the surrounding healthy tissues, tumor motion must be tracked in real-time. Several correlation models have been proposed in recent years to provide tumor position information as a function of time in radiotherapy with external surrogates. However, developing an accurate correlation model is still a challenge. In this study, we propose an adaptive neuro-fuzzy based correlation model that employs several data clustering algorithms for antecedent parameters construction to avoid over-fitting and to achieve an appropriate performance in tumor motion tracking compared with the conventional models. To begin, a comparative assessment is done between seven neuro-fuzzy correlation models, each constructed using a unique data clustering algorithm. Then, the constructed models are combined within an adaptive sevenfold synthetic model, since our tumor motion database has a high degree of variability and each model has its own intrinsic properties in motion tracking. In the proposed sevenfold synthetic model, the best model is selected adaptively at pre-treatment. The model also updates the steps for each patient using an automatic model selectivity subroutine. We tested the efficacy of the proposed synthetic model on twenty patients (divided equally into two control and worst groups) treated with CyberKnife synchrony system. Compared to the CyberKnife model, the proposed synthetic model resulted in 61.2% and 49.3% reduction in tumor tracking error in worst and control group, respectively. These results suggest that the proposed model selection program in our synthetic neuro-fuzzy model can significantly reduce tumor tracking errors. Numerical assessments confirmed that the proposed synthetic model is able to track tumor motion in real time with high accuracy during treatment. PMID:25765021
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most widely used methods to separate DNA, RNA, and protein molecules according to size, weight, and quantity in many areas such as genetics, molecular biology, biochemistry, and microbiology. The main way to separate each molecule is to find the borders of each molecule fragment. This paper presents a software application that shows the column edges of DNA fragments in three steps. In the first step, the application obtains lane histograms of agarose gel electrophoresis images by projecting onto the x-axis. In the second step, it uses the k-means clustering algorithm to classify the point values of the lane histogram into left-side values, right-side values, and undesired values. In the third step, the column edges of DNA fragments are shown by using a mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition, the application reports the locations and the number of DNA fragments on images captured by a scientific camera.
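The first two steps above (projection onto the x-axis, then k-means on the projected values) might look roughly like the sketch below. It is a simplified illustration with k = 2 (lane columns versus background) rather than the three classes used in the paper; the helper names, the quantile initialisation, and the toy image are assumptions.

```python
def column_projection(image):
    """Sum pixel intensities down each column (projection onto the x-axis)."""
    return [sum(col) for col in zip(*image)]

def kmeans_1d(values, k=2, iters=20):
    """Lloyd's k-means on scalars.

    Assumption: centers are initialised at evenly spaced quantiles of the data.
    """
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every value
        labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
        # Update step: each center moves to the mean of its members
        for i in range(k):
            members = [v for v, lab in zip(values, labels) if lab == i]
            if members:                     # keep the old center if a cluster empties
                centers[i] = sum(members) / len(members)
    return labels, centers

# Hypothetical toy "gel image": bright lanes in columns 1-2 and 5-6
image = [
    [0, 9, 8, 0, 0, 9, 9, 0],
    [1, 8, 9, 0, 1, 8, 9, 0],
    [0, 9, 9, 1, 0, 9, 8, 1],
]
profile = column_projection(image)
labels, centers = kmeans_1d(profile, k=2)
bright = max(range(2), key=lambda i: centers[i])
lane_columns = [x for x, lab in enumerate(labels) if lab == bright]
```

Runs of consecutive indices in `lane_columns` then mark candidate lanes, and the first and last index of each run give its left and right edges.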
NASA Astrophysics Data System (ADS)
Garoni, Timothy M.; Ossola, Giovanni; Polin, Marco; Sokal, Alan D.
2011-08-01
We study, via Monte Carlo simulation, the dynamic critical behavior of the Chayes-Machta dynamics for the Fortuin-Kasteleyn random-cluster model, which generalizes the Swendsen-Wang dynamics for the q-state Potts ferromagnet to non-integer q ≥ 1. We consider spatial dimension d = 2 and 1.25 ≤ q ≤ 4 in steps of 0.25, on lattices up to 1024^2, and obtain estimates for the dynamic critical exponent z_CM. We present evidence that when 1 ≤ q ≲ 1.95 the Ossola-Sokal conjecture z_CM ≥ β/ν is violated, though we also present plausible fits compatible with this conjecture. We show that the Li-Sokal bound z_CM ≥ α/ν is close to being sharp over the entire range 1 ≤ q ≤ 4, but is probably non-sharp by a power. As a byproduct of our work, we also obtain evidence concerning the corrections to scaling in static observables.
NASA Astrophysics Data System (ADS)
Huang, Xiaoming; Sai, Linwei; Jiang, Xue; Zhao, Jijun
2013-02-01
Employing a genetic algorithm incorporated with density functional theory calculations, we determined the lowest-energy structures of cationic Na_n^+ clusters (n = 9, 15, 21, 26, 31, 36, 41, 50 and 59). We revealed a transition of growth pattern from the "polyicosahedral" sequence to the Mackay icosahedral motif at around n = 40. Based on the ground-state structures, the size-dependent electronic properties of Na_n^+ clusters, including the binding energies, HOMO-LUMO gaps, electron density of states and photoabsorption spectra, were discussed. As cluster size increases, the HOMO-LUMO gap of the Na_n^+ cluster gradually reduces and converges rapidly to the metallic behavior of the bulk crystal. The photoabsorption spectra of Na_n^+ clusters from our calculations agree with experimental data rather well, confirming the reliability of our theoretical approaches.
NASA Astrophysics Data System (ADS)
Bellugi, Dino; Milledge, David G.; Dietrich, William E.; Perron, J. Taylor; McKean, Jim
2015-12-01
Predicting shallow landslide size and location across landscapes is important for understanding landscape form and evolution and for hazard identification. We test a recently developed model that couples a search algorithm with 3-D slope stability analysis that predicts these two key attributes in an intensively studied landscape with a 10 year landslide inventory. We use process-based submodels to estimate soil depth, root strength, and pore pressure for a sequence of landslide-triggering rainstorms. We parameterize submodels with field measurements independently of the slope stability model, without calibrating predictions to observations. The model generally reproduces observed landslide size and location distributions, overlaps 65% of observed landslides, and of these predicts size to within factors of 2 and 1.5 in 55% and 28% of cases, respectively. Five percent of the landscape is predicted unstable, compared to 2% recorded landslide area. Missed landslides are not due to the search algorithm but to the formulation and parameterization of the slope stability model and inaccuracy of observed landslide maps. Our model does not improve location prediction relative to infinite-slope methods but predicts landslide size, improves process representation, and reduces reliance on effective parameters. Increasing rainfall intensity or root cohesion generally increases landslide size and shifts locations down hollow axes, while increasing cohesion restricts unstable locations to areas with deepest soils. Our findings suggest that shallow landslide abundance, location, and size are ultimately controlled by covarying topographic, material, and hydrologic properties. Estimating the spatiotemporal patterns of root strength, pore pressure, and soil depth across a landscape may be the greatest remaining challenge.
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
Based on an analysis of the clustering algorithms that have been proposed for MANETs, a novel clustering strategy is proposed in this paper. With trust defined by a statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can detect malicious nodes, which other clustering algorithms neglect. It also overcomes a deficiency of the MOBIC algorithm, which cannot implement the relative mobility metric of corresponding nodes because the receiving power of two consecutive HELLO packets cannot be measured. It is an effective solution for clustering a MANET securely.
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes fuzzy c-means combined with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting resource demands. PMID:25691896
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2014-03-01
We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678 [2] O. Kalentev, A. Rai, S. Kemnitz, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620
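To make the solution method concrete, one Swendsen-Wang update for the 2D Ising model can be sketched serially. This is a minimal CPU sketch in Python, not the distributed CUDA programs: it assumes coupling J = 1 and periodic boundaries, and it labels clusters with a sequential union-find, whereas the GPU code relies on the parallel labeling schemes of Hawick et al. and Kalentev et al. The function name and parameters are illustrative.

```python
import math
import random

def swendsen_wang_step(spins, L, beta, rng):
    """One Swendsen-Wang update of an L x L Ising lattice.

    Assumptions: J = 1, periodic boundaries, spins stored row-major as +1/-1.
    """
    parent = list(range(L * L))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    # Bond activation probability between aligned nearest neighbours
    p_bond = 1.0 - math.exp(-2.0 * beta)
    for x in range(L):
        for y in range(L):
            i = x * L + y
            # Visit the two forward neighbours so each bond is considered once
            for j in (((x + 1) % L) * L + y, x * L + (y + 1) % L):
                if spins[i] == spins[j] and rng.random() < p_bond:
                    parent[find(i)] = find(j)  # merge the two clusters

    # Assign every cluster a fresh random spin
    new_spin = {}
    for i in range(L * L):
        root = find(i)
        if root not in new_spin:
            new_spin[root] = rng.choice((-1, 1))
        spins[i] = new_spin[root]
    return spins

rng = random.Random(1)
L = 8
spins = [1] * (L * L)
spins = swendsen_wang_step(spins, L, beta=50.0, rng=rng)
```

At such a large inverse temperature every bond between aligned spins is activated, so the whole lattice forms one cluster and flips as a unit; near the critical point the clusters span many length scales, which is why multi-cluster updates reduce critical slowing down.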
NASA Astrophysics Data System (ADS)
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-01
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. 
Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model developed in the current work. This model uses fundamental measurable microscopic quantities such as genome length rather than macroscopic radiobiological quantities such as alpha/beta ratios. This means that the model can be theoretically used under a wide range of conditions with a single set of input parameters once calibrated for a given cell line.
Lu, Jing; Chen, Lei; Yin, Jun; Huang, Tao; Bi, Yi; Kong, Xiangyin; Zheng, Mingyue; Cai, Yu-Dong
2016-04-01
Lung cancer, characterized by uncontrolled cell growth in the lung tissue, is the leading cause of global cancer deaths. Until now, effective treatment of this disease is limited. Many synthetic compounds have emerged with the advancement of combinatorial chemistry. Identification of effective lung cancer candidate drug compounds among them is a great challenge. Thus, it is necessary to build effective computational methods that can assist us in selecting for potential lung cancer drug compounds. In this study, a computational method was proposed to tackle this problem. The chemical-chemical interactions and chemical-protein interactions were utilized to select candidate drug compounds that have close associations with approved lung cancer drugs and lung cancer-related genes. A permutation test and K-means clustering algorithm were employed to exclude candidate drugs with low possibilities to treat lung cancer. The final analysis suggests that the remaining drug compounds have potential anti-lung cancer activities and most of them have structural dissimilarity with approved drugs for lung cancer. PMID:26849843
Fuzzy Clustering Neural Networks for Real-Time Odor Recognition System
Karlık, Bekir; Yüksek, Kemal
2007-01-01
The aim of this study is to develop a novel fuzzy clustering neural network (FCNN) algorithm as a pattern classifier for a real-time odor recognition system. In this type of FCNN, the input neuron activations are derived through fuzzy c-means clustering of the input data, so that the neural system can deal with the statistics of the measurement error directly. The performance of the FCNN is then compared with that of a well-known algorithm, the multilayer perceptron (MLP), for the same odor recognition system. Experimental results show that both FCNN and MLP provide high recognition probability in determining various learned categories of odors; however, the FCNN has a better ability to recognize odors than the MLP. PMID:18368140
On the analysis of BIS stage epochs via fuzzy clustering.
Nasibov, Efendi; Ozgören, Murat; Ulutagay, Gözde; Oniz, Adile; Kocaaslan, Sibel
2010-06-01
Among various types of clustering methods, partition-based methods such as k-means and FCM are widely used in the analysis of such data. However, when duration between stimuli is different, such methods are not able to provide satisfactory results because they find equal size clusters according to the fundamental running principle of these methods. In such cases, neighborhood-based clustering methods can give more satisfactory results because measurement series are separated from one another according to dramatic breaking points. In recent years, bispectral index (BIS) monitoring, which is used for monitoring the level of anesthesia, has been used in sleep studies. Sleep stages are classically scored according to the Rechtschaffen and Kales (R&K) scoring system. BIS has been shown to have a strong correlation with the R&K scoring system. In this study, fuzzy neighborhood/density-based spatial clustering of applications with noise (FN-DBSCAN) that combines speed of the DBSCAN algorithm and robustness of the NRFJP algorithm is applied to BIS measurement series. As a result of experiments, we can conclude that, by using BIS data, the FN-DBSCAN method estimates sleep stages better than the fuzzy c-means method. PMID:20156029
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
Adaptive Clustering of Hypermedia Documents.
ERIC Educational Resources Information Center
Johnson, Andrew; Fotouhi, Farshad
1996-01-01
Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…
Algorithms and Algorithmic Languages.
ERIC Educational Resources Information Center
Veselov, V. M.; Koprov, V. M.
This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…
Fuzzy technique for microcalcifications clustering in digital mammograms
2014-01-01
Background Mammography has established itself as the most efficient technique for the identification of pathological breast lesions. Among the various types of lesions, microcalcifications are the most difficult to identify since they are quite small (0.1-1.0 mm) and often poorly contrasted against an image's background. Within this context, Computer Aided Detection (CAD) systems could turn out to be very useful in breast cancer control. Methods In this paper we present a potentially powerful microcalcification cluster enhancement method applicable to digital mammograms. The segmentation phase employs a form filter, obtained from a LoG filter, to overcome the dependence on target dimensions and to optimize the recognition efficiency. A clustering method based on Fuzzy C-means (FCM) has been developed. The described method, Fuzzy C-means with Features (FCM-WF), was tested on simulated clusters of microcalcifications, implying that the location of the cluster within the breast and the exact number of microcalcifications are known. The proposed method has also been tested on a set of images from the publicly available mini-Mammographic database provided by the Mammographic Image Analysis Society (MIAS). Results The comparison between the FCM-WF and standard FCM algorithms, applied to both databases, shows that the former produces better microcalcification associations for clustering than the latter: with respect to the private and the public database we had a performance improvement of 10% and 5% with regard to the Merit Figure and a 22% and a 10% reduction of false positives potentially identified in the images, both to the benefit of the FCM-WF. The method was also evaluated in terms of Sensitivity (93% and 82%), Accuracy (95% and 94%), FP/image (4% for both databases) and Precision (62% and 65%).
Conclusions Thanks to the private database and to the information it contains regarding every single microcalcification, we tested the developed clustering method with great accuracy. In particular we verified that 70% of the injected clusters of the private database remained unaffected if the reconstruction is performed with the FCM-WF. Testing the method on the MIAS database also allowed us to verify the segmentation properties of the algorithm, showing that 80% of pathological clusters remained unaffected. PMID:24961885
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 imagery is high-spatial-resolution (2 m GSD) remote sensing satellite data that includes one panchromatic band and four multispectral bands (blue, green, red, near-infrared). An essential step in the daily processing of received Formosat-2 images is to estimate the cloud statistic of each image using an Automatic Cloud Coverage Assessment (ACCA) algorithm. The cloud statistic is subsequently recorded as important metadata for the image product catalog. In this paper, we propose an ACCA method with two consecutive stages: pre-processing and post-processing analysis. For pre-processing analysis, unsupervised K-means classification, Sobel's method, a thresholding method, non-cloudy pixel reexamination, and a cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, the box-counting fractal method is implemented. In other words, the cloud statistic is first determined via pre-processing analysis, and the correctness of the cloud statistic for the different spectral bands is then cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is critical to the result of the ACCA method. Therefore, in this work, we first conduct a series of experiments on clustering-based and spatial thresholding methods, including Otsu's, Local Entropy (LE), Joint Entropy (JE), Global Entropy (GE), and Global Relative Entropy (GRE) methods, for performance comparison. The results show that Otsu's and GE methods both perform better than the others for Formosat-2 images. Additionally, our proposed ACCA method, with Otsu's method selected as the thresholding method, successfully extracted the cloudy pixels of Formosat-2 images for accurate cloud statistic estimation.
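Because Otsu's method is the thresholding step the abstract settles on, a compact sketch of it may help. The code below is a generic textbook formulation of Otsu's threshold on integer gray levels, not the ACCA pipeline itself; the function name, the 8-bit level count, and the toy pixel values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximises the between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    Assumption: pixel values are integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0        # number of pixels in the dark class so far
    sum0 = 0      # intensity sum of the dark class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                      # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Hypothetical band: dark ground pixels near 40, bright cloud pixels near 220
band = [38, 40, 42, 41, 39, 218, 220, 222, 221]
t = otsu_threshold(band)
cloud_fraction = sum(p > t for p in band) / len(band)
```

Here `cloud_fraction` plays the role of the cloud statistic for a single band; the method in the abstract additionally applies K-means, edge detection, and cross-band checks before accepting such a value.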
Matlab Cluster Ensemble Toolbox
Sapio, Vincent De; Kegelmeyer, Philip
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
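The abstract does not say which consensus function the toolbox uses. As a hedged sketch, one common choice is the co-association matrix, in which the similarity of two points is the fraction of individual partitions that place them in the same cluster. The function name `co_association` and the Python rendering are assumptions for illustration only; the toolbox itself is written in Matlab.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster.

    `partitions` is a list of label lists, one per clustering run; label
    values need not be consistent across runs.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


# Three clusterings of four points (hypothetical labels).
partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
similarity = co_association(partitions)
```

A final clustering could then be run on `similarity` as if it were an ordinary similarity matrix.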
Delineation of river bed-surface patches by clustering high-resolution spatial grain size data
NASA Astrophysics Data System (ADS)
Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.
2014-01-01
The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment. All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another.
The extent to which spatial information influences clustering leads to a trade-off between the quality of GSD separation between patch types and the spatial coherence of patches. Methods incorporating spatial information during the clustering process tended to produce a finite number of types of patches. As methods improve for collecting high-resolution grain size data, the approaches described here can be scaled up to field studies to better characterize the grain size heterogeneity of river beds.
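The fuzzy memberships mentioned above can be sketched with the standard fuzzy c-means membership formula, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), where d_i is the distance from a sample to cluster center i and m > 1 is the fuzziness index. This is a generic illustration, not the authors' code; `fcm_memberships` and the Euclidean distance on feature tuples are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# A sample one unit from the first center and three units from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

Memberships near 0.5 for two patch types would mark a location as lying in a transition zone rather than in the core of either patch.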
NASA Astrophysics Data System (ADS)
Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza
2014-01-01
Quantitative PET image reconstruction requires an accurate map of the attenuation coefficients of the tissue under investigation at 511 keV (μ-map) in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received considerable attention in the scientific literature. One of the major difficulties facing MRAC has been observed in areas where bone and air meet, e.g. the ethmoidal sinuses in the head area. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, development of more versatile MR sequences to label the bone structure, e.g. ultra-short echo-time (UTE) sequences, certainly plays a significant role in novel methodological developments. However, the long acquisition time and complexity of UTE sequences limit their clinical applications. To overcome this problem, we developed a novel combination of a Short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy C-means (FCM) clustering to segment the head area into four classes: air, bone, soft tissue and adipose tissue. The imaging protocol was set on a clinical 3 T Tim Trio and also a 1.5 T Avanto (Siemens Medical Solution, Erlangen, Germany) employing a triple echo time pulse sequence in the head area. The acquisition parameters were as follows: TE1/TE2/TE3=0.98/4.925/6.155 ms, TR=8 ms, FA=25 on the 3 T system, and TE1/TE2/TE3=1.1/2.38/4.76 ms, TR=16 ms, FA=18 for the 1.5 T system. The second and third echo-times belonged to the Dixon decomposition to distinguish soft and adipose tissues. To quantify the accuracy, sensitivity and specificity of the bone segmentation algorithm, the resulting classes of MR-based segmented bone were compared with the bone manually segmented by our expert neuro-radiologist.
Results for both 3 T and 1.5 T systems show that bone segmentation applied in several slices yields average accuracy, sensitivity and specificity higher than 90%. Results indicate that FCM is an appropriate technique for tissue classification in the sinusoidal area where there is air-bone interface. Furthermore, using Dixon method, fat and brain tissues were successfully separated.
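For readers unfamiliar with the three reported figures, the sketch below shows how accuracy, sensitivity and specificity are conventionally computed from a predicted binary bone mask and a manually segmented reference mask. The masks are made-up toy data and `segmentation_scores` is a hypothetical helper, not part of the study.

```python
def segmentation_scores(predicted, reference):
    """Accuracy, sensitivity and specificity of a binary mask vs. a reference.

    Both arguments are flat sequences of 0/1 voxel labels (1 = bone).
    """
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1          # bone correctly labeled as bone
        elif p and not r:
            fp += 1          # non-bone labeled as bone
        elif r:
            fn += 1          # bone that was missed
        else:
            tn += 1          # non-bone correctly rejected
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


predicted = [1, 1, 0, 0, 1, 0, 0, 0]   # toy automatic segmentation
reference = [1, 1, 1, 0, 0, 0, 0, 0]   # toy manual segmentation
scores = segmentation_scores(predicted, reference)
```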
Haplotyping Problem, A Clustering Approach
NASA Astrophysics Data System (ADS)
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-01
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called the haplotype reconstruction problem. One of the most popular computational models for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for the haplotype reconstruction problem. Based on Hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has a lower reconstruction error rate in comparison with other algorithms.
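The abstract gives no algorithmic detail beyond the use of Hamming distance and two clusters, so the following is only a sketch of the assignment step under stated assumptions: fragments are strings over '0', '1' and '-' (a gap where the fragment does not cover a SNP site), gaps are ignored when counting mismatches, and two candidate haplotypes are already available. The names `hamming` and `assign_fragments` are hypothetical.

```python
def hamming(fragment, haplotype):
    """Number of mismatching SNP sites, ignoring sites where either is a gap."""
    return sum(
        1
        for f, h in zip(fragment, haplotype)
        if f != "-" and h != "-" and f != h
    )


def assign_fragments(fragments, hap_a, hap_b):
    """Put each fragment in the cluster of the closer haplotype (ties go to A)."""
    cluster_a, cluster_b = [], []
    for frag in fragments:
        if hamming(frag, hap_a) <= hamming(frag, hap_b):
            cluster_a.append(frag)
        else:
            cluster_b.append(frag)
    return cluster_a, cluster_b


fragments = ["01--", "-110", "10-1", "--01"]
cluster_a, cluster_b = assign_fragments(fragments, "0110", "1001")
```

In an iterative scheme, each haplotype would then be re-estimated from its cluster (for example by a per-site majority vote) and the assignment repeated until it stops changing.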
Haplotyping Problem, A Clustering Approach
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-06
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called the haplotype reconstruction problem. One of the most popular computational models for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for the haplotype reconstruction problem. Based on Hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has a lower reconstruction error rate in comparison with other algorithms.
Weigend, Florian
2014-10-01
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7](4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm. PMID:25296780
Weigend, Florian
2014-10-07
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7](4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.
NASA Astrophysics Data System (ADS)
Ergun, Bahadir; Sahin, Cumhur; Ustuntas, Taner
2014-01-01
Terrestrial Laser Scanners (TLS) are used frequently in three-dimensional documentation studies and present an alternative method for three-dimensional modeling without any deformation of scale. In this study, point cloud data segmentation is used for photogrammetric image data production from laser scanner data. Segmentation studies suggest several methods for automating curved surface determination for digital terrain modeling. In this study, a fuzzy logic approach has been used for the automatic segmentation of regular curved surfaces which differ in their depths to the instrument. This type of shape is usually observed in dome surfaces in close-range architectural documentation. The C-means integrated fuzzy logic model has been developed with MATLAB 7.0 software. The Gauss2mf membership function algorithm has been tested with the original data set. These results were used in the photogrammetric 3D modeling process. As the result of the study, the test results on the point cloud data set are discussed and interpreted with all of their advantages and disadvantages in Section 5.
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
A Linear Algebra Measure of Cluster Quality.
ERIC Educational Resources Information Center
Mather, Laura A.
2000-01-01
Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)
Cluster analysis for pattern recognition in solar butterfly diagrams
NASA Astrophysics Data System (ADS)
Illarionov, E.; Sokoloff, D.; Arlt, R.; Khlystova, A.
2011-07-01
We investigate to what extent the wings of solar butterfly diagrams can be separated without an explicit usage of Hale's polarity law as well as the location of the solar equator. We apply two algorithms of cluster analysis for this purpose, namely DBSCAN and C-means, and demonstrate their ability to separate the wings of contemporary butterfly diagrams based on the sunspot group density in the diagram only. Then we apply the method to historical data concerning the solar activity in the 18th century (Staudacher data). The method separates the two wings for Cycle 2, but fails to separate them for Cycle 1. In our opinion, this finding supports the interpretation of the Staudacher data as an indication of the unusual nature of the solar cycle in the 18th century.
A robust elicitation algorithm for discovering DNA motifs using fuzzy self-organizing maps.
Wang, Dianhui; Tapan, Sarwar
2013-10-01
It is important to identify DNA motifs in promoter regions to understand the mechanism of gene regulation. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Self-organizing maps (SOMs), as a powerful clustering tool, have demonstrated good potential for problem solving. However, the current SOM-based motif discovery algorithms unfairly treat data samples lying around the cluster boundaries by assigning them to one of the nodes, which may result in unreliable system performance. This paper aims to develop a robust framework for discovering DNA motifs, where fuzzy SOMs, with an integration of fuzzy c-means membership functions and a standard batch-learning scheme, are employed to extract putative motifs with varying length in a recursive manner. Experimental results on eight real datasets show that our proposed algorithm outperforms the other searching tools such as SOMBRERO, SOMEA, MEME, AlignACE, and WEEDER in terms of the F-measure and algorithm reliability. It is observed that a remarkable 24.6% improvement can be achieved compared to the state-of-the-art SOMBRERO. Furthermore, our algorithm can produce a 20% and 6.6% improvement over SOMBRERO and SOMEA, respectively, in finding multiple motifs on five artificial datasets. PMID:24808603
An Incremental Clustering with Attribute Unbalance Considered for Categorical Data
NASA Astrophysics Data System (ADS)
Chen, Jize; Yang, Zhimin; Yin, Jian; Yang, Xiaobo; Huang, Li
Clustering analysis is an important technique used in many fields, but traditional clustering algorithms generally deal with numeric data. Clustering categorical data has nevertheless always attracted researchers' attention because such data are prevalent in real life. This paper analyses the limitations of previously proposed categorical clustering algorithms. Based on two observations, a new similarity measure is proposed for categorical data which considers the unbalance of attributes. As data are getting much larger and more dynamic, incrementality is an important quality of good clustering algorithms. The clustering algorithm presented is incremental, with linear computational complexity. The experimental results indicate that it outperforms the other categorical clustering algorithms referred to in the paper.
Yorek, Nurettin; Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
NASA Astrophysics Data System (ADS)
Zhou, Pu-cheng; Liu, Cun-chao
2013-08-01
Camouflaged target detection in complex backgrounds is a challenging problem. Spectral-polarimetric imaging can offer spectral information and polarization information from the objects in the scene. Fusion of the spectral and polarization information in the images will result in better camouflaged target identification and recognition. In this paper, a novel spectral-polarimetric image fusion algorithm based on the Shearlet transform is proposed. Firstly, every polarimetric image in each wave band is decomposed into images of low frequency components and high frequency components by the Shearlet transform. Then, the fused low frequency approximate coefficients are obtained with a weighted average method, and the fused high frequency coefficients are obtained with an area-based feature selection method, so features and details from different spectral-polarimetric images are fused successfully. After that, the kernel fuzzy c-means clustering algorithm is used to separate the camouflaged object from its background. Experimental results have shown that better identification performance was achieved.
Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering.
Gong, Maoguo; Zhou, Zhiqiang; Ma, Jingjing
2012-04-01
This paper presents an unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm. The image fusion technique is introduced to generate a difference image by using complementary information from a mean-ratio image and a log-ratio image. In order to restrain the background information and enhance the information of changed regions in the fused difference image, wavelet fusion rules based on an average operator and minimum local area energy are chosen to fuse the wavelet coefficients for a low-frequency band and a high-frequency band, respectively. A reformulated fuzzy local-information C-means clustering algorithm is proposed for classifying changed and unchanged regions in the fused difference image. It incorporates the information about spatial context in a novel fuzzy way for the purpose of enhancing the changed information and of reducing the effect of speckle noise. Experiments on real SAR images show that the image fusion strategy integrates the advantages of the log-ratio operator and the mean-ratio operator and gains a better performance. The change detection results obtained by the improved fuzzy clustering algorithm exhibited lower error than those of its predecessors. PMID:21984509
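The two difference images named above can be sketched as follows, using commonly cited definitions that are assumed here rather than taken from the abstract: the log-ratio image is |log(X2 / X1)| per pixel, and the mean-ratio image is 1 - min(mu1/mu2, mu2/mu1), where mu1 and mu2 are local mean intensities in a small window. The helper names, the 3x3 window and the small `eps` guard against division by zero are all illustrative choices; the wavelet fusion step is not shown.

```python
import math


def local_mean(img, r, c, radius=1):
    """Mean of the pixels in a (2*radius+1)^2 window clipped to the image."""
    rows, cols = len(img), len(img[0])
    vals = [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return sum(vals) / len(vals)


def log_ratio(img1, img2, eps=1e-6):
    """Per-pixel |log(img2 / img1)|."""
    return [
        [abs(math.log((b + eps) / (a + eps))) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]


def mean_ratio(img1, img2, eps=1e-6):
    """Per-pixel 1 - min(mu1/mu2, mu2/mu1) using local window means."""
    out = []
    for r in range(len(img1)):
        row = []
        for c in range(len(img1[0])):
            m1 = local_mean(img1, r, c) + eps
            m2 = local_mean(img2, r, c) + eps
            row.append(1.0 - min(m1 / m2, m2 / m1))
        out.append(row)
    return out


before = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
after = [[10.0, 10.0, 10.0], [10.0, 20.0, 10.0], [10.0, 10.0, 10.0]]
lr = log_ratio(before, after)
mr = mean_ratio(before, after)
```

Both images are zero where nothing changed and positive around the altered center pixel; the mean-ratio response is spread over the window, which is what makes it less sensitive to isolated speckle.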
Multiview approach to spectral clustering.
Kanaan-Izquierdo, Samir; Ziyatdinov, Andrey; Massanet, Raimon; Perera, Alexandre
2012-01-01
In this paper we propose a generic approach to the multiview clustering problem that can be applied to any number of data views and with different topologies, whether continuous, discrete, graphs, or other. The proposed method is an extension of the well-established spectral clustering algorithm to integrate the information from several data views in the partition solution. The algorithm, therefore, resolves a joint cluster structure which could be present in all views, which enables researchers to better resolve data structures in data fusion problems. The application of this novel clustering approach covers an extended number of machine learning unsupervised clustering problems including biomedical analysis or machine vision. PMID:23366126
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method. PMID:25248211
Brightest Cluster Galaxy Identification
NASA Astrophysics Data System (ADS)
Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team
2011-01-01
Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in the cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others. We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work was supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.
NASA Astrophysics Data System (ADS)
Arrell, K. E.; Fisher, P. F.; Tate, N. J.; Bastin, L.
2007-10-01
The increasing global coverage of high resolution/large-scale digital elevation data has allowed the study of geomorphological form to receive renewed attention by providing accessible datasets for the characterisation and quantification of land surfaces. Digital elevation models (DEMs) provide quantitative elevation data, but it is the characterisation and extraction of geomorphologically significant measures (morphometric indices) from these raw data that form more informative and useful datasets. Common to many geographical measures, morphometric measures derived from DEMs are dependent on the scale of observation. This paper reports results of employing a fuzzy c-means classification for a sample DEM from Snowdonia, Wales, with a number of morphometric measures at different resolutions as input, and morphometric classification of landforms at each resolution as output. The classifications reveal that different landscape components or morphometric classes are important at different resolutions, and that morphometric classes exhibit resolution dependency in their geographical extents. Examination of the scale dependency and behaviour of morphometric classifications of landforms at different resolutions provides a fuller and more holistic view of the classes present than a single-scale analysis.
NASA Astrophysics Data System (ADS)
Li, Chaofeng; Yang, Maolong; Shi, Chengxian; Xia, Deshen
2003-09-01
Roads extracted from satellite imagery have been used for many different purposes, e.g. military applications, map publishing, transportation, and car navigation. Many methods, such as neural networks, knowledge-based methods, optimal search, snake models, semantic models, and road operator models, have been studied for identifying roads in satellite images, but because of the complicated characteristics of roads and of the images themselves, automated road network extraction remains a challenging problem, and no existing software is able to perform the task reliably. This paper presents a hybrid method which combines fuzzy c-means with a back-propagation neural network and knowledge processing techniques to detect roads in SPOT images. The basic idea of the paper is the "easiest first" principle: first extract the local salient road segments that can be found most easily and reliably, then use contextual knowledge and a supervised back-propagation neural network model to extract fuzzy road segments among the salient road segments, then group the extracted pixels into seed points, candidate points, and non-road points, and then traverse and join them according to appropriate knowledge rules to guide further road linking in the whole image. Finally, some post-processing steps are taken to refine the result. The resulting image shows that this hybrid identification method performs better than using knowledge-based methods or neural network techniques alone.
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering
Vijendra, Singh; Laxman, Sahoo
2015-01-01
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on a new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions are used: the Davies-Bouldin (DB) index and a line symmetry distance based objective function. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. A nearest neighbor search based on multiple randomized K dimensional (Kd) trees is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that the proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to the existing SBKM and MOCK clustering algorithms. PMID:26339233
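Of the two objectives, the Davies-Bouldin index has a standard definition that can be sketched directly: for each cluster, take the worst ratio of summed within-cluster scatter to between-centroid distance over all other clusters, then average; lower values indicate more compact, better-separated clusters. The code below is a generic illustration with Euclidean distances, not the MOLGC implementation, and the function names are hypothetical. The line symmetry objective is not shown because the abstract does not define it.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, z) for p in c) / len(c) for c, z in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

A genetic algorithm minimizing this index would prefer the `tight` partition, whose clusters are farther apart for the same internal scatter.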
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings. PMID:24802018
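To make the construction concrete, the sketch below builds a combinatorial graph Laplacian L = D - W over a handful of cluster centers obtained by subsampling a state space. The abstract does not specify how edges or weights are chosen, so the k-nearest-neighbor graph, the Gaussian weights and the name `graph_laplacian` are assumptions. Basis functions for value function approximation would then be the eigenvectors of L with the smallest eigenvalues, computed with a numerical library (not shown).

```python
import math


def graph_laplacian(centers, k=2, sigma=1.0):
    """Laplacian L = D - W of a symmetrized k-nearest-neighbor graph."""
    n = len(centers)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )[:k]
        for j in nearest:
            d = math.dist(centers[i], centers[j])
            weight = math.exp(-d * d / (2.0 * sigma * sigma))
            w[i][j] = w[j][i] = weight   # keep the graph undirected
    # Degree on the diagonal, negative edge weight off the diagonal.
    return [
        [(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical cluster centers in a one-dimensional continuous state space.
centers = [(0.0,), (1.0,), (2.0,), (5.0,)]
lap = graph_laplacian(centers, k=1)
```

Because the graph has far fewer nodes than the number of sampled states, the eigen-decomposition stays cheap, which is consistent with the abstract's claim that fewer sample points are needed.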
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Slonim, Noam; Atwal, Gurinder Singh; Tkačik, Gašper; Bialek, William
2005-01-01
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan
2014-01-01
Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell populations' centroids and contours. Recent advances have permitted imaging at single cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially those cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to the Nissl staining data, with the following two key steps: 1) concave points clustering to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. Also, we have evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system, and the datasets include closely touching cells. Compared with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442
Matlab Cluster Ensemble Toolbox
Energy Science and Technology Software Center (ESTSC)
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either (a) subsampling the data and clustering each subsample, or (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
Clustering of Multi-Temporal Fully Polarimetric L-Band SAR Data for Agricultural Land Cover Mapping
NASA Astrophysics Data System (ADS)
Tamiminia, H.; Homayouni, S.; Safari, A.
2015-12-01
Recently, the unique capabilities of Polarimetric Synthetic Aperture Radar (PolSAR) sensors have made them an important and efficient tool for natural resources and environmental applications, such as land cover and crop classification. The aim of this paper is to classify multi-temporal full polarimetric SAR data over an agricultural region using a kernel-based fuzzy C-means clustering method. This method starts by transforming the input data into a higher dimensional space using kernel functions and then clustering them in the feature space. The feature space, due to its inherent properties, can take into account the nonlinear and complex nature of polarimetric data. Several SAR polarimetric features were extracted using target decomposition algorithms. Features from the Cloude-Pottier, Freeman-Durden and Yamaguchi algorithms were used as inputs for the clustering. This method was applied to multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Canada, during June and July in 2012. The results demonstrate the efficiency of this approach with respect to the classical methods. In addition, using multi-temporal data in the clustering process helped to investigate the phenological cycle of plants and significantly improved the performance of agricultural land cover mapping.
Color segmentation using MDL clustering
NASA Astrophysics Data System (ADS)
Wallace, Richard S.; Suenaga, Yasuhito
1991-02-01
This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other "magic numbers." This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image, our clustering program detects color clusters corresponding to the hair, skin, and background regions in the image. Then a maximum likelihood classifier assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils, mouth, and eyes together, but they can be separated from the larger classes through connected components analysis.
DNA clustering and genome complexity.
Dios, Francisco; Barturen, Guillermo; Lebrón, Ricardo; Rueda, Antonio; Hackenberg, Michael; Oliver, José L
2014-12-01
Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located their clusters in the human genome, then quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can be in turn clustered into high-level super-clusters. The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering. PMID:25182383
Ghaheri, Salehe; Masoum, Saeed; Gholami, Ali
2016-01-15
Analysis of fragrance composition is very important for both fragrance producers and consumers. Unraveling a fragrance formulation is necessary for quality control, competitor and trace analysis. Gas chromatography-mass spectrometry (GC-MS) has been introduced as the most appropriate analytical technique for this type of analysis, which is based on the Kovats index and MS database. The most straightforward method to analyze a GC-MS dataset is to integrate those peaks that can be recognized by their mass profiles. However, because of common problems of chromatographic data such as spectral background, baseline offset and especially overlapped peaks, accurate quantitative and qualitative analysis can fail. Some chemometric modeling techniques such as bilinear multivariate curve resolution (MCR) methods have been introduced to overcome these problems and obtain well resolved chromatographic profiles. The main drawback of these methods is rotational ambiguity, or nonunique solutions, which is represented as the area of feasible solutions (AFS). The polygonal inflation algorithm (PIA) is an automatic and simple to use algorithm for numerical computation of the AFS. In this study, the extent of rotational ambiguity in curve resolution methods is calculated by the MCR-BAND toolbox and the PIA. The ability of the PIA to resolve GC-MS data sets is evaluated on simulated GC-MS data in comparison with other popular curve resolution methods such as multivariate curve resolution alternating least squares (MCR-ALS), multivariate curve resolution objective function minimization (MCR-FMIN) with different initial estimation methods, and independent component analysis (ICA). In addition, two typical challenging areas of the total ion chromatogram (TIC) of commercial fragrances with overlapped peaks were analyzed by the PIA to investigate the possibility of peak deconvolution analysis. PMID:26711156
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm
Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein
2015-01-01
DNA microarray is a powerful approach to study simultaneously the expression of thousands of genes in a single experiment. The average value of the fluorescent intensity can be calculated in a microarray experiment. The calculated intensity values are very close in amount to the levels of expression of a particular gene. However, determining the appropriate position of every spot in microarray images is a main challenge, which leads to the accurate classification of normal and abnormal (cancer) cells. In this paper, first a preprocessing approach is performed to eliminate the noise and artifacts available in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned utilizing mathematical morphology operations. Finally, the position of each spot is exactly determined by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images, which are available in the Stanford Microarray Databases. Results illustrate that the accuracy of microarray cell segmentation in the proposed algorithm reaches 100% and 98% for noiseless/noisy cells, respectively. PMID:26284175
Swarm Intelligence in Text Document Clustering
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge for today's information society is that users are overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming amount of information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant foraging.
NASA Astrophysics Data System (ADS)
Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei
2015-12-01
Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using receiver operating characteristic (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92% with a false-positive rate of 2.3 clusters/image, which is better than the standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy. PMID:22334025
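The fragment definition in the abstract above can be illustrated with a minimal sketch. It assumes each clustering is given as a list of integer labels, one per data point, and it computes the largest such fragments (points that share the same label in every clustering). The function name `data_fragments` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def data_fragments(clusterings):
    """Group point indices that receive identical labels in every clustering.

    Two points land in the same fragment only if no clustering separates them.
    """
    n = len(clusterings[0])
    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels across all clusterings identifies the fragment.
        signature = tuple(labels[i] for labels in clusterings)
        groups[signature].append(i)
    return sorted(groups.values())


# Two toy clusterings of six points (illustrative data only).
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
fragments = data_fragments(clusterings)
print(fragments)
```

Here six points collapse to four fragments, so an aggregation step that works on fragments would handle four items instead of six.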
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
Time series clustering analysis of health-promoting behavior
NASA Astrophysics Data System (ADS)
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and has been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
Spontaneous clustering via minimum γ-divergence.
Notsu, Akifumi; Komori, Osamu; Eguchi, Shinto
2014-02-01
We propose a new method for clustering based on local minimization of the gamma-divergence, which we call spontaneous clustering. The greatest advantage of the proposed method is that it automatically detects the number of clusters that adequately reflect the data structure. In contrast, existing methods, such as K-means, fuzzy c-means, or model-based clustering need to prescribe the number of clusters. We detect all the local minimum points of the gamma-divergence, by which we define the cluster centers. A necessary and sufficient condition for the gamma-divergence to have local minimum points is also derived in a simple setting. Applications to simulated and real data are presented to compare the proposed method with existing ones. PMID:24206383
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving spacecraft electrical characteristics problems, which involve processing large amounts of unlabeled test data, high-dimensional features, long computing times, and slow identification rates. Firstly, both fuzzy c-means (FCM) offline clustering and principal component feature extraction algorithms are applied in the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithm is used to reduce the feature dimension and further improve the recognition rate for spacecraft electrical characteristics. Finally, a data capture contribution method using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the proposed method can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification, and shorten the computing time effectively. PMID:26544549
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a distance metric that obeys the triangle inequality. The algorithm strives to minimize the number of distance calculations needed to cluster the documents into "anchors" around reference documents called "pivots". We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora "Bag of Words" program, and initial performance results of an end-to-end document processing workflow are reported.
Muetterties, Earl L.
1980-05-01
Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.
Hierarchical Dirichlet process model for gene expression clustering.
Wang, Liming; Wang, Xiaodong
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; Gerdes, David; Johnston, David E.; Sheldon, Erin; /Brookhaven
2011-08-22
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven
2010-08-01
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
Hierarchical clustering in minimum spanning trees
NASA Astrophysics Data System (ADS)
Yu, Meichen; Hillebrand, Arjan; Tewarie, Prejaas; Meier, Jil; van Dijk, Bob; Van Mieghem, Piet; Stam, Cornelis Jan
2015-02-01
The identification of clusters or communities in complex networks is a recurring problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that the clustering of the MST reveals insight into the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulty defining and identifying clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network.
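As a concrete illustration of the MST definition used in the abstract above (the tree connecting all nodes with minimum total weight), the following minimal sketch builds an MST with Kruskal's algorithm. It does not implement the TAHC method, whose details are not given here. The function name, the `(weight, u, v)` edge format, and the toy graph are assumptions made for illustration.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm: edges are (weight, u, v) tuples on nodes 0..num_nodes-1."""
    parent = list(range(num_nodes))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so it creates no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Toy weighted graph on four nodes (illustrative data only).
edges = [(1.0, 0, 1), (4.0, 0, 2), (2.0, 1, 2), (5.0, 2, 3), (3.0, 1, 3)]
mst = minimum_spanning_tree(4, edges)
total_weight = sum(w for w, _, _ in mst)
print(mst, total_weight)
```

A connected graph on n nodes always yields n - 1 tree edges, which is the object on which a tree-based clustering method would then operate.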
Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Onishi, Masaki; Yoda, Ikushi
In recent years, many human-tracking methods have been proposed to analyze human dynamic trajectories. Such methods are general-purpose technology applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a two-step clustering framework, combining the k-means method and fuzzy clustering, to detect human regions. In the initial clustering, the k-means method forms intermediate clusters at high speed from objective features extracted by stereo vision. In the final clustering, the fuzzy c-means method groups the intermediate clusters into human regions based on their attributes. By expressing ambiguity through fuzzy clustering, the proposed method can cluster correctly even when many people are close to each other. The validity of our technique was evaluated in an experiment extracting the trajectories of doctors and nurses in a hospital emergency room.
Combining multiple clusterings using evidence accumulation.
Fred, Ana L N; Jain, Anil K
2005-06-01
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, that is, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; the individual data partitions are combined, based on a voting mechanism, to generate a new n x n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms. PMID:15943417
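The voting step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `co_association` and `single_link_clusters`, the toy partitions, and the 0.6 threshold are all assumptions made for illustration, and a single-link cut at a fixed similarity stands in for whichever hierarchical agglomerative method the paper applies.

```python
from itertools import combinations


def co_association(partitions):
    """Return the n x n matrix of co-clustering frequencies (the 'votes')."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise by the number of partitions so entries lie in [0, 1].
    return [[c / m for c in row] for row in counts]


def single_link_clusters(sim, threshold):
    """Join every pair with similarity >= threshold (a single-link cut)."""
    n = len(sim)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three hypothetical partitions of five objects (label values are arbitrary).
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
sim = co_association(partitions)
labels = single_link_clusters(sim, threshold=0.6)
print(labels)
```

Because only label equality within a partition is counted, the sketch is insensitive to how each base clustering names its clusters, which is the property that lets partitions from different algorithms be combined.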
NASA Technical Reports Server (NTRS)
Abrams, D.; Williams, C.
1999-01-01
This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases for which all known classical algorithms require exponential time.
A fully automated algorithm under modified FCM framework for improved brain MR image segmentation.
Sikka, Karan; Sinha, Nitesh; Singh, Pankaj K; Mishra, Amit K
2009-09-01
Automated brain magnetic resonance image (MRI) segmentation is a complex problem especially if accompanied by quality depreciating factors such as intensity inhomogeneity and noise. This article presents a new algorithm for automated segmentation of both normal and diseased brain MRI. An entropy driven homomorphic filtering technique has been employed in this work to remove the bias field. The initial cluster centers are estimated using a proposed algorithm called histogram-based local peak merger using adaptive window. Subsequently, a modified fuzzy c-mean (MFCM) technique using the neighborhood pixel considerations is applied. Finally, a new technique called neighborhood-based membership ambiguity correction (NMAC) has been used for smoothing the boundaries between different tissue classes as well as to remove small pixel level noise, which appear as misclassified pixels even after the MFCM approach. NMAC leads to much sharper boundaries between tissues and, hence, has been found to be highly effective in prominently estimating the tissue and tumor areas in a brain MR scan. The algorithm has been validated against MFCM and FMRIB software library using MRI scans from BrainWeb. Superior results to those achieved with MFCM technique have been observed along with the collateral advantages of fully automatic segmentation, faster computation and faster convergence of the objective function. PMID:19395212
NASA Astrophysics Data System (ADS)
Colless, M.; Murdin, P.
2000-11-01
Coma, with Virgo, is one of the best-studied GALAXY CLUSTERS. The cluster is located at right ascension 12h59m48.7s, declination +27°58'50'' (J2000) and lies almost at the north Galactic pole, (l,b)=(58°,+88°). In the major catalog of galaxy clusters by Abell, its designation is Abell 1656 (see ABELL CLUSTERS). The mean redshift of the cluster is approximately 6900 km s-1, which puts it at a dis...
Clustering analysis of moving target signatures
NASA Astrophysics Data System (ADS)
Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto
2010-04-01
Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
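The abstract does not spell out the knee-point heuristic, so the following Python sketch shows only one common variant, offered as an assumption: pick the cluster count whose point on the error curve lies farthest from the straight line joining the curve's endpoints. The function name `knee_point` and the sample error values are hypothetical and are not taken from the paper.

```python
def knee_point(ks, errors):
    """Return the k farthest from the chord joining the curve's endpoints."""
    x0, y0 = ks[0], errors[0]
    x1, y1 = ks[-1], errors[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5
    best_k, best_d = ks[0], -1.0
    for k, e in zip(ks, errors):
        # Perpendicular distance from (k, e) to the chord.
        d = abs(dy * (k - x0) - dx * (e - y0)) / norm
        if d > best_d:
            best_k, best_d = k, d
    return best_k


# Hypothetical clustering error for k = 1..6 candidate cluster counts.
ks = [1, 2, 3, 4, 5, 6]
errors = [100.0, 40.0, 12.0, 10.0, 9.0, 8.5]
print(knee_point(ks, errors))
```

The result depends on the relative scaling of the two axes, so in practice both are often normalised to a common range before the distance is computed.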
Chalmers, Eric; Le, Jonathan; Sukhdeep, Dulai; Watt, Joe; Andersen, John; Lou, Edmond
2014-01-01
When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in clinic. The ability to monitor the foot angle during daily life outside of clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares several low-complexity algorithms' aptitude for foot-angle measurement: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot; 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved almost as good accuracy with less computational complexity. These two algorithms seem to have good aptitude for the foot-angle measurement problem, and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952
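For readers unfamiliar with IIR complementary filtering, a textbook first-order form is sketched below in Python. It is an illustration under assumptions, not the filter evaluated in the paper: the function name, the blend weight `alpha = 0.98`, the 100 Hz sample period, and the synthetic inputs are all hypothetical.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, angle0=0.0):
    """Blend integrated gyroscope rate with accelerometer-derived angle.

    gyro_rates   -- angular rate samples in degrees per second
    accel_angles -- angle samples derived from the accelerometer, in degrees
    dt           -- sample period in seconds
    alpha        -- weight on the gyroscope path (assumed value)
    """
    angle = angle0
    estimates = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        # High-pass the gyro integral, low-pass the accelerometer angle.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * accel_angle
        estimates.append(angle)
    return estimates


# Synthetic check: a stationary foot held at 10 degrees, zero gyro rate.
n = 500
out = complementary_filter([0.0] * n, [10.0] * n, dt=0.01)
print(round(out[-1], 3))
```

Each update needs only two multiplications and two additions, which is why this family of filters suits a small microcontroller with limited processing power.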
Improved Fuzzy Clustering Techniques for Categorical Data
NASA Astrophysics Data System (ADS)
Saha, Indrajit; Maulik, Ujjwal
2009-01-01
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited in handling datasets that contain categorical attributes. However, datasets with categorical types of attributes are common in real life data mining problems. For these data sets, no inherent distance measure, like the Euclidean distance, would work to compute the distance between two categorical objects. In this article, we have described differential evolution, genetic algorithm and simulated annealing based fuzzy clustering. The performance of the proposed algorithms has been compared with that of different well known categorical data clustering algorithms and demonstrated for a variety of artificial and real life categorical data sets. A statistical significance test has been performed to establish the superiority of the proposed algorithms.
Discriminative clustering via extreme learning machine.
Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng
2015-10-01
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036
Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache ... A cluster headache begins as a severe, sudden headache. The headache commonly strikes 2 to 3 hours after you fall asleep. ...
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
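The average distortion named in this abstract (mean squared distance of each point to its nearest center) can be computed by brute force as in the Python sketch below. The function name and the toy points are assumptions for illustration; this is the straightforward O(nk) evaluation, not the kd-tree filtering algorithm that the paper adapts.

```python
def average_distortion(points, centers):
    """Mean squared Euclidean distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared distance to the closest center.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers
        )
    return total / len(points)


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
print(average_distortion(points, centers))
```

Every point is compared against every center here, which is the cost that a kd-tree based scheme is meant to reduce on large data sets.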
Construction of the Ensemble of Logical Models in Cluster Analysis
NASA Astrophysics Data System (ADS)
Berikov, Vladimir
In this paper, an algorithm for cluster analysis based on an ensemble of tree-like logical models (decision trees) is proposed. During the construction of the ensemble, the algorithm takes into account distances between logical statements describing clusters. In addition, we consider some properties of the Bayes model of classification. These properties are used to motivate an information-probabilistic criterion of clustering quality. The results of experimental studies demonstrate the effectiveness of the suggested algorithm.
Clusters and Clusters of Clusters in Collisions
NASA Astrophysics Data System (ADS)
Manil, B.; Bernigaud, V.; Boduch, P.; Cassimi, A.; Kamalou, O.; Lenoir, J.; Maunoury, L.; Rangama, J.; Huber, B. A.; Jensen, J.; Schmidt, H. T.; Zettergren, H.; Cederquist, H.; Tomita, S.; Hvelplund, P.; Alvarado, F.; Bari, S.; Lecointre, A.; Schlathölter, T.
2006-11-01
Pure and mixed clusters of fullerenes (C60 and C70) as well as of nucleobases have been produced within a cluster aggregation source and have been multiply ionised in collisions with highly charged ions. Multiply charged clusters and the corresponding appearance sizes have been identified for charge states up to q=5. In the fullerene case, the dominant fragmentation channel leads to the emission of singly charged fullerenes. Furthermore, it is concluded that the fullerene clusters, which in their neutral state are insulators, bound only by weak van der Waals forces, become conducting as soon as being multiply charged. Thus, the excess charges turn out to be delocalised. This phenomenon is explained by a rapid charge transfer to neighboured fullerene molecules well in agreement with predictions of the conducting sphere model. In the case of biomolecular clusters, it is found that the aggregation probability varies strongly for different nucleobases. In some cases the cluster distributions show strong variations due to possible shell effects. In the case of mixed biomolecular clusters a strong enhancement is observed for those clusters containing a so called Watson-Crick pair, for example a dimer of thymine and adenine.
Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2004-05-26
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
Segmentation of color images using genetic algorithm with image histogram
NASA Astrophysics Data System (ADS)
Sneha Latha, P.; Kumar, Pawan; Kahu, Samruddhi; Bhurchandi, Kishor M.
2015-02-01
This paper proposes a family of color image segmentation algorithms using a genetic approach and a color similarity threshold in terms of just noticeable difference. Instead of segmenting and then optimizing, the proposed technique directly uses GA for optimized segmentation of color images. Application of GA on larger size color images is computationally heavy so they are applied on 4D-color image histogram table. The performance of the proposed algorithms is benchmarked on BSD dataset with color histogram based segmentation and Fuzzy C-means Algorithm using Probabilistic Rand Index (PRI). The proposed algorithms yield better analytical and visual results.
The impact of cluster representatives on the convergence of the k-modes type clustering.
Bai, Liang; Liang, Jiye; Dang, Chuangyin; Cao, Fuyuan
2013-06-01
As a leading partitional clustering technique, k-modes is one of the most computationally efficient clustering methods for categorical data. In the k-modes, a cluster is represented by a "mode," which is composed of the attribute value that occurs most frequently in each attribute domain of the cluster, whereas, in real applications, using only one attribute value in each attribute to represent a cluster may not be adequate as it could in turn affect the accuracy of data analysis. To get rid of this deficiency, several modified clustering algorithms were developed by assigning appropriate weights to several attribute values in each attribute. Although these modified algorithms are quite effective, their convergence proofs are lacking. In this paper, we analyze their convergence property and prove that they cannot guarantee to converge under their optimization frameworks unless they degrade to the original k-modes type algorithms. Furthermore, we propose two different modified algorithms with weighted cluster prototypes to overcome the shortcomings of these existing algorithms. We rigorously derive updating formulas for the proposed algorithms and prove the convergence of the proposed algorithms. The experimental studies show that the proposed algorithms are effective and efficient for large categorical datasets. PMID:23599062
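The "mode" representative described in this abstract can be made concrete with a short Python sketch. It covers only the original single-value mode and the simple matching dissimilarity usually paired with it, not the weighted prototypes the paper proposes; the function names and the toy records are hypothetical.

```python
from collections import Counter


def cluster_mode(records):
    """Most frequent value in each attribute domain of a cluster."""
    return tuple(
        Counter(column).most_common(1)[0][0] for column in zip(*records)
    )


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))


# A hypothetical cluster of three categorical records.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = cluster_mode(cluster)
print(mode)
print(matching_dissimilarity(mode, ("blue", "small", "square")))
```

The mode discards the minority values ("blue", "large") entirely, which is the loss of information that motivates assigning weights to several attribute values instead.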
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Clustering of financial time series
NASA Astrophysics Data System (ADS)
D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo
2013-05-01
This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on a partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance, based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp version.
Clustering of gene expression data: performance and similarity analysis
Yin, Longde; Huang, Chun-Hsi; Ni, Jun
2006-01-01
Background DNA Microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable data in the profile of gene expression. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. The evaluation of feasible and applicable clustering algorithms is becoming an important issue in today's bioinformatics research. Results In this paper we first experimentally study three major clustering algorithms: Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self Organizing Tree Algorithm (SOTA) using Yeast Saccharomyces cerevisiae gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to conduct the similarity analysis of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM while HC is the least efficient. The results of similarity analysis show that when given a target cluster, the Cluster Diff can efficiently determine the closest match from a set of clusters. Therefore, it is an effective approach for evaluating different clustering algorithms. Conclusion HC methods allow a visual, convenient representation of genes. However, they are neither robust nor efficient. The SOM is more robust against noise. A disadvantage of SOM is that the number of clusters has to be fixed beforehand. The SOTA combines the advantages of both hierarchical and SOM clustering. It allows a visual representation of the clusters and their structure and is not sensitive to noises. The SOTA is also more flexible than the other two clustering methods. By using our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby enable comparisons of different clustering methods. PMID:17217511
Hierarchical clustering using mutual information
NASA Astrophysics Data System (ADS)
Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P.
2005-04-01
We present a conceptually simple method for hierarchical clustering of data called mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use this both in the Shannon (probabilistic) version of information theory and in the Kolmogorov (algorithmic) version. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and to the output of independent components analysis (ICA) as illustrated with the ECG of a pregnant woman.
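The grouping property stated in this abstract can be checked in one line of algebra. The derivation below is a sketch under an assumption the abstract does not state explicitly: that the MI between three objects is defined as the sum of the individual Shannon entropies minus the joint entropy, with H denoting entropy.

```latex
\begin{align}
I(X,Y,Z) &= H(X) + H(Y) + H(Z) - H(X,Y,Z) \\
         &= \bigl[H(X) + H(Y) - H(X,Y)\bigr]
          + \bigl[H(X,Y) + H(Z) - H(X,Y,Z)\bigr] \\
         &= I(X,Y) + I\bigl((X,Y),Z\bigr)
\end{align}
```

The middle line simply adds and subtracts H(X,Y); the first bracket is the pairwise MI between X and Y, and the second is the MI between Z and the combined object (XY).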
Sobel, E.; Lange, K.; O'Connell, J.R.
1996-12-31
Haplotyping is the logical process of inferring gene flow in a pedigree based on phenotyping results at a small number of genetic loci. This paper formalizes the haplotyping problem and suggests four algorithms for haplotype reconstruction. These algorithms range from exhaustive enumeration of all haplotype vectors to combinatorial optimization by simulated annealing. Application of the algorithms to published genetic analyses shows that manual haplotyping is often erroneous. Haplotyping is employed in screening pedigrees for phenotyping errors and in positional cloning of disease genes from conserved haplotypes in population isolates. 26 refs., 6 figs., 3 tabs.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.
The hierarchical algorithms--theory and applications
NASA Astrophysics Data System (ADS)
Su, Zheng-Yao
Monte Carlo simulations are one of the most important numerical techniques for investigating statistical physical systems. Among these systems, spin models are a typical example which also play an essential role in constructing the abstract mechanism for various complex systems. Unfortunately, traditional Monte Carlo algorithms are afflicted with "critical slowing down" near continuous phase transitions and the efficiency of the Monte Carlo simulation goes to zero as the size of the lattice is increased. To combat critical slowing down, a very different type of collective-mode algorithm, in contrast to the traditional single-spin-flip mode, was proposed by Swendsen and Wang in 1987 for Potts spin models. Since then, there has been an explosion of work attempting to understand, improve, or generalize it. In these so-called "cluster" algorithms, clusters of spin are regarded as one template and are updated at each step of the Monte Carlo procedure. In implementing these algorithms the cluster labeling is a major time-consuming bottleneck and is also isomorphic to the problem of computing connected components of an undirected graph seen in other application areas, such as pattern recognition. A number of cluster labeling algorithms for sequential computers have long existed. However, the dynamic irregular nature of clusters complicates the task of finding good parallel algorithms and this is particularly true on SIMD (single-instruction-multiple-data) machines. Our design of the Hierarchical Cluster Labeling Algorithm aims at alleviating this problem by building a hierarchical structure on the problem domain and by incorporating local and nonlocal communication schemes. We present an estimate for the computational complexity of cluster labeling and prove the key features of this algorithm (such as lower computational complexity, data locality, and easy implementation) compared with the methods formerly known.
In particular, this algorithm can be viewed as a generalized scan scheme applicable to problem domains of any high dimension and of arbitrary geometry (scan is an important primitive of parallel computing). In addition, from implementation results, the hierarchical cluster labeling algorithm has proved to work equally well on MIMD machines, though originally designed for SIMD machines. Based on this success, we further study the hierarchical structure hidden in the algorithm. Hierarchical structure is a conceptual framework frequently used in building models for the study of a great variety of problems. This structure serves not only to describe the complexity of the system at different levels, but also to achieve some goals targeted by the problem, i.e., an algorithm to solve the problem. In this regard, we investigate the similarities and differences between this algorithm and others, including the FFT and the Barnes-Hut method, in terms of their hierarchical structures.
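Since the abstract identifies cluster labeling with computing connected components, a sequential baseline may help fix ideas. The Python sketch below uses union-find on a small square lattice; it is not the Hierarchical Cluster Labeling Algorithm. The function name, the free boundary conditions, and the rule that equal neighbouring spins are always bonded are simplifying assumptions (Swendsen-Wang activates such bonds only with a temperature-dependent probability).

```python
def label_clusters(grid):
    """Label 4-connected groups of equal spins on a 2D lattice."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for r in range(rows):
        for c in range(cols):
            site = r * cols + c
            # Bond to the right-hand and lower neighbours when spins match.
            if c + 1 < cols and grid[r][c] == grid[r][c + 1]:
                union(site, site + 1)
            if r + 1 < rows and grid[r][c] == grid[r + 1][c]:
                union(site, site + cols)

    # Relabel roots as 0, 1, 2, ... in row-major order of first appearance.
    relabel = {}
    return [
        [relabel.setdefault(find(r * cols + c), len(relabel)) for c in range(cols)]
        for r in range(rows)
    ]


# A hypothetical 3 x 3 spin configuration.
grid = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
]
labels = label_clusters(grid)
for row in labels:
    print(row)
```

The data-dependent chains of `find` calls are what make this approach hard to map onto SIMD hardware, which is the difficulty the abstract describes.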
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
Processing of rock core microtomography images: Using seven different machine learning algorithms
NASA Astrophysics Data System (ADS)
Chauhan, Swarup; Rühaak, Wolfram; Khan, Faisal; Enzmann, Frieder; Mielke, Philipp; Kersten, Michael; Sass, Ingo
2016-01-01
The abilities of machine learning algorithms to process X-ray microtomographic rock images were determined. The study focused on the use of unsupervised, supervised, and ensemble clustering techniques, to segment X-ray computer microtomography rock images and to estimate the pore spaces and pore size diameters in the rocks. The unsupervised k-means technique gave the fastest processing time and the supervised least squares support vector machine technique gave the slowest processing time. Multiphase assemblages of solid phases (minerals and finely grained minerals) and the pore phase were found on visual inspection of the images. In general, the accuracy in terms of porosity values and pore size distribution was found to be strongly affected by the feature vectors selected. The relative porosity average value of 15.92±1.77% retrieved from all seven machine learning algorithms is in very good agreement with the experimental result of 17±2%, obtained using a gas pycnometer. Of the supervised techniques, the least squares support vector machine technique is superior to the feed forward artificial neural network because of its ability to identify a generalized pattern. Among the ensemble classification techniques, the boosting technique converged faster than the bagging technique. The k-means technique outperformed the fuzzy c-means and self-organized maps techniques in terms of accuracy and speed.
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Liu, Jiming; Wong, Hau-San; Han, Guoqiang; Li, Le
2015-01-01
Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While quite a number of research works perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and improve its performance further by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms. PMID:26357330
SMART: Unique Splitting-While-Merging Framework for Gene Clustering
Fa, Rui; Roberts, David J.; Nandi, Asoke K.
2014-01-01
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. PMID:24714159
Reactive Collision Avoidance Algorithm
NASA Technical Reports Server (NTRS)
Scharf, Daniel; Acikmese, Behcet; Ploen, Scott; Hadaegh, Fred
2010-01-01
The reactive collision avoidance (RCA) algorithm allows a spacecraft to find a fuel-optimal trajectory for avoiding an arbitrary number of colliding spacecraft in real time while accounting for acceleration limits. In addition to spacecraft, the technology can be used for vehicles that can accelerate in any direction, such as helicopters and submersibles. In contrast to existing, passive algorithms that simultaneously design trajectories for a cluster of vehicles working to achieve a common goal, RCA is implemented onboard spacecraft only when an imminent collision is detected, and then plans a collision avoidance maneuver for only that host vehicle, thus preventing a collision in an off-nominal situation for which passive algorithms cannot. An example scenario for such a situation might be when a spacecraft in the cluster is approaching another one, but enters safe mode and begins to drift. Functionally, the RCA detects colliding spacecraft, plans an evasion trajectory by solving the Evasion Trajectory Problem (ETP), and then recovers after the collision is avoided. A direct optimization approach was used to develop the algorithm so it can run in real time. In this innovation, a parameterized class of avoidance trajectories is specified, and then the optimal trajectory is found by searching over the parameters. The class of trajectories is selected as bang-off-bang as motivated by optimal control theory. That is, an avoiding spacecraft first applies full acceleration in a constant direction, then coasts, and finally applies full acceleration to stop. The parameter optimization problem can be solved offline and stored as a look-up table of values. Using a look-up table allows the algorithm to run in real time. Given a colliding spacecraft, the properties of the collision geometry serve as indices of the look-up table that gives the optimal trajectory. For multiple colliding spacecraft, the set of trajectories that avoid all spacecraft is rapidly searched on-line. 
The optimal avoidance trajectory is implemented as a receding-horizon model predictive control law. Therefore, at each time step, the optimal avoidance trajectory is found and the first time step of its acceleration is applied. At the next time step of the control computer, the problem is re-solved and the new first time step is again applied. This continual updating allows the RCA algorithm to adapt to a colliding spacecraft that is making erratic course changes.
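The bang-off-bang profile described in this record (full acceleration, coast, full acceleration to stop) can be illustrated with a minimal one-dimensional sketch. The function names and parameters below are hypothetical and are not taken from the RCA software. The sketch assumes the vehicle starts at rest relative to its nominal path and that both burns have the same duration, and it omits the look-up table and the search over multiple colliding spacecraft.

```python
def bang_off_bang_state(a_max, t_burn, t_coast, t):
    """Return (position, velocity) at time t for a 1-D bang-off-bang profile.

    Phases: accelerate at +a_max for t_burn, coast for t_coast,
    then decelerate at -a_max for t_burn so the vehicle stops.
    All names here are illustrative assumptions.
    """
    t_total = 2.0 * t_burn + t_coast
    t = max(0.0, min(t, t_total))  # clamp to the maneuver window
    v_peak = a_max * t_burn
    x_burn = 0.5 * a_max * t_burn ** 2
    if t <= t_burn:  # first burn
        return 0.5 * a_max * t ** 2, a_max * t
    if t <= t_burn + t_coast:  # coast at constant velocity
        return x_burn + v_peak * (t - t_burn), v_peak
    tau = t - t_burn - t_coast  # time elapsed in the second burn
    x = x_burn + v_peak * t_coast + v_peak * tau - 0.5 * a_max * tau ** 2
    return x, v_peak - a_max * tau


def total_displacement(a_max, t_burn, t_coast):
    """Closed-form displacement of the whole maneuver."""
    return a_max * t_burn * (t_burn + t_coast)
```

In a receding-horizon setting, a controller would presumably re-select `t_burn` and `t_coast` at every time step and apply only the first step of the resulting acceleration; that outer loop is not shown here.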
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
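The step in which generated centroids are fed to K-means, which then iteratively assigns data objects to clusters, might look like the following minimal sketch. It is plain Lloyd's K-means with caller-supplied initial centroids. The discretization step and the binary search initialization of the paper are not reproduced, and the function name `kmeans` and its signature are assumptions made for illustration.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means on tuples of floats, starting from given centroids.

    Illustrative sketch only; how the initial centroids are produced
    is left to the caller.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels
```

Because the result of Lloyd's iteration depends on where it starts, passing the centroids in explicitly makes it easy to compare different initialization schemes on the same data.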
Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane
2015-06-01
Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. However, in practice, the signals collected during plant operation may be, very often, "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the measured signals evolution. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique, embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also propounded based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
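For readers unfamiliar with fuzzy C-means, the sketch below shows the standard FCM membership update for one sample, given its distances to the cluster centers. It is not the authors' pipeline: the feature extraction and the spectral embedding are omitted, and the function name and the default fuzziness index m = 2 are assumptions.

```python
def fcm_memberships(distances, m=2.0):
    """Memberships of one sample given its distances to each cluster center.

    Standard FCM update: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)).
    """
    if m <= 1.0:
        raise ValueError("fuzziness index m must be greater than 1")
    # A sample that coincides with one or more centers belongs fully to them.
    zeros = [j for j, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(distances))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in distances)
            for dj in distances]
```

The memberships of a sample always sum to 1, and a sample equidistant from all centers receives equal membership in each cluster.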
The Sloan Nearby Cluster Weak Lensing Survey
Kubo, Jeffrey M.; Annis, James T.; Hardin, Frances Mei; Kubik, Donna; Lawhorn, Kelsey; Lin, Huan; Nicklaus, Liana; Nelson, Dylan; Reis, Ribamar Rondon de Rezende; Seo, Hee-Jong; Soares-Santos, Marcelle; /Fermilab /Inst. Geo. Astron., Havana /Sao Paulo U. /Fermilab
2009-08-01
We describe and present initial results of a weak lensing survey of nearby (z ≲ 0.1) galaxy clusters in the Sloan Digital Sky Survey (SDSS). In this first study, galaxy clusters are selected from the SDSS spectroscopic galaxy cluster catalogs of Miller et al. and Berlind et al. We report a total of seven individual low-redshift cluster weak lensing measurements that include A2048, A1767, A2244, A1066, A2199, and two clusters specifically identified with the C4 algorithm. Our program of weak lensing of nearby galaxy clusters in the SDSS will eventually reach ~200 clusters, making it the largest weak lensing survey of individual galaxy clusters to date.
Effectiveness of environmental cluster analysis in representing regional species diversity.
Trakhtenbrot, Ana; Kadmon, Ronen
2006-08-01
A major challenge of regional conservation planning is the identification of sets of sites that together represent the overall biodiversity of the relevant region. Environmental cluster analysis (ECA) has been proposed as a potential tool for efficient selection of conservation sites, but the consequences of methodological decisions involved in its application have not been tested so far. We evaluated the performance of ECA with respect to two such decisions: the choice of the clustering algorithm (single linkage, complete linkage, unweighted arithmetic average, unweighted centroid, Ward's minimum variance, and the ALOC algorithm) and the weight given to different groups of environmental variables (rainfall, temperature, and lithology). Specifically we tested how these decisions affect the spatial configuration of clusters of sites defined by the ECA, whether and how they affect the effectiveness of the ECA (i.e., its ability to represent regional species diversity), and whether the effectiveness of alternative methods of hierarchical clustering can be predicted a priori based on the cophenetic correlation. We used an extensive database of the flora of Israel to test these questions. Differences in both the clustering algorithm and the weighting regime had considerable effects on the spatial configuration of the ECA clusters. The single-linkage algorithm produced mostly single-cell clusters plus a single large-sized cluster and was therefore found inappropriate for environmental regionalization. The effectiveness of the ECA was also sensitive to changes in the clustering algorithm and the weighting regime. Yet, most combinations of clustering algorithms and weighting regimes performed significantly better in capturing regional biodiversity than random null models. The main deviation was classifications based on Ward's minimum variance algorithm, which performed less well relative to all other algorithms. 
The two algorithms that showed the highest effectiveness (unweighted average and unweighted centroid clustering) also exhibited the highest values of the cophenetic correlation, suggesting that this index may serve as a potential indicator for the effectiveness of alternative ECA algorithms. PMID:16922225
Clustering Binary Data in the Presence of Masking Variables
ERIC Educational Resources Information Center
Brusco, Michael J.
2004-01-01
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the…
Clustering PPI data by combining FA and SHC method
2015-01-01
Clustering is one of the main methods for identifying functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but the hierarchical search has great difficulty finding the optimal neighborhood radius of synchronization, and its efficiency is low. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall, and f-measure value. PMID:25707632
NASA Astrophysics Data System (ADS)
Booth, T. E.; Gubernatis, J. E.
2009-04-01
We present a procedure that in many cases enables the Monte Carlo sampling of states of a large system from the sampling of states of a smaller system. We illustrate this procedure, which we call the sewing algorithm, for sampling states from the transfer matrix of the two-dimensional Ising model.
NASA Technical Reports Server (NTRS)
Barth, Timothy J.; Lomax, Harvard
1987-01-01
The past decade has seen considerable activity in algorithm development for the Navier-Stokes equations. This has resulted in a wide variety of useful new techniques. Some examples for the numerical solution of the Navier-Stokes equations are presented, divided into two parts. One is devoted to the incompressible Navier-Stokes equations, and the other to the compressible form.
Wu, Jianfa; Peng, Dahao; Li, Zhuping; Zhao, Li; Ling, Huanzhang
2015-01-01
To effectively and accurately detect and classify network intrusion data, this paper introduces a general regression neural network (GRNN) based on the artificial immune algorithm with elitist strategies (AIAE). The elitist archive and elitist crossover were combined with the artificial immune algorithm (AIA) to produce the AIAE-GRNN algorithm, with the aim of improving its adaptivity and accuracy. In this paper, the mean square errors (MSEs) were considered the affinity function. The AIAE was used to optimize the smooth factors of the GRNN; then, the optimal smooth factor was solved and substituted into the trained GRNN. Thus, the intrusive data were classified. The paper selected a GRNN that was separately optimized using a genetic algorithm (GA), particle swarm optimization (PSO), and fuzzy C-mean clustering (FCM) to enable a comparison of these approaches. As shown in the results, the AIAE-GRNN achieves a higher classification accuracy than PSO-GRNN, but the running time of AIAE-GRNN is long, which was proved first. FCM and GA-GRNN were eliminated because of their deficiencies in terms of accuracy and convergence. To improve the running speed, the paper adopted principal component analysis (PCA) to reduce the dimensions of the intrusive data. With the reduction in dimensionality, the PCA-AIAE-GRNN decreases in accuracy less and has better convergence than the PCA-PSO-GRNN, and the running speed of the PCA-AIAE-GRNN was relatively improved. The experimental results show that the AIAE-GRNN has a higher robustness and accuracy than the other algorithms considered and can thus be used to classify the intrusive data. PMID:25807466
Data comparison algorithms for arms control treaty verification
Bieber, A.M. Jr.
1993-08-01
Arms control treaty verification measures often require comparison of measurements made on treaty-limited items (TLIs) with nominal or representative values in order to verify the nature or identity of the TLIs in question. This paper discusses some algorithms for comparing measurements on TLIs, including algorithms based on least-squares fitting techniques and multivariate algorithms based on cluster analysis and Mahalanobis distances.
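As an illustration of the Mahalanobis-distance comparison mentioned in this record, the following minimal sketch computes the distance of a two-feature measurement from a set of nominal values with a known covariance. The function name, the restriction to two features, and the input layout are assumptions; the report's actual algorithms and any acceptance thresholds are not reproduced.

```python
def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from a distribution (mean, cov).

    cov is a 2x2 covariance matrix [[sxx, sxy], [syx, syy]].
    Illustrative sketch with hypothetical names.
    """
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if sxx <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # d^2 = diff^T * inverse(cov) * diff, with the 2x2 inverse written out.
    d2 = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5
```

Unlike the Euclidean distance, this measure scales each deviation by the variability of the corresponding feature, so a large deviation in a noisy measurement counts for less than the same deviation in a precise one.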
NASA Technical Reports Server (NTRS)
1999-01-01
Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. With a mass equivalent to more than 10,000 stars like our sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core. But in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. The Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from earth it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.
When is Constrained Clustering Beneficial, and Why?
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Basu, Sugato; Davidson, Ian
2006-01-01
Several researchers have shown that constraints can improve the results of a variety of clustering algorithms. However, there can be a large variation in this improvement, even for a fixed number of constraints for a given data set. We present the first attempt to provide insight into this phenomenon by characterizing two constraint set properties: informativeness and coherence. We show that these measures can help explain why some constraint sets are more beneficial to clustering algorithms than others. Since they can be computed prior to clustering, these measures can aid in deciding which constraints to use in practice.
NASA Astrophysics Data System (ADS)
Krick, Jessica
This proposal is a specific response to the strategic goal of NASA's research program to "discover how the universe works and explore how the universe evolved into its present form." Towards this goal, we propose to mine the Spitzer archive for all observations of galaxy groups and clusters for the purpose of studying galaxy evolution in clusters, contamination rates for Sunyaev Zeldovich cluster surveys, and to provide a database of Spitzer observed clusters to the broader community. Funding from this proposal will go towards two years of support for a postdoc to do this work. After searching the Spitzer Heritage Archive, we have found 194 unique galaxy groups and clusters that have data from both the Infrared Array Camera (IRAC; Fazio et al. 2004) at 3.6 - 8 microns and the Multiband Imaging Photometer for Spitzer (MIPS; Rieke et al. 2004) at 24 microns. This large sample will add value beyond the individual datasets because it will be a larger sample of IR clusters than ever before and will have sufficient diversity in mass, redshift, and dynamical state to allow us to differentiate amongst the effects of these cluster properties. An infrared sample is important because it is unaffected by dust extinction while at the same time is an excellent measure of both stellar mass (IRAC wavelengths) and star formation rate (MIPS wavelengths). Additionally, IRAC can be used to differentiate star forming galaxies (SFG) from active galactic nuclei (AGN), due to their different spectral shapes in this wavelength regime. Specifically, we intend to identify SFG and AGN in galaxy groups and clusters. Groups and clusters differ from the field because the galaxy densities are higher, there is a large potential well due mainly to the mass of the dark matter, and there is hot X-ray gas (the intracluster medium; ICM). We will examine the impact of these differences in environment on galaxy formation by comparing cluster properties of AGN and SFG to those in the field.
Also, we will examine the effect that evolutions of cluster redshift and dynamical state have on SFG and AGN in groups and clusters. In addition to environment, we will study the timescales of chemical enrichment of the ICM, using the SFG and AGN as tracers of processes that can transport metals outside of galaxies. Cosmological parameters can be measured based on observing galaxy clusters as signposts of the growth of structure in the universe. The best way to select a redshift independent sample is to use the SZ effect with mm observations to detect a shift in the cosmic microwave background spectrum as those photons scatter off hot gas in clusters. However, such mm observations are contaminated by the emission of SFG and AGN. We intend to characterize the magnitude of this effect on SZ surveys by understanding the frequency, radial distribution, and redshift distribution of these galaxies in clusters. Lastly, a compiled cluster catalog of all Spitzer observed clusters would be useful to the broader astronomical community. We plan to incorporate ancillary multi-wavelength data, where available, and to both publish our catalog in journals, and work with NED to make the catalog easily accessible in an efficient manner by the community.
Spatial cluster detection using dynamic programming
2012-01-01
Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. 
Conclusions We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
2015-01-01
Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N^2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. This reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments. PMID:26517376
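The storage argument can be made concrete with a small arithmetic sketch. It counts the pairwise distances a full agglomerative clustering needs and compares that with the largest distance matrix held at once when the data is split into subsets that are clustered one at a time. The function names are hypothetical, the split is assumed to be even, and only the first stage is counted; the merge and re-division stages of MAHC are not modeled.

```python
def pairwise_count(n):
    """Number of distinct pairwise distances among n items: n*(n-1)/2."""
    return n * (n - 1) // 2


def peak_storage_when_split(n, num_subsets):
    """Largest distance matrix held at once if subsets are clustered in turn.

    Assumes an even split (illustrative assumption, not from the paper).
    """
    largest_subset = -(-n // num_subsets)  # ceiling division
    return pairwise_count(largest_subset)
```

For example, under these assumptions 10,000 segments need 49,995,000 distances when clustered in one pass, whereas ten subsets of 1,000 segments each need at most 499,500 distances at a time.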
Java implementation of Class Association Rule algorithms
Energy Science and Technology Software Center (ESTSC)
2007-08-30
Java implementation of three Class Association Rule mining algorithms: NETCAR, CARapriori, and clustering-based rule mining. NETCAR is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper, UCRL-JRNL-232466-DRAFT, which is to be published in a peer-reviewed scientific journal. The software is used to extract combinations of genes relevant to a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profile is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis; however, it could be applied more generally.
Partition signed social networks via clustering dynamics
NASA Astrophysics Data System (ADS)
Wu, Jianshe; Zhang, Long; Li, Yong; Jiao, Yang
2016-02-01
Inspired by the dynamic phenomena occurring in social networks, the WJJLGS model is modified to imitate the clustering dynamics of signed social networks. Analyses show that the clustering dynamics of the model can be applied to partition signed social networks. Traditionally, the blockmodel is applied to partition signed networks. In this paper, a detailed dynamics-based algorithm for signed social networks (DBAS) is presented. Simulations on several typical real-world and illustrative networks that have been analyzed by the blockmodel verify the correctness of the proposed algorithm. The efficiency of the algorithm is verified on large-scale synthetic networks.
DEDICATED FILTER FOR DEFECTS CLUSTERING IN RADIOGRAPHIC IMAGE
Sikora, R.; Swiadek, K.; Chady, T.
2009-03-03
Defect clusters such as linear or clustered porosity are in some cases even more important than single flaws. This paper presents two methods of defect clustering and an algorithm for calculating distances between flaws in a digital radiographic image. A dedicated lookup-table-based filter is used to calculate distances between objects in the specified range. Two functions were developed for defect clustering. The first is based on the MMD (Minimum Mean Distance) algorithm. The second uses hierarchical procedures for clustering defects of various types, shapes, and sizes.
Constrained spectral clustering under a local proximity structure assumption
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Xu, Qianjun; des Jardins, Marie
2005-01-01
This work focuses on incorporating pairwise constraints into a spectral clustering algorithm. A new constrained spectral clustering method is proposed, as well as an active constraint acquisition technique and a heuristic for parameter selection. We demonstrate that our constrained spectral clustering method, CSC, works well when the data exhibits what we term local proximity structure.
An Evaluation of Cluster Analytic Approaches to Initial Model Specification.
ERIC Educational Resources Information Center
Bacon, Donald R.
2001-01-01
Evaluated the performance of several alternative cluster analytic approaches to initial model specification using population parameter analyses and a Monte Carlo simulation. Of the six cluster approaches evaluated, the one using the correlations of item correlations as a proximity metric and average linking as a clustering algorithm performed the…
Donchev, Todor I.; Petrov, Ivan G.
2011-05-31
Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.
Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
Zahoránszky, László A; Katona, Gyula Y; Hári, Péter; Málnási-Csizmadia, András; Zweig, Katharina A; Zahoránszky-Köhalmi, Gergely
2009-01-01
Background: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets, but until now no level selection algorithm has been published for this method. Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion: Using our new cluster selection method together with the method by Palla et al. provides a new and interesting clustering mechanism that makes it possible to compute overlapping clusters, which is especially valuable for biological and chemical data sets. PMID:19840391
Clustering Ensemble Method for Heterogeneous Partitions
NASA Astrophysics Data System (ADS)
Vega-Pons, Sandro; Ruiz-Shulcloper, José
Cluster ensembles are a promising technique for improving clustering results. One way to generate a cluster ensemble is to use different representations of the data and different similarity measures between objects. This produces a cluster ensemble made up of heterogeneous partitions obtained from different points of view on the problem at hand. This diversity enhances the cluster ensemble, but it restricts the combination process because it makes the original data difficult to use. In this paper, in order to overcome these limitations, we propose a unified representation of the objects that takes into account all the information in the cluster ensemble. This representation allows working with the original data of the problem regardless of the generation mechanism used. This new representation is also embedded in the WKF [1] algorithm, yielding a more robust cluster ensemble method. Experimental results with numerical, categorical and mixed datasets show the accuracy of the proposed method.
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used a distance-based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…
Bipartite graph partitioning and data clustering
Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.
2001-05-07
Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
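The SVD-based idea in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes the edge weight matrix is scaled by the square roots of the row and column degrees (a normalization commonly used in spectral co-clustering), that every vertex has positive degree, and that a two-way split is obtained by thresholding the second singular vectors at zero. The function name `bipartition` is hypothetical, and a full SVD is used where a partial one would suffice.

```python
import numpy as np


def bipartition(W):
    """Split both vertex sets of a bipartite graph into two groups.

    W is the (rows x columns) edge weight matrix. Assumes every row and
    column has positive total weight; this is an assumption of the sketch.
    """
    W = np.asarray(W, dtype=float)
    d1 = W.sum(axis=1)  # degrees of the row vertices
    d2 = W.sum(axis=0)  # degrees of the column vertices

    # Degree-scaled weight matrix: W_hat[i, j] = W[i, j] / sqrt(d1[i] * d2[j]).
    W_hat = W / np.sqrt(np.outer(d1, d2))

    # Full SVD for simplicity; only the second singular pair is needed.
    U, s, Vt = np.linalg.svd(W_hat, full_matrices=False)

    # Undo the degree scaling on the second left/right singular vectors.
    x = U[:, 1] / np.sqrt(d1)
    y = Vt[1, :] / np.sqrt(d2)

    # Threshold at zero: rows and columns with the same flag share a cluster.
    return x >= 0, y >= 0
```

For a term-document matrix, the two returned boolean arrays label terms and documents respectively, so each cluster contains both documents and the terms most associated with them. Which group receives `True` is arbitrary because singular vectors are defined only up to sign.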
NASA Astrophysics Data System (ADS)
Zhou, Jiayin; Lim, Tuan-Kay; Chong, Vincent
2002-05-01
A knowledge-based fuzzy clustering (KBFC) MRI segmentation algorithm was proposed to obtain accurate tumor segmentation for tumor volume measurement of nasopharyngeal carcinoma (NPC). An initial segmentation was performed on T1 and contrast-enhanced T1 MR images using a semi-supervised fuzzy c-means (SFCM) algorithm. Then, three types of anatomic and spatial knowledge--symmetry, connectivity and cluster center--were used for image analysis, which contributed to the final tumor segmentation. After the segmentation, tumor volume was obtained by a multi-planimetry method. Visual and quantitative validations were performed on a phantom model and six data volumes of NPC patients, compared with ground truth (GT) and the results acquired using seeds growing (SG) for tumor segmentation. Visually, KBFC produced a better tumor segmentation image than SG. In quantitative segmentation quality estimation, on the phantom model, the matching percent (MP) / correspondence ratio (CR) was 94.1-96.4% / 0.888-0.925 for KBFC and 94.1-96.0% / 0.884-0.918 for SG, while on patient data volumes, it was 92.1+/- 2.6% / 0.884+/- 0.014 for KBFC and 87.4+/- 4.3% / 0.843+/- 0.041 for SG. In tumor volume measurement, on the phantom model, measurement error was 4.2-5.0% for KBFC and 4.8-6.1% for SG, while on patient data volumes, it was 6.6+/- 3.5% for KBFC and 8.8+/- 5.4% for SG. Based on these results, KBFC could provide high-quality MRI tumor segmentation for tumor volume measurement of NPC.
Clustering of High Throughput Gene Expression Data
Pirim, Harun; Ekşioğlu, Burak; Perkins, Andy; Yüceer, Çetin
2012-01-01
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community. PMID:23144527
Matlab Cluster Ensemble Toolbox v. 1.0
2009-04-27
This is a Matlab toolbox for investigating the application of clus