A robust fuzzy local information C-Means clustering algorithm.
Krinidis, Stelios; Chatzis, Vassilios
2010-05-01
This paper presents a variation of fuzzy c-means (FCM) algorithm that provides image clustering. The proposed algorithm incorporates the local spatial information and gray level information in a novel fuzzy way. The new algorithm is called fuzzy local information C-Means (FLICM). FLICM can overcome the disadvantages of the known fuzzy c-means algorithms and at the same time enhances the clustering performance. The major characteristic of FLICM is the use of a fuzzy local (both spatial and gray level) similarity measure, aiming to guarantee noise insensitiveness and image detail preservation. Furthermore, the proposed algorithm is fully free of the empirically adjusted parameters (a, ¿(g), ¿(s), etc.) incorporated into all other fuzzy c-means algorithms proposed in the literature. Experiments performed on synthetic and real-world images show that FLICM algorithm is effective and efficient, providing robustness to noisy images.
NASA Astrophysics Data System (ADS)
Liu, Fang
2011-06-01
Image segmentation remains one of the major challenges in image analysis and computer vision. Fuzzy clustering, as a soft segmentation method, has been widely studied and successfully applied in mage clustering and segmentation. The fuzzy c-means (FCM) algorithm is the most popular method used in mage segmentation. However, most clustering algorithms such as the k-means and the FCM clustering algorithms search for the final clusters values based on the predetermined initial centers. The FCM clustering algorithms does not consider the space information of pixels and is sensitive to noise. In the paper, presents a new fuzzy c-means (FCM) algorithm with adaptive evolutionary programming that provides image clustering. The features of this algorithm are: 1) firstly, it need not predetermined initial centers. Evolutionary programming will help FCM search for better center and escape bad centers at local minima. Secondly, the spatial distance and the Euclidean distance is also considered in the FCM clustering. So this algorithm is more robust to the noises. Thirdly, the adaptive evolutionary programming is proposed. The mutation rule is adaptively changed with learning the useful knowledge in the evolving process. Experiment results shows that the new image segmentation algorithm is effective. It is providing robustness to noisy images.
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters
Wang, Zhihao; Yi, Jing
2016-01-01
For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule n and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result. PMID:28042291
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters.
Ren, Min; Liu, Peiyu; Wang, Zhihao; Yi, Jing
2016-01-01
For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule [Formula: see text] and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result.
An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.
Zhang, Jian; Shen, Ling
2014-01-01
To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.
NASA Astrophysics Data System (ADS)
Wang, Deguang; Han, Baochang; Huang, Ming
Computer forensics is the technology of applying computer technology to access, investigate and analysis the evidence of computer crime. It mainly include the process of determine and obtain digital evidence, analyze and take data, file and submit result. And the data analysis is the key link of computer forensics. As the complexity of real data and the characteristics of fuzzy, evidence analysis has been difficult to obtain the desired results. This paper applies fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) in computer forensics, and it can be more satisfactory results.
Comments on "A robust fuzzy local information C-means clustering algorithm".
Celik, Turgay; Lee, Hwee Kuan
2013-03-01
In a recent paper, Krinidis and Chatzis proposed a variation of fuzzy c-means algorithm for image clustering. The local spatial and gray-level information are incorporated in a fuzzy way through an energy function. The local minimizers of the designed energy function to obtain the fuzzy membership of each pixel and cluster centers are proposed. In this paper, it is shown that the local minimizers of Krinidis and Chatzis to obtain the fuzzy membership and the cluster centers in an iterative manner are not exclusively solutions for true local minimizers of their designed energy function. Thus, the local minimizers of Krinidis and Chatzis do not converge to the correct local minima of the designed energy function not because of tackling to the local minima, but because of the design of energy function.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM).
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Ji, Ze-Xuan; Sun, Quan-Sen; Xia, De-Shen
2011-07-01
A modified possibilistic fuzzy c-means clustering algorithm is presented for fuzzy segmentation of magnetic resonance (MR) images that have been corrupted by intensity inhomogeneities and noise. By introducing a novel adaptive method to compute the weights of local spatial in the objective function, the new adaptive fuzzy clustering algorithm is capable of utilizing local contextual information to impose local spatial continuity, thus allowing the suppression of noise and helping to resolve classification ambiguity. To estimate the intensity inhomogeneity, the global intensity is introduced into the coherent local intensity clustering algorithm and takes the local and global intensity information into account. The segmentation target therefore is driven by two forces to smooth the derived optimal bias field and improve the accuracy of the segmentation task. The proposed method has been successfully applied to 3 T, 7 T, synthetic and real MR images with desirable results. Comparisons with other approaches demonstrate the superior performance of the proposed algorithm. Moreover, the proposed algorithm is robust to initialization, thereby allowing fully automatic applications.
NASA Astrophysics Data System (ADS)
Yu, Xuelian; Chen, Qian; Gu, Guohua; Qian, Weixian; Xu, Mengxi
2014-11-01
The integration between polarization and intensity images possessing complementary and discriminative information has emerged as a new and important research area. On the basis of the consideration that the resulting image has different clarity and layering requirement for the target and background, we propose a novel fusion method based on non-subsampled Contourlet transform (NSCT) and fuzzy C-means (FCM) segmentation for IR polarization and light intensity images. First, the polarization characteristic image is derived from fusion of the degree of polarization (DOP) and the angle of polarization (AOP) images using local standard variation and abrupt change degree (ACD) combined criteria. Then, the polarization characteristic image is segmented with FCM algorithm. Meanwhile, the two source images are respectively decomposed by NSCT. The regional energy-weighted and similarity measure are adopted to combine the low-frequency sub-band coefficients of the object. The high-frequency sub-band coefficients of the object boundaries are integrated through the maximum selection rule. In addition, the high-frequency sub-band coefficients of internal objects are integrated by utilizing local variation, matching measure and region feature weighting. The weighted average and maximum rules are employed independently in fusing the low-frequency and high-frequency components of the background. Finally, an inverse NSCT operation is accomplished and the final fused image is obtained. The experimental results illustrate that the proposed IR polarization image fusion algorithm can yield an improved performance in terms of the contrast between artificial target and cluttered background and a more detailed representation of the depicted scene.
Pang, Yachun; Li, Li; Hu, Wenyong; Peng, Yanxia; Liu, Lizhi; Shao, Yuanzhi
2012-01-01
This paper presents a novel two-step approach that incorporates fuzzy c-means (FCMs) clustering and gradient vector flow (GVF) snake algorithm for lesions contour segmentation on breast magnetic resonance imaging (BMRI). Manual delineation of the lesions by expert MR radiologists was taken as a reference standard in evaluating the computerized segmentation approach. The proposed algorithm was also compared with the FCMs clustering based method. With a database of 60 mass-like lesions (22 benign and 38 malignant cases), the proposed method demonstrated sufficiently good segmentation performance. The morphological and texture features were extracted and used to classify the benign and malignant lesions based on the proposed computerized segmentation contour and radiologists' delineation, respectively. Features extracted by the computerized characterization method were employed to differentiate the lesions with an area under the receiver-operating characteristic curve (AUC) of 0.968, in comparison with an AUC of 0.914 based on the features extracted from radiologists' delineation. The proposed method in current study can assist radiologists to delineate and characterize BMRI lesion, such as quantifying morphological and texture features and improving the objectivity and efficiency of BMRI interpretation with a certain clinical value. PMID:22952558
Student academic performance analysis using fuzzy C-means clustering
NASA Astrophysics Data System (ADS)
Rosadi, R.; Akamal; Sudrajat, R.; Kharismawan, B.; Hambali, Y. A.
2017-01-01
Grade Point Average (GPA) is commonly used as an indicator of academic performance. Academic performance evaluations is a basic way to evaluate the progression of student performance, when evaluating student’s academic performance, there are occasion where the student data is grouped especially when the amounts of data is large. Thus, the pattern of data relationship within and among groups can be revealed. Grouping data can be done by using clustering method, where one of the methods is the Fuzzy C-Means algorithm. Furthermore, this algorithm is then applied to a set of student data form the Faculty of Mathematics and Natural Sciences, Padjadjaran University.
Fuzzy C-Means Clustering and Energy Efficient Cluster Head Selection for Cooperative Sensor Network.
Bhatti, Dost Muhammad Saqib; Saeed, Nasir; Nam, Haewoon
2016-09-09
We propose a novel cluster based cooperative spectrum sensing algorithm to save the wastage of energy, in which clusters are formed using fuzzy c-means (FCM) clustering and a cluster head (CH) is selected based on a sensor's location within each cluster, its location with respect to fusion center (FC), its signal-to-noise ratio (SNR) and its residual energy. The sensing information of a single sensor is not reliable enough due to shadowing and fading. To overcome these issues, cooperative spectrum sensing schemes were proposed to take advantage of spatial diversity. For cooperative spectrum sensing, all sensors sense the spectrum and report the sensed energy to FC for the final decision. However, it increases the energy consumption of the network when a large number of sensors need to cooperate; in addition to that, the efficiency of the network is also reduced. The proposed algorithm makes the cluster and selects the CHs such that very little amount of network energy is consumed and the highest efficiency of the network is achieved. Using the proposed algorithm maximum probability of detection under an imperfect channel is accomplished with minimum energy consumption as compared to conventional clustering schemes.
Fuzzy C-Means Clustering and Energy Efficient Cluster Head Selection for Cooperative Sensor Network
Bhatti, Dost Muhammad Saqib; Saeed, Nasir; Nam, Haewoon
2016-01-01
We propose a novel cluster based cooperative spectrum sensing algorithm to save the wastage of energy, in which clusters are formed using fuzzy c-means (FCM) clustering and a cluster head (CH) is selected based on a sensor’s location within each cluster, its location with respect to fusion center (FC), its signal-to-noise ratio (SNR) and its residual energy. The sensing information of a single sensor is not reliable enough due to shadowing and fading. To overcome these issues, cooperative spectrum sensing schemes were proposed to take advantage of spatial diversity. For cooperative spectrum sensing, all sensors sense the spectrum and report the sensed energy to FC for the final decision. However, it increases the energy consumption of the network when a large number of sensors need to cooperate; in addition to that, the efficiency of the network is also reduced. The proposed algorithm makes the cluster and selects the CHs such that very little amount of network energy is consumed and the highest efficiency of the network is achieved. Using the proposed algorithm maximum probability of detection under an imperfect channel is accomplished with minimum energy consumption as compared to conventional clustering schemes. PMID:27618061
Rough set based generalized fuzzy c-means algorithm and quantitative indices.
Maji, Pradipta; Pal, Sankar K
2007-12-01
A generalized hybrid unsupervised learning algorithm, which is termed as rough-fuzzy possibilistic c-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy c-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of c-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed c-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets.
Malleable Fuzzy Local Median C Means Algorithm for Effective Biomedical Image Segmentation
NASA Astrophysics Data System (ADS)
Rajendran, Arunkumar; Balakrishnan, Nagaraj; Varatharaj, Mithya
2016-12-01
The traditional way of clustering plays an effective role in the field of segmentation which was developed to be more effective and also in the recent development the extraction of contextual information can be processed with ease. This paper presents a modified Fuzzy C-Means (FCM) algorithm that provides the better segmentation in the contour grayscale regions of the biomedical images where effective cluster is needed. Malleable Fuzzy Local Median C-Means (M-FLMCM) is the proposed algorithm, proposed to overcome the disadvantage of the traditional FCM method in which the convergence time requirement is more, lack of ability to remove the noise, and the inability to cluster the contour region such as images. M-FLMCM shows promising results in the experiment with real-world biomedical images. The experiment results, with 96 % accuracy compared to the other algorithms.
Generalized rough fuzzy c-means algorithm for brain MR image segmentation.
Ji, Zexuan; Sun, Quansen; Xia, Yong; Chen, Qiang; Xia, Deshen; Feng, Dagan
2012-11-01
Fuzzy sets and rough sets have been widely used in many clustering algorithms for medical image segmentation, and have recently been combined together to better deal with the uncertainty implied in observed image data. Despite of their wide spread applications, traditional hybrid approaches are sensitive to the empirical weighting parameters and random initialization, and hence may produce less accurate results. In this paper, a novel hybrid clustering approach, namely the generalized rough fuzzy c-means (GRFCM) algorithm is proposed for brain MR image segmentation. In this algorithm, each cluster is characterized by three automatically determined rough-fuzzy regions, and accordingly the membership of each pixel is estimated with respect to the region it locates. The importance of each region is balanced by a weighting parameter, and the bias field in MR images is modeled by a linear combination of orthogonal polynomials. The weighting parameter estimation and bias field correction have been incorporated into the iterative clustering process. Our algorithm has been compared to the existing rough c-means and hybrid clustering algorithms in both synthetic and clinical brain MR images. Experimental results demonstrate that the proposed algorithm is more robust to the initialization, noise, and bias field, and can produce more accurate and reliable segmentations.
Qin, Jiahu; Fu, Weiming; Gao, Huijun; Zheng, Wei Xing
2016-03-03
This paper is concerned with developing a distributed k-means algorithm and a distributed fuzzy c-means algorithm for wireless sensor networks (WSNs) where each node is equipped with sensors. The underlying topology of the WSN is supposed to be strongly connected. The consensus algorithm in multiagent consensus theory is utilized to exchange the measurement information of the sensors in WSN. To obtain a faster convergence speed as well as a higher possibility of having the global optimum, a distributed k-means++ algorithm is first proposed to find the initial centroids before executing the distributed k-means algorithm and the distributed fuzzy c-means algorithm. The proposed distributed k-means algorithm is capable of partitioning the data observed by the nodes into measure-dependent groups which have small in-group and large out-group distances, while the proposed distributed fuzzy c-means algorithm is capable of partitioning the data observed by the nodes into different measure-dependent groups with degrees of membership values ranging from 0 to 1. Simulation results show that the proposed distributed algorithms can achieve almost the same results as that given by the centralized clustering algorithms.
Elazab, Ahmed; AbdulAzeem, Yousry M; Wu, Shiqian; Hu, Qingmao
2016-03-17
Brain tissue segmentation from magnetic resonance (MR) images is an importance task for clinical use. The segmentation process becomes more challenging in the presence of noise, grayscale inhomogeneity, and other image artifacts. In this paper, we propose a robust kernelized local information fuzzy C-means clustering algorithm (RKLIFCM). It incorporates local information into the segmentation process (both grayscale and spatial) for more homogeneous segmentation. In addition, the Gaussian radial basis kernel function is adopted as a distance metric to replace the standard Euclidean distance. The main advantages of the new algorithm are: efficient utilization of local grayscale and spatial information, robustness to noise, ability to preserve image details, free from any parameter initialization, and with high speed as it runs on image histogram. We compared the proposed algorithm with 7 soft clustering algorithms that run on both image histogram and image pixels to segment brain MR images. Experimental results demonstrate that the proposed RKLIFCM algorithm is able to overcome the influence of noise and achieve higher segmentation accuracy with low computational complexity.
Elazab, Ahmed; Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin; Hu, Qingmao
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity.
Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation.
Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A B
2013-01-01
One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation.
SPEQTACLE: An automated generalized fuzzy C-means algorithm for tumor delineation in PET
Lapuyade-Lahorgue, Jérôme; Visvikis, Dimitris; Hatt, Mathieu; Pradier, Olivier; Cheze Le Rest, Catherine
2015-10-15
Purpose: Accurate tumor delineation in positron emission tomography (PET) images is crucial in oncology. Although recent methods achieved good results, there is still room for improvement regarding tumors with complex shapes, low signal-to-noise ratio, and high levels of uptake heterogeneity. Methods: The authors developed and evaluated an original clustering-based method called spatial positron emission quantification of tumor—Automatic Lp-norm estimation (SPEQTACLE), based on the fuzzy C-means (FCM) algorithm with a generalization exploiting a Hilbertian norm to more accurately account for the fuzzy and non-Gaussian distributions of PET images. An automatic and reproducible estimation scheme of the norm on an image-by-image basis was developed. Robustness was assessed by studying the consistency of results obtained on multiple acquisitions of the NEMA phantom on three different scanners with varying acquisition parameters. Accuracy was evaluated using classification errors (CEs) on simulated and clinical images. SPEQTACLE was compared to another FCM implementation, fuzzy local information C-means (FLICM) and fuzzy locally adaptive Bayesian (FLAB). Results: SPEQTACLE demonstrated a level of robustness similar to FLAB (variability of 14% ± 9% vs 14% ± 7%, p = 0.15) and higher than FLICM (45% ± 18%, p < 0.0001), and improved accuracy with lower CE (14% ± 11%) over both FLICM (29% ± 29%) and FLAB (22% ± 20%) on simulated images. Improvement was significant for the more challenging cases with CE of 17% ± 11% for SPEQTACLE vs 28% ± 22% for FLAB (p = 0.009) and 40% ± 35% for FLICM (p < 0.0001). For the clinical cases, SPEQTACLE outperformed FLAB and FLICM (15% ± 6% vs 37% ± 14% and 30% ± 17%, p < 0.004). Conclusions: SPEQTACLE benefitted from the fully automatic estimation of the norm on a case-by-case basis. This promising approach will be extended to multimodal images and multiclass estimation in future developments.
Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C. E-mail: George.Kagadis@med.upatras.gr
2014-07-15
Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform acquiring the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via Speckle Suppression Index (SSI) with results of 0.61, 0.71, and 0.73 for EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SIR, and Pizurica's method, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation the proposed method significantly improved image characteristics over standard baseline B mode images, and those processed with the Pizurica's method. Furthermore, it yielded results similar to those for SRI for breast and thyroid images significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A new wavelet
Fuzzy C-means clustering with local information and kernel metric for image segmentation.
Gong, Maoguo; Liang, Yan; Shi, Jiao; Ma, Wenping; Ma, Jingjing
2013-02-01
In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its objective function. The new algorithm adaptively determines the kernel parameter by using a fast bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore, the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is relatively independent of this type of noise.
A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data.
Ahmed, Mohamed N; Yamany, Sameh M; Mohamed, Nevin; Farag, Aly A; Moriarty, Thomas
2002-03-01
In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that can produce errors with conventional intensity-based classification. Our algorithm is formulated by modifying the objective function of the standard fuzzy c-means (FCM) algorithm to compensate for such inhomogeneities and to allow the labeling of a pixel (voxel) to be influenced by the labels in its immediate neighborhood. The neighborhood effect acts as a regularizer and biases the solution toward piecewise-homogeneous labelings. Such a regularization is useful in segmenting scans corrupted by salt and pepper noise. Experimental results on both synthetic images and MR data are given to demonstrate the effectiveness and efficiency of the proposed algorithm.
A multiple-kernel fuzzy C-means algorithm for image segmentation.
Chen, Long; Chen, C L Philip; Lu, Mingzhu
2011-10-01
In this paper, a generalized multiple-kernel fuzzy C-means (FCM) (MKFCM) methodology is introduced as a framework for image-segmentation problems. In the framework, aside from the fact that the composite kernels are used in the kernel FCM (KFCM), a linear combination of multiple kernels is proposed and the updating rules for the linear coefficients of the composite kernel are derived as well. The proposed MKFCM algorithm provides us a new flexible vehicle to fuse different pixel information in image-segmentation problems. That is, different pixel information represented by different kernels is combined in the kernel space to produce a new kernel. It is shown that two successful enhanced KFCM-based image-segmentation algorithms are special cases of MKFCM. Several new segmentation algorithms are also derived from the proposed MKFCM framework. Simulations on the segmentation of synthetic and medical images demonstrate the flexibility and advantages of MKFCM-based approaches.
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
NASA Astrophysics Data System (ADS)
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one of the most important applications of image processing in MRI segmentation of pomegranate is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. Having a high quality product in hand would be critical factor in its marketing. The internal quality of the product is comprehensively important in the sorting process. The determination of qualitative features cannot be manually made. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in presence of noise. Fuzzy c-means (FCM) algorithm is noise-sensitive and pixels with noise are classified inversely. As a solution, in this paper, the spatial FCM algorithm in pomegranate MR images' segmentation is proposed. The algorithm is performed with setting the spatial neighborhood information in FCM and modification of fuzzy membership function for each class. The segmentation algorithm results on the original and the corrupted Pomegranate MR images by Gaussian, Salt Pepper and Speckle noises show that the SFCM algorithm operates much more significantly than FCM algorithm. Also, after diverse steps of qualitative and quantitative analysis, we have concluded that the SFCM algorithm with 5×5 window size is better than the other windows.
NASA Astrophysics Data System (ADS)
Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analyze means that according to observations made in different times, the process of defining the change detection occurring in nature or in the state of any objects or the ability of defining the quantity of temporal effects by using multitemporal data sets. There are lots of change detection techniques met in literature. It is possible to group these techniques under two main topics as supervised and unsupervised change detection. In this study, the aim is to define the land cover changes occurring in specific area of Kayseri with unsupervised change detection techniques by using Landsat satellite images belonging to different years which are obtained by the technique of remote sensing. While that process is being made, image differencing method is going to be applied to the images by following the procedure of image enhancement. After that, the method of Principal Component Analysis is going to be applied to the difference image obtained. To determine the areas that have and don't have changes, the image is grouped as two parts by Fuzzy C-Means Clustering method. For achieving these processes, firstly the process of image to image registration is completed. As a result of this, the images are being referred to each other. After that, gray scale difference image obtained is partitioned into 3 × 3 nonoverlapping blocks. With the method of principal component analysis, eigenvector space is gained and from here, principal components are reached. Finally, feature vector space consisting principal component is partitioned into two clusters using Fuzzy C-Means Clustering and after that change detection process has been done.
Yin, Jiandong; Sun, Hongzan; Yang, Jiawen; Guo, Qiyong
2014-01-01
The arterial input function (AIF) plays a crucial role in the quantification of cerebral perfusion parameters. The traditional method for AIF detection is based on manual operation, which is time-consuming and subjective. Two automatic methods have been reported that are based on two frequently used clustering algorithms: fuzzy c-means (FCM) and K-means. However, it is still not clear which is better for AIF detection. Hence, we compared the performance of these two clustering methods using both simulated and clinical data. The results demonstrate that K-means analysis can yield more accurate and robust AIF results, although it takes longer to execute than the FCM method. We consider that this longer execution time is trivial relative to the total time required for image manipulation in a PACS setting, and is acceptable if an ideal AIF is obtained. Therefore, the K-means method is preferable to FCM in AIF detection.
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Wen, Ying; He, Lianghua; von Deneen, Karen M; Lu, Yue
2013-11-01
We present an effective method for brain tissue classification based on diffusion tensor imaging (DTI) data. The method accounts for two main DTI segmentation obstacles: random noise and magnetic field inhomogeneities. In the proposed method, DTI parametric maps were used to resolve intensity inhomogeneities of brain tissue segmentation because they could provide complementary information for tissues and define accurate tissue maps. An improved fuzzy c-means with spatial constraints proposal was used to enhance the noise and artifact robustness of DTI segmentation. Fuzzy c-means clustering with spatial constraints (FCM_S) could effectively segment images corrupted by noise, outliers, and other imaging artifacts. Its effectiveness contributes not only to the introduction of fuzziness for belongingness of each pixel but also to the exploitation of spatial contextual information. We proposed an improved FCM_S applied on DTI parametric maps, which explores the mean and covariance of the feature spatial information for automated segmentation of DTI. The experiments on synthetic images and real-world datasets showed that our proposed algorithms, especially with new spatial constraints, were more effective.
NASA Astrophysics Data System (ADS)
Graf, Norman A.
2001-07-01
An object-oriented framework for undertaking clustering algorithm studies has been developed. We present here the definitions for the abstract Cells and Clusters as well as the interface for the algorithm. We intend to use this framework to investigate the interplay between various clustering algorithms and the resulting jet reconstruction efficiency and energy resolutions to assist in the design of the calorimeter detector.
Ultrametric Hierarchical Clustering Algorithms.
ERIC Educational Resources Information Center
Milligan, Glenn W.
1979-01-01
Johnson has shown that the single linkage and complete linkage hierarchical clustering algorithms induce a metric on the data known as the ultrametric. Johnson's proof is extended to four other common clustering algorithms. Two additional methods also produce hierarchical structures which can violate the ultrametric inequality. (Author/CTM)
Parallel Wolff Cluster Algorithms
NASA Astrophysics Data System (ADS)
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
Khan, Javed; Malik, Aamir Saeed; Kamel, Nidal; Dass, Sarat Chandra; Affandi, Azura Mohd
2015-01-01
Segmentation is the basic and important step for digital image analysis and understanding. Segmentation of acne lesions in the visual spectrum of light is very challenging due to factors such as varying skin tones due to ethnicity, camera calibration and the lighting conditions. In this approach the color image is transformed into various color spaces. The image is decomposed into the specified number of homogeneous regions based on the similarity of color using fuzzy C-means clustering technique. Features are extracted for each cluster and average values of these features are calculated. A new objective function is defined that selects the cluster holding the lesion pixels based on the average value of cluster features. In this study segmentation results are generated in four color spaces (RGB, rgb, YIQ, I1I2I3) and two individual color components (I3, Q). The number of clusters is varied from 2 to 6. The experiment was carried out on fifty images of acne patients. The performance of the proposed technique is measured in terms of the three mostly used metrics; sensitivity, specificity, and accuracy. Best results were obtained for Q and I3 color components of YIQ and I1I2I3 color spaces with the number of clusters equal to three. These color components show robustness against non-uniform illumination and maximize the gap between the lesion and skin color.
Meena Prakash, R; Shantha Selva Kumari, R
2017-01-01
The Fuzzy C Means (FCM) and Expectation Maximization (EM) algorithms are the most prevalent methods for automatic segmentation of MR brain images into three classes Gray Matter (GM), White Matter (WM) and Cerebrospinal Fluid (CSF). The major difficulties associated with these conventional methods for MR brain image segmentation are the Intensity Non-uniformity (INU) and noise. In this paper, EM and FCM with spatial information and bias correction are proposed to overcome these effects. The spatial information is incorporated by convolving the posterior probability during E-Step of the EM algorithm with mean filter. Also, a method of pixel re-labeling is included to improve the segmentation accuracy. The proposed method is validated by extensive experiments on both simulated and real brain images from standard database. Quantitative and qualitative results depict that the method is superior to the conventional methods by around 25% and over the state-of-the art method by 8%.
Basic cluster compression algorithm
NASA Technical Reports Server (NTRS)
Hilbert, E. E.; Lee, J.
1980-01-01
Feature extraction and data compression of LANDSAT data is accomplished by BCCA program which reduces costs associated with transmitting, storing, distributing, and interpreting multispectral image data. Algorithm uses spatially local clustering to extract features from image data to describe spectral characteristics of data set. Approach requires only simple repetitive computations, and parallel processing can be used for very high data rates. Program is written in FORTRAN IV for batch execution and has been implemented on SEL 32/55.
NASA Astrophysics Data System (ADS)
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with fluorophore, FMI can noninvasively obtain the distribution of fluorophore in-vivo. However, due to the fact that the spectrum of fluorescence is in the section of the visible light range, there are mass of autofluorescence on the surface of the bio-tissues, which is a major disturbing factor in FMI. Meanwhile, the high-level of dark current for charge-coupled device (CCD) camera and other influencing factor can also produce a lot of background noise. In this paper, a novel method for image denoising of FMI based on fuzzy C-Means clustering (FCM) is proposed, because the fluorescent signal is the major component of the fluorescence images, and the intensity of autofluorescence and other background signals is relatively lower than the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to initially eliminate the noise. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopt to separate the major component and background of the fluorescence images. Finally, the proposed method was validated using the original data obtained by in vivo implanted fluorophore experiment, and the results show that our proposed method can effectively obtain the fluorescence signal while eliminate the background noise, which could increase the quality of fluorescence images.
NASA Astrophysics Data System (ADS)
Ghaffarian, Saman; Gökaşar, Ilgın
2016-01-01
This study presents an approach for the automatic detection of vehicles using very high-resolution images and road vector data. Initially, road vector data and aerial images are integrated to extract road regions. Then, the extracted road/street region is clustered using an automatic histogram-based fuzzy C-means algorithm, and edge pixels are detected using the Canny edge detector. In order to automatically detect vehicles, we developed a local perceptual grouping approach based on fusion of edge detection and clustering outputs. To provide the locality, an ellipse is generated using characteristics of the candidate clusters individually. Then, ratio of edge pixels to nonedge pixels in the corresponding ellipse is computed to distinguish the vehicles. Finally, a point-merging rule is conducted to merge the points that satisfy a predefined threshold and are supposed to denote the same vehicles. The experimental validation of the proposed method was carried out on six very high-resolution aerial images that illustrate two highways, two shadowed roads, a crowded narrow street, and a street in a dense urban area with crowded parked vehicles. The evaluation of the results shows that our proposed method performed 86% and 83% in overall correctness and completeness, respectively.
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then
NASA Astrophysics Data System (ADS)
Moulik, P.; Tiampo, K. F.
2009-05-01
The information on three-dimensional geometry as well as the identification of active fault segments is critical to our assessment of seismic risks. Numerical modeling of the aftershock locations, times and magnitudes are also crucial to characterize a fault zone. In this study, a pattern recognition technique based on the Fuzzy C- means clustering algorithm (Bezdek, 1981) is proposed to allow each earthquake to be associated with different fault segments. The spatial covariance tensor for each cluster and the associated earthquakes are used to find optimal anisotropic clusters and designate them as faults, similar to the OADC method (Ouillon et al., 2008). The location, size and orientation of the reconstructed faults segments are characterized using a fuzzy covariance matrix (Gustafson and Kessel, 1978). The output consists of a set of distinct fault segments along with the associated earthquakes at different fuzzy membership grades (Zadeh, 1965). A resultant matrix consists of the fuzzy membership grade for different earthquakes and corresponding faults segments specifying their degree of association with values from zero to one. The spatial distribution of earthquakes of different magnitudes and membership grades for a fault segment is incorporated in an anisotropic spatial kernel which characterizes the aftershock density at a distance vector in the ETAS model (Kagan and Knopoff, 1987; Ogata, 1988). An optimal spatio-temporal distribution of aftershocks is obtained for each fault segment without considering a priori distributions such as Gaussian or power law (Helmstetter et al., 2006; Helmstetter and Sornette, 2002). The model is tested on the aftershock sequence from the Denali, 2002 earthquake in Alaska and the fault reconstruction results compared with the known faults in the area. Therefore, a new method to incorporate the anisotropic nature of aftershock diffusion along with the reconstruction of fault networks from seismicity catalogs is formulated in
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based dataset can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices and the notation of a joint-feature-clustering matrix to find redundancies of image-features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times and improve the classification accuracy.
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-24
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
NASA Astrophysics Data System (ADS)
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-01
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
CLAG: an unsupervised non hierarchical clustering algorithm handling biological data
2012-01-01
Background Searching for similarities in a set of biological data is intrinsically difficult due to possible data points that should not be clustered, or that should group within several clusters. Under these hypotheses, hierarchical agglomerative clustering is not appropriate. Moreover, if the dataset is not known enough, like often is the case, supervised classification is not appropriate either. Results CLAG (for CLusters AGgregation) is an unsupervised non hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusterizes correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, binary matrices. It does not ask to all data points to cluster and it converges yielding the same result at each run. Its simplicity and speed allows it to run on reasonably large datasets. Conclusions CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It showed to be more informative and accurate than several known clustering methods, as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering, affinity propagation clustering, and not to suffer of the convergence problem proper to this latter. PMID:23216858
Chang, Yeun-Chung; Huang, Yan-Hao; Huang, Chiun-Sheng; Chang, Pei-Kang; Chen, Jeon-Hor; Chang, Ruey-Feng
2012-04-01
The purpose of this study is to evaluate the diagnostic efficacy of the representative characteristic kinetic curve of dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) extracted by fuzzy c-means (FCM) clustering for the discrimination of benign and malignant breast tumors using a novel computer-aided diagnosis (CAD) system. About the research data set, DCE-MRIs of 132 solid breast masses with definite histopathologic diagnosis (63 benign and 69 malignant) were used in this study. At first, the tumor region was automatically segmented using the region growing method based on the integrated color map formed by the combination of kinetic and area under curve color map. Then, the FCM clustering was used to identify the time-signal curve with the larger initial enhancement inside the segmented region as the representative kinetic curve, and then the parameters of the Tofts pharmacokinetic model for the representative kinetic curve were compared with conventional curve analysis (maximal enhancement, time to peak, uptake rate and washout rate) for each mass. The results were analyzed with a receiver operating characteristic curve and Student's t test to evaluate the classification performance. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value of the combined model-based parameters of the extracted kinetic curve from FCM clustering were 86.36% (114/132), 85.51% (59/69), 87.30% (55/63), 88.06% (59/67) and 84.62% (55/65), better than those from a conventional curve analysis. The A(Z) value was 0.9154 for Tofts model-based parametric features, better than that for conventional curve analysis (0.8673), for discriminating malignant and benign lesions. In conclusion, model-based analysis of the characteristic kinetic curve of breast mass derived from FCM clustering provides effective lesion classification. This approach has potential in the development of a CAD system for DCE breast MRI.
Introduction to Cluster Monte Carlo Algorithms
NASA Astrophysics Data System (ADS)
Luijten, E.
This chapter provides an introduction to cluster Monte Carlo algorithms for classical statistical-mechanical systems. A brief review of the conventional Metropolis algorithm is given, followed by a detailed discussion of the lattice cluster algorithm developed by Swendsen and Wang and the single-cluster variant introduced by Wolff. For continuum systems, the geometric cluster algorithm of Dress and Krauth is described. It is shown how their geometric approach can be generalized to incorporate particle interactions beyond hardcore repulsions, thus forging a connection between the lattice and continuum approaches. Several illustrative examples are discussed.
NASA Astrophysics Data System (ADS)
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used hard C-means algorithm in conjunction with landmark based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue
2014-01-15
Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
Optimal Hops-Based Adaptive Clustering Algorithm
NASA Astrophysics Data System (ADS)
Xuan, Xin; Chen, Jian; Zhen, Shanshan; Kuo, Yonghong
This paper proposes an optimal hops-based adaptive clustering algorithm (OHACA). The algorithm sets an energy selection threshold before the cluster forms so that the nodes with less energy are more likely to go to sleep immediately. In setup phase, OHACA introduces an adaptive mechanism to adjust cluster head and load balance. And the optimal distance theory is applied to discover the practical optimal routing path to minimize the total energy for transmission. Simulation results show that OHACA prolongs the life of network, improves utilizing rate and transmits more data because of energy balance.
An algorithm for spatial heirarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Cluster hybrid Monte Carlo simulation algorithms.
Plascak, J A; Ferrenberg, Alan M; Landau, D P
2002-06-01
We show that addition of Metropolis single spin flips to the Wolff cluster-flipping Monte Carlo procedure leads to a dramatic increase in performance for the spin-1/2 Ising model. We also show that adding Wolff cluster flipping to the Metropolis or heat bath algorithms in systems where just cluster flipping is not immediately obvious (such as the spin-3/2 Ising model) can substantially reduce the statistical errors of the simulations. A further advantage of these methods is that systematic errors introduced by the use of imperfect random-number generation may be largely healed by hybridizing single spin flips with cluster flipping.
Cluster hybrid Monte Carlo simulation algorithms
NASA Astrophysics Data System (ADS)
Plascak, J. A.; Ferrenberg, Alan M.; Landau, D. P.
2002-06-01
We show that addition of Metropolis single spin flips to the Wolff cluster-flipping Monte Carlo procedure leads to a dramatic increase in performance for the spin-1/2 Ising model. We also show that adding Wolff cluster flipping to the Metropolis or heat bath algorithms in systems where just cluster flipping is not immediately obvious (such as the spin-3/2 Ising model) can substantially reduce the statistical errors of the simulations. A further advantage of these methods is that systematic errors introduced by the use of imperfect random-number generation may be largely healed by hybridizing single spin flips with cluster flipping.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools are able to process real valued numerical sets in order to extract suboptimal solution of designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite of wide usage of evolutionary algorithms on data clustering, their clustering performances have been scarcely studied by using clustering validation indexes. In this paper, the recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, fcm, som networks) have been used to cluster images and their performances have been compared by using four clustering validation indexes. Experimental test results exposed that evolutionary algorithms give more reliable cluster-centers than classical clustering techniques, but their convergence time is quite long.
Parallel Clustering Algorithms for Structured AMR
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10{sup 2}) processors. Our goal is a clustering algorithm for machines of up to O(10{sup 5}) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach to improve the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large scale parallel computer systems, including a 16K-processor BlueGene/Light system.
Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.
Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A
2012-02-01
Segmentation of medical images is a difficult and challenging problem due to poor image contrast and artifacts that result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques however fuzzy c-means (FCM) based algorithms is more effective compared to other methods. The objective of this work is to develop some robust fuzzy clustering segmentation systems for effective segmentation of DCE - breast MRI. This paper obtains the robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, additional entropy term and fuzzy parameters. The initial centers are obtained using initialization algorithm to reduce the computation complexity and running time of proposed algorithms. Experimental works on breast images show that the proposed algorithms are effective to improve the similarity measurement, to handle large amount of noise, to have better results in dealing the data corrupted by noise, and other artifacts. The clustering results of proposed methods are validated using Silhouette Method.
Cluster Algorithm Special Purpose Processor
NASA Astrophysics Data System (ADS)
Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.
We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
A hierarchical clustering algorithm for MIMD architecture.
Du, Zhihua; Lin, Feng
2004-12-01
Hierarchical clustering is the most often used method for grouping similar patterns of gene expression data. A fundamental problem with existing implementations of this clustering method is the inability to handle large data sets within a reasonable time and memory resources. We propose a parallelized algorithm of hierarchical clustering to solve this problem. Our implementation on a multiple instruction multiple data (MIMD) architecture shows considerable reduction in computational time and inter-node communication overhead, especially for large data sets. We use the standard message passing library, message passing interface (MPI) for any MIMD systems.
Ergen, Burhan
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590
Noise-enhanced clustering and competitive learning algorithms.
Osoba, Osonde; Kosko, Bart
2013-01-01
Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning.
Spectral clustering algorithms for ultrasound image segmentation.
Archip, Neculai; Rohling, Robert; Cooperberg, Peter; Tahmasebpour, Hamid; Warfield, Simon K
2005-01-01
Image segmentation algorithms derived from spectral clustering analysis rely on the eigenvectors of the Laplacian of a weighted graph obtained from the image. The NCut criterion was previously used for image segmentation in supervised manner. We derive a new strategy for unsupervised image segmentation. This article describes an initial investigation to determine the suitability of such segmentation techniques for ultrasound images. The extension of the NCut technique to the unsupervised clustering is first described. The novel segmentation algorithm is then performed on simulated ultrasound images. Tests are also performed on abdominal and fetal images with the segmentation results compared to manual segmentation. Comparisons with the classical NCut algorithm are also presented. Finally, segmentation results on other types of medical images are shown.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) were found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.
Dynamic exponents for potts model cluster algorithms
NASA Astrophysics Data System (ADS)
Coddington, Paul D.; Baillie, Clive F.
We have studied the Swendsen-Wang and Wolff cluster update algorithms for the Ising model in 2, 3 and 4 dimensions. The data indicate simple relations between the specific heat and the Wolff autocorrelations, and between the magnetization and the Swendsen-Wang autocorrelations. This implies that the dynamic critical exponents are related to the static exponents of the Ising model. We also investigate the possibility of similar relationships for the Q-state Potts model.
First Cluster Algorithm Special Purpose Processor
NASA Astrophysics Data System (ADS)
Talapov, A. L.; Andreichenko, V. B.; Dotsenko S., Vi.; Shchur, L. N.
We describe the architecture of the special purpose processor built to realize in hardware cluster Wolff algorithm, which is not hampered by a critical slowing down. The processor simulates two-dimensional Ising-like spin systems. With minor changes the same very effective architecture, which can be defined as a Memory Machine, can be used to study phase transitions in a wide range of models in two or three dimensions.
Dimensionality Reduction Particle Swarm Algorithm for High Dimensional Clustering
Cui, Xiaohui; ST Charles, Jesse Lee; Potok, Thomas E; Beaver, Justin M
2008-01-01
The Particle Swarm Optimization (PSO) clustering algorithm can generate more compact clustering results than the traditional K-means clustering algorithm. However, when clustering high dimensional datasets, the PSO clustering algorithm is notoriously slow because its computation cost increases exponentially with the size of the dataset dimension. Dimensionality reduction techniques offer solutions that both significantly improve the computation time, and yield reasonably accurate clustering results in high dimensional data analysis. In this paper, we introduce research that combines different dimensionality reduction techniques with the PSO clustering algorithm in order to reduce the complexity of high dimensional datasets and speed up the PSO clustering process. We report significant improvements in total runtime. Moreover, the clustering accuracy of the dimensionality reduction PSO clustering algorithm is comparable to the one that uses full dimension space.
Accelerating Fuzzy-C Means Using an Estimated Subsample Size
Parker, Jonathon K.; Hall, Lawrence O.
2015-01-01
Many algorithms designed to accelerate the Fuzzy c-Means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, despite the impact subsample sizes have on speed and quality. This paper introduces two new accelerated algorithms, GOFCM and MSERFCM, that use a statistical method to estimate the subsample size. GOFCM, a variant of SPFCM, also leverages progressive sampling. MSERFCM, a variant of rseFCM, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared to FCM and four accelerated variants of FCM. GOFCM's speedup was 4-47 times that of FCM and faster than SPFCM on each of the six datasets used in experiments. For five of the datasets, partitions were within 1% of those of FCM. MSERFCM's speedup was 5-26 times that of FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset, consisting of plankton images, exposed the strengths and weaknesses of many of the algorithms tested. It is shown that the new stopping criterion is effective in speeding up algorithms such as SPFCM and the final partitions are very close to those of FCM. PMID:26617455
Evaluation of Hierarchical Clustering Algorithms for Document Datasets
2002-06-03
new class of agglomerative algorithms, in which we introduced intermediate clusters obtained by partitional clustering algorithms to constrain the space ...of the corresponding clusters. The various clustering algorithms that are described in this paper use the vector- space model [26] to represent each...document. In this model, each document d is considered to be a vector in the term- space . In particular, we employed the t f id f term weighting model
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
Tellaroli, Paola; Bazzi, Marco; Donato, Michele; Brazzale, Alessandra R.; Drăghici, Sorin
2016-01-01
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward’s minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward’s and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository. PMID:27015427
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Improved Ant Colony Clustering Algorithm and Its Performance Study.
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6.44.1' N, 91.56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is
Applying fuzzy clustering optimization algorithm to extracting traffic spatial pattern
NASA Astrophysics Data System (ADS)
Hu, Chunchun; Shi, Wenzhong; Meng, Lingkui; Liu, Min
2009-10-01
Traditional analytical methods for traffic information can't meet to need of intelligent traffic system. Mining value-add information can deal with more traffic problems. The paper exploits a new clustering optimization algorithm to extract useful spatial clustered pattern for predicting long-term traffic flow from macroscopic view. Considering the sensitivity of initial parameters and easy falling into local extreme in FCM algorithm, the new algorithm applies Particle Swarm Optimization method, which can discovery the globe optimal result, to the FCM algorithm. And the algorithm exploits the union of the clustering validity index and objective function of the FCM algorithm as the fitness function of the PSO algorithm. The experimental result indicates that it is effective and efficient. For fuzzy clustering of road traffic data, it can produce useful spatial clustered pattern. And the clustered centers represent the locations which have heavy traffic flow. Moreover, the parameters of the patterns can provide intelligent traffic system with assistant decision support.
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
NASA Astrophysics Data System (ADS)
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using MPI message passing library and master/slave algorithm. Every processor performs the same sequential algorithm but on different part of the image. Experimental results conducted on Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.
The Georgi algorithms of jet clustering
NASA Astrophysics Data System (ADS)
Ge, Shao-Feng
2015-05-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing firm foundation from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses and the jet function is generalized to J {/β ( n)} with a jet function index n in order to achieve more degrees of freedom. Based on three basic requirements that, the result of jet clustering is process-independent and hence logically consistent, for softer subjets the inclusion cone is larger to conform with the fact that parton shower tends to emit softer partons at earlier stage with larger opening angle, and that the cone size cannot be too large in order to avoid mixing up neighbor jets, we derive constraints on the jet function parameter β and index n which are closely related to cone size cutoff. Finally, we discuss how jet function values can be made invariant under Lorentz boost.
Clustering algorithm for determining community structure in large networks
NASA Astrophysics Data System (ADS)
Pujol, Josep M.; Béjar, Javier; Delgado, Jordi
2006-07-01
We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms on the literature of modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousand vertices.
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjoisness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
A Comparative Study of Protein Sequence Clustering Algorithms
NASA Astrophysics Data System (ADS)
Eldin, A. Sharaf; Abdelgaber, S.; Soliman, T.; Kassim, S.; Abdo, A.
In this paper, we survey four clustering techniques and discuss their advantages and drawbacks. A review of eight different protein sequence clustering algorithms has been accomplished. Moreover, a comparison between the algorithms on the basis of some factors has been presented.
New SIMD Algorithms for Cluster Labeling on Parallel Computers
NASA Astrophysics Data System (ADS)
Apostolakis, John; Coddington, Paul; Marinari, Enzo
Cluster algorithms are non-local Monte Carlo update schemes which can greatly increase the efficiency of computer simulations of spin models of magnets. The major computational task in these algorithms is connected component labeling, to identify clusters of connected sites on a lattice. We have devised some new SIMD component labeling algorithms, and implemented them on the Connection Machine. We investigate their performance when applied to the cluster update of the two-dimensional Ising spin model. These algorithms could also be applied to other problems which use connected component labeling, such as percolation and image analysis.
Single-Pass Clustering Algorithm Based on Storm
NASA Astrophysics Data System (ADS)
Fang, LI; Longlong, DAI; Zhiying, JIANG; Shunzi, LI
2017-02-01
The dramatically increasing volume of data makes the computational complexity of traditional clustering algorithm rise rapidly accordingly, which leads to the longer time. So as to improve the efficiency of the stream data clustering, a distributed real-time clustering algorithm (S-Single-Pass) based on the classic Single-Pass [1] algorithm and Storm [2] computation framework was designed in this paper. By employing this kind of method in the Topic Detection and Tracking (TDT) [3], the real-time performance of topic detection arises effectively. The proposed method splits the clustering process into two parts: one part is to form clusters for the multi-thread parallel clustering, the other part is to merge the generated clusters in the previous process and update the global clusters. Through the experimental results, the conclusion can be drawn that the proposed method have the nearly same clustering accuracy as the traditional Single-Pass algorithm and the clustering accuracy remains steady, computing rate increases linearly when increasing the number of cluster machines and nodes (processing threads).
MRI brain tumor segmentation based on improved fuzzy c-means method
NASA Astrophysics Data System (ADS)
Deng, Wankai; Xiao, Wei; Pan, Chao; Liu, Jianguo
2009-10-01
This paper focuses on the image segmentation, which is one of the key problems in medical image processing. A new medical image segmentation method is proposed based on fuzzy c- means algorithm and spatial information. Firstly, we classify the image into the region of interest and background using fuzzy c means algorithm. Then we use the information of the tissues' gradient and the intensity inhomogeneities of regions to improve the quality of segmentation. The sum of the mean variance in the region and the reciprocal of the mean gradient along the edge of the region are chosen as an objective function. The minimum of the sum is optimum result. The result shows that the clustering segmentation algorithm is effective.
Clustering algorithms for Stokes space modulation format recognition.
Boada, Ricard; Borkowski, Robert; Monroy, Idelfonso Tafur
2015-06-15
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely influences the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six different clustering algorithms: k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering and maximum likelihood clustering, used for discriminating between dual polarization: BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. We determine essential performance metrics for each clustering algorithm and modulation format under test: minimum required signal-to-noise ratio, detection accuracy and algorithm complexity.
A biased random-key genetic algorithm for data clustering.
Festa, P
2013-09-01
Cluster analysis aims at finding subsets (clusters) of a given set of entities, which are homogeneous and/or well separated. Starting from the 1990s, cluster analysis has been applied to several domains with numerous applications. It has emerged as one of the most exciting interdisciplinary fields, having benefited from concepts and theoretical results obtained by different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms, which are able to solve larger sized and real-world instances. We will give an overview of the main types of clustering and criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed to cluster biological data.
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Ochoa, Andrew J.; Katzgraber, Helmut G.
2015-08-01
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine.
A novel spatial clustering algorithm based on Delaunay triangulation
NASA Astrophysics Data System (ADS)
Yang, Xiankun; Cui, Weihong
2008-12-01
Exploratory data analysis is increasingly more necessary as larger spatial data is managed in electro-magnetic media. Spatial clustering is one of the very important spatial data mining techniques. So far, a lot of spatial clustering algorithms have been proposed. In this paper we propose a robust spatial clustering algorithm named SCABDT (Spatial Clustering Algorithm Based on Delaunay Triangulation). SCABDT demonstrates important advantages over the previous works. First, it discovers even arbitrary shape of cluster distribution. Second, in order to execute SCABDT, we do not need to know any priori nature of distribution. Third, like DBSCAN, Experiments show that SCABDT does not require so much CPU processing time. Finally it handles efficiently outliers.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval.
ERIC Educational Resources Information Center
Voorhees, Ellen M.
1986-01-01
Describes a computerized information retrieval system that uses three agglomerative hierarchic clustering algorithms--single link, complete link, and group average link--and explains their implementations. It is noted that these implementations have been used to cluster a collection of 12,000 documents. (LRW)
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.
Clustering of Hadronic Showers with a Structural Algorithm
Charles, M.J.; /SLAC
2005-12-13
The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.
Critical dynamics of cluster algorithms in the dilute Ising model
NASA Astrophysics Data System (ADS)
Hennecke, M.; Heyken, U.
1993-08-01
Autocorrelation times for thermodynamic quantities at T C are calculated from Monte Carlo simulations of the site-diluted simple cubic Ising model, using the Swendsen-Wang and Wolff cluster algorithms. Our results show that for these algorithms the autocorrelation times decrease when reducing the concentration of magnetic sites from 100% down to 40%. This is of crucial importance when estimating static properties of the model, since the variances of these estimators increase with autocorrelation time. The dynamical critical exponents are calculated for both algorithms, observing pronounced finite-size effects in the energy autocorrelation data for the algorithm of Wolff. We conclude that, when applied to the dilute Ising model, cluster algorithms become even more effective than local algorithms, for which increasing autocorrelation times are expected.
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
NASA Astrophysics Data System (ADS)
Sun, Jiajia; Li, Yaoguo
2017-02-01
Joint inversion that simultaneously inverts multiple geophysical data sets to recover a common Earth model is increasingly being applied to exploration problems. Petrophysical data can serve as an effective constraint to link different physical property models in such inversions. There are two challenges, among others, associated with the petrophysical approach to joint inversion. One is related to the multimodality of petrophysical data because there often exist more than one relationship between different physical properties in a region of study. The other challenge arises from the fact that petrophysical relationships have different characteristics and can exhibit point, linear, quadratic, or exponential forms in a crossplot. The fuzzy c-means (FCM) clustering technique is effective in tackling the first challenge and has been applied successfully. We focus on the second challenge in this paper and develop a joint inversion method based on variations of the FCM clustering technique. To account for the specific shapes of petrophysical relationships, we introduce several different fuzzy clustering algorithms that are capable of handling different shapes of petrophysical relationships. We present two synthetic and one field data examples and demonstrate that, by choosing appropriate distance measures for the clustering component in the joint inversion algorithm, the proposed joint inversion method provides an effective means of handling common petrophysical situations we encounter in practice. The jointly inverted models have both enhanced structural similarity and increased petrophysical correlation, and better represent the subsurface in the spatial domain and the parameter domain of physical properties.
NASA Astrophysics Data System (ADS)
Sun, Jiajia; Li, Yaoguo
2016-11-01
Joint inversion that simultaneously inverts multiple geophysical data sets to recover a common Earth model is increasingly being applied to exploration problems. Petrophysical data can serve as an effective constraint to link different physical property models in such inversions. There are two challenges, among others, associated with the petrophysical approach to joint inversion. One is related to the multi-modality of petrophysical data because there often exist more than one relationship between different physical properties in a region of study. The other challenge arises from the fact that petrophysical relationships have different characteristics and can exhibit point, linear, quadratic, or exponential forms in a crossplot. The fuzzy c-means (FCM) clustering technique is effective in tackling the first challenge and has been applied successfully. We focus on the second challenge in this paper and develop a joint inversion method based on variations of the FCM clustering technique. To account for the specific shapes of petrophysical relationships, we introduce several different fuzzy clustering algorithms that are capable of handling different shapes of petrophysical relationships. We present two synthetic and one field data examples and demonstrate that, by choosing appropriate distance measures for the clustering component in the joint inversion algorithm, the proposed joint inversion method provides an effective means of handling common petrophysical situations we encounter in practice. The jointly inverted models have both enhanced structural similarity and increased petrophysical correlation, and better represent the subsurface in the spatial domain and the parameter domain of physical properties.
Quantum AdaBoost algorithm via cluster state
NASA Astrophysics Data System (ADS)
Li, Yuan
2017-03-01
The principle and theory of quantum computation are investigated by researchers for many years, and further applied to improve the efficiency of classical machine learning algorithms. Based on physical mechanism, a quantum version of AdaBoost (Adaptive Boosting) training algorithm is proposed in this paper, of which purpose is to construct a strong classifier. In the proposed scheme with cluster state in quantum mechanism is to realize the weak learning algorithm, and then update the corresponding weight of examples. As a result, a final classifier can be obtained by combining efficiently weak hypothesis based on measuring cluster state to reweight the distribution of examples.
Efficient cluster algorithm for CP(N-1) models
NASA Astrophysics Data System (ADS)
Beard, B. B.; Pepe, M.; Riederer, S.; Wiese, U.-J.
2006-11-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z=0.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
Digital News Graph Clustering using Chinese Whispers Algorithm
NASA Astrophysics Data System (ADS)
Pratama, M. F. E.; Kemas, R. S. W.; Anisa, H.
2017-01-01
As the exponential growth of news creation on the internet, the amount of digital news has reached out billion numbers. Digital news is naturally linked each other but it needs to be grouped so that user can easily classify the news that they read. Graph is the most suitable data model to represent digital news since its can describing relation in easy and flexible manner. Thus, to overcome grouping problems, in this paper we using Chinese Whispers Algorithm as the graph clustering approach. We choose Chinese Whisper Algorithm based on consideration that the algorithm is able to make clusters from a big graph data with a relatively fast process [8], that appropriate with the characteristics of digital news. In this research, we examine the graph quality by comparing intra and inter-cluster weights of every node. This scenario gives us a quite high result that 95% of nodes have intra-cluster weight higher than its inter-cluster weight. We also investigate the graph accuracy by comparing the cluster results with expert judgement. As the result, the average accuracy of digital news graph clustering using Chinese Whisper algorithm is 80%.
Research on retailer data clustering algorithm based on Spark
NASA Astrophysics Data System (ADS)
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
A geometric clustering algorithm with applications to structural data.
Xu, Shutan; Zou, Shuxue; Wang, Lincong
2015-05-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination.
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these datasets should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimension, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET
Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.
Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO.
Exact and heuristic algorithms for weighted cluster editing.
Rahmann, Sven; Wittkop, Tobias; Baumbach, Jan; Martin, Marcel; Truss, Anke; Böcker, Sebastian
2007-01-01
Clustering objects according to given similarity or distance values is a ubiquitous problem in computational biology with diverse applications, e.g., in defining families of orthologous genes, or in the analysis of microarray experiments. While there exists a plenitude of methods, many of them produce clusterings that can be further improved. "Cleaning up" initial clusterings can be formalized as projecting a graph on the space of transitive graphs; it is also known as the cluster editing or cluster partitioning problem in the literature. In contrast to previous work on cluster editing, we allow arbitrary weights on the similarity graph. To solve the so-defined weighted transitive graph projection problem, we present (1) the first exact fixed-parameter algorithm, (2) a polynomial-time greedy algorithm that returns the optimal result on a well-defined subset of "close-to-transitive" graphs and works heuristically on other graphs, and (3) a fast heuristic that uses ideas similar to those from the Fruchterman-Reingold graph layout algorithm. We compare quality and running times of these algorithms on both artificial graphs and protein similarity graphs derived from the 66 organisms of the COG dataset.
A new fuzzy C-means method for magnetic resonance image brain segmentation
NASA Astrophysics Data System (ADS)
Altameem, Torki; Zanaty, E. A.; Tolba, Amr
2015-10-01
In this paper, we introduce a new fuzzy c-means (FCM) method in order to improve the magnetic resonance images' (MRIs) segmentation. The proposed method combines the FCM and possiblistic c-means (PCM) functions using a weighted Gaussian function. The weighted Gaussian function is given to indicate the spatial influence of the neighbouring pixels on the central pixel. The parameters of weighting coefficients are automatically determined in the implementation using the Gaussian function for every pixel in the image. The proposed method is realised by modifying the objective function of the PCM algorithm to produce memberships and possibilities simultaneously, along with the usual point prototypes or cluster centres for each cluster. The membership values can be interpreted as degrees of possibility of the points belonging to the classes, that is, the compatibilities of the points with the class prototypes to overcome the coincident clusters problem of PCM. The efficiency of the proposed algorithm is demonstrated by extensive segmentation experiments using MRIs and comparison with other state-of-the-art algorithms. In the proposed method, the effect of noise is controlled by incorporating the possibility (typicality) function in addition to the membership function. Consideration of these constraints can greatly control the noise in the image as shown in our experiments.
Cluster Recognition Algorithms for Battlefield Simulation.
1996-01-01
j,s’ 1 4 acJ y • 4 1 I A, r 22403 ( 23268 •s 0.. 233700 b "Ilk 0 *3 5* % ** % ,i• .3 0. •,. 3 4%4 ;4 26 " 24564 - N.. 24996 Figure A-65. Initial...22153 22403 22657 4 I 22909 23161 23413 23665 23917 Figure A-142. Circular clustering from data set 45. 170 p 424 27-21 427PO � � 142858
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
Development of clustering algorithms for Compressed Baryonic Matter experiment
NASA Astrophysics Data System (ADS)
Kozlov, G. E.; Ivanov, V. V.; Lebedev, A. A.; Vassiliev, Yu. O.
2015-05-01
A clustering problem for the coordinate detectors in the Compressed Baryonic Matter (CBM) experiment is discussed. Because of the high interaction rate and huge datasets to be dealt with, clustering algorithms are required to be fast and efficient and capable of processing events with high track multiplicity. At present there are two different approaches to the problem. In the first one each fired pad bears information about its charge, while in the second one a pad can or cannot be fired, thus rendering the separation of overlapping clusters a difficult task. To deal with the latter, two different clustering algorithms were developed, integrated into the CBMROOT software environment, and tested with various types of simulated events. Both of them are found to be highly efficient and accurate.
NCUBE - A clustering algorithm based on a discretized data space
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
Particle flow reconstruction based on the directed tree clustering algorithm
Chakraborty, D.; Lima, J. G. R.; McIntosh, R.; Zutshi, V.
2006-10-27
We present the status of particle flow algorithm development at Northern Illinois University. A key element in our approach is the calorimeter-based directed tree clustering algorithm. We have attempted to identify and tackle the essential challenges and analyze the effect of several different approaches to the reconstruction of jet energies and the Z-boson mass. A number of possibilities have been studied, such as analog vs. digital energy measurement, hit density-based clustering and the use of single or multiple energy thresholds. We plan to use this PFA-based reconstruction to compare some of the proposed detector technologies and geometries.
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the ''C4 Cluster Catalog'', a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers {approx}2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L{sub r}), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the {Lambda}CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is {approx_equal}90% complete and 95% pure above M{sub 200} = 1 x 10{sup 14} h{sup -1}M{sub {circle_dot}} and within 0.03 {le} z {le} 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 {le} z {le} 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L{sub r} of a cluster is a more robust estimator of the halo mass (M{sub 200}) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we
2013-01-01
Index (Hubert & Arabie 1985) Range: [-1, 1] Similarity 2 FMI Fowlkes-Mallows Index (Fowlkes & Mallows 1983) Range: [0, 1] Similarity 3...distance between hand-crafted and algorithmic clusterings Clustering ARI FMI JC JMS MM NMI. RI VDM VI Mean-30k 0.6170 0.6628 0.4466 0.7208 29614 0.9579...0.7472 36312 0.9574 0.9985 1200 0.8556 Table 5: Similarity or distance between hand-crafted and random clusterings Clustering ARI FMI JC JMS MM NMI RI
Clustered Self Organising Migrating Algorithm for the Quadratic Assignment Problem
NASA Astrophysics Data System (ADS)
Davendra, Donald; Zelinka, Ivan; Senkerik, Roman
2009-08-01
An approach of population dynamics and clustering for permutative problems is presented in this paper. Diversity indicators are created from solution ordering and its mapping is shown as an advantage for population control in metaheuristics. Self Organising Migrating Algorithm (SOMA) is modified using this approach and vetted with the Quadratic Assignment Problem (QAP). Extensive experimentation is conducted on benchmark problems in this area.
Morphology of open clusters NGC 1857 and Czernik 20 using clustering algorithms
NASA Astrophysics Data System (ADS)
Bhattacharya, S.; Mahulkar, V.; Pandaokar, S.; Singh, P. K.
2017-01-01
The morphology and cluster membership of the Galactic open clusters-Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the maiden use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of the cleaned CMDs to obtain reliable distance and reddening parameters for the clusters (Czernik 20: D = 2900 pc; E(J- K) = 0 . 33; NGC 1857: D = 2400 pc; E(J- K) =0.18-0.19). The isochrones were also used to convert the luminosity functions for the densest regions of Czernik 20 and NGC 1857 into mass function, to derive their slopes. Additionally, a previously unknown over-density consistent with that of a star cluster is identified in the region of analysis.
Analysis and Implementation of Graph Clustering for Digital News Using Star Clustering Algorithm
NASA Astrophysics Data System (ADS)
Ahdi, A. B.; SW, K. R.; Herdiani, A.
2017-01-01
Since Web 2.0 notion emerged and is used extensively by many services in the Internet, we see an unprecedented proliferation of digital news. Those digital news is very rich in term of content and link to other news/sources but lack of category information. This make the user could not easily identify or grouping all the news that they read into set of groups. Naturally, digital news are linked data because every digital new has relation/connection with other digital news/resources. The most appropriate model for linked data is graph model. Graph model is suitable for this purpose due its flexibility in describing relation and its easy-to-understand visualization. To handle the grouping issue, we use graph clustering approach. There are many graph clustering algorithm available, such as MST Clustering, Chameleon, Makarov Clustering and Star Clustering. From all of these options, we choose Star Clustering because this algorithm is more easy-to-understand, more accurate, efficient and guarantee the quality of clusters results. In this research, we investigate the accuracy of the cluster results by comparing it with expert judgement. We got quite high accuracy level, which is 80.98% and for the cluster quality, we got promising result which is 62.87%.
Coupled cluster algorithms for networks of shared memory parallel processors
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.
2007-05-01
As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363]. (General Atomic and Molecular Electronic Structure System) program suite and the Distributed Data Interface [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]. (DDI), however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm is presented on several large-scale clusters of SMPs.
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering
Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Background Gravitation field algorithm (GFA) is a new optimization algorithm which is based on an imitation of natural phenomena. GFA can do well both for searching global minimum and multi-minima in computational biology. But GFA needs to be improved for increasing efficiency, and modified for applying to some discrete data problems in system biology. Method An improved GFA called IGFA was proposed in this paper. Two parts were improved in IGFA. The first one is the rule of random division, which is a reasonable strategy and makes running time shorter. The other one is rotation factor, which can improve the accuracy of IGFA. And to apply IGFA to the hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA. And IGFA was applied to hierarchical clustering. The global minimum experiment was used with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). Multi-minima experiment was used with IGFA and GFA. The two experiments results were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For the hierarchical clustering, IGFA is used to optimize the smallest distance of genes pairs, and the results were compared with GA and SA, singular-linkage clustering, UPGMA. The efficiency of IGFA is proved. PMID:23173043
A comparison of clustering algorithms in article recommendation system
NASA Astrophysics Data System (ADS)
Tantanasiriwong, Supaporn
2011-12-01
Recommendation system is considered a tool that can be used to recommend researchers about resources that are suitable for their research of interest by using content-based filtering. In this paper, clustering algorithm as an unsupervised learning is introduced for grouping objects based on their feature selection and similarities. The information of publication in Science Cited Index is used to be dataset for clustering as a feature extraction in terms of dimensionality reduction of these articles by comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Mean to determine the best algorithm. In my experiment, the selected database consists of 2625 documents extraction extracted from SCI corpus from 2001 to 2009. Clustering into ranks as 50,100,200,250 is used to consider and using F-Measure evaluate among them in three algorithms. The result of this paper showed that LDA technique given the accuracy up to 95.5% which is the highest effective than any other clustering technique.
A comparison of clustering algorithms in article recommendation system
NASA Astrophysics Data System (ADS)
Tantanasiriwong, Supaporn
2012-01-01
Recommendation system is considered a tool that can be used to recommend researchers about resources that are suitable for their research of interest by using content-based filtering. In this paper, clustering algorithm as an unsupervised learning is introduced for grouping objects based on their feature selection and similarities. The information of publication in Science Cited Index is used to be dataset for clustering as a feature extraction in terms of dimensionality reduction of these articles by comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Mean to determine the best algorithm. In my experiment, the selected database consists of 2625 documents extraction extracted from SCI corpus from 2001 to 2009. Clustering into ranks as 50,100,200,250 is used to consider and using F-Measure evaluate among them in three algorithms. The result of this paper showed that LDA technique given the accuracy up to 95.5% which is the highest effective than any other clustering technique.
A decentralized fuzzy C-means-based energy-efficient routing protocol for wireless sensor networks.
Alia, Osama Moh'd
2014-01-01
Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols.
Mapping cultivable land from satellite imagery with clustering algorithms
NASA Astrophysics Data System (ADS)
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open data satellite imagery provides valuable data for the planning and decision-making processes related with environmental domains. Specifically, agriculture uses remote sensing in a wide range of services, ranging from monitoring the health of the crops to forecasting the spread of crop diseases. In particular, this paper focuses on a methodology for the automatic delimitation of cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band in order to evaluate which one better identifies cultivable land. The proposed method was tested with vineyards using as input the spectral and thermal bands of the Landsat 8 satellite. The experimental results show the great potential of this method for cultivable land monitoring from remote-sensed multispectral imagery.
Synchronous Firefly Algorithm for Cluster Head Selection in WSN
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
Advanced defect detection algorithm using clustering in ultrasonic NDE
NASA Astrophysics Data System (ADS)
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limits ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as `legitimate reflector' or `artefacts' based on this observation and the expected probability of defection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP and the subsequent application of the proposed clustering algorithm has provided an additional reduction to PFA while maintaining PoD for both samples compared with SSP results alone.
ICANP2: Isoenergetic cluster algorithm for NP-complete Problems
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Fang, Chao; Katzgraber, Helmut G.
NP-complete optimization problems with Boolean variables are of fundamental importance in computer science, mathematics and physics. Most notably, the minimization of general spin-glass-like Hamiltonians remains a difficult numerical task. There has been a great interest in designing efficient heuristics to solve these computationally difficult problems. Inspired by the rejection-free isoenergetic cluster algorithm developed for Ising spin glasses, we present a generalized cluster update that can be applied to different NP-complete optimization problems with Boolean variables. The cluster updates allow for a wide-spread sampling of phase space, thus speeding up optimization. By carefully tuning the pseudo-temperature (needed to randomize the configurations) of the problem, we show that the method can efficiently tackle problems on topologies with a large site-percolation threshold. We illustrate the ICANP2 heuristic on paradigmatic optimization problems, such as the satisfiability problem and the vertex cover problem.
Clustering Algorithms: Their Application to Gene Expression Data
Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel
2016-01-01
Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data proffers a prodigious preference to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpretation of the resulting mass of data, which consists of millions of measurements; these data also inhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to the gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and high degree of accuracy in its analysis procedure. PMID:27932867
Sweeney, Timothy E; Chen, Albert C; Gevaert, Olivier
2015-11-19
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
Identifying multiple influential spreaders by a heuristic clustering algorithm
NASA Astrophysics Data System (ADS)
Bao, Zhong-Kui; Liu, Jian-Guo; Zhang, Hai-Feng
2017-03-01
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suitable for the case where a single spreader is chosen as the spreading source. Many times, spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in networks. Therefore, one ideal situation for multiple spreaders case is that the spreaders themselves are not only influential but also they are dispersively distributed in networks, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and finally the center nodes in clusters are chosen as the multiple spreaders. HC algorithm not only ensures that the multiple spreaders are dispersively distributed in networks but also avoids the selected nodes to be very "negligible". Compared with the traditional methods, our experimental results on synthetic and real networks indicate that the performance of HC method on influence maximization is more significant.
Dynamically Incremental K-means++ Clustering Algorithm Based on Fuzzy Rough Set Theory
NASA Astrophysics Data System (ADS)
Li, Wei; Wang, Rujing; Jia, Xiufang; Jiang, Qing
Being classic K-means++ clustering algorithm only for static data, dynamically incremental K-means++ clustering algorithm (DK-Means++) is presented based on fuzzy rough set theory in this paper. Firstly, in DK-Means++ clustering algorithm, the formula of similar degree is improved by weights computed by using of the important degree of attributes which are reduced on the basis of rough fuzzy set theory. Secondly, new data only need match granular which was clustered by K-means++ algorithm or seldom new data is clustered by classic K-means++ algorithm in global data. In this way, that all data is re-clustered each time in dynamic data set is avoided, so the efficiency of clustering is improved. Throughout our experiments showing, DK-Means++ algorithm can objectively and efficiently deal with clustering problem of dynamically incremental data.
Gravitation field algorithm and its application in gene cluster
2010-01-01
Background Searching optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA) which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem. And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene Expression Omnibus. Conclusions The mathematical proof demonstrates that GFA could be convergent in the global optimum by probability 1 in three conditions for one independent variable mass functions. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA. PMID:20854683
Mekhmoukh, Abdenour; Mokrani, Karim
2015-11-01
In this paper, a new image segmentation method based on Particle Swarm Optimization (PSO) and outlier rejection combined with level set is proposed. A traditional approach to the segmentation of Magnetic Resonance (MR) images is the Fuzzy C-Means (FCM) clustering algorithm. The membership function of this conventional algorithm is sensitive to the outlier and does not integrate the spatial information in the image. The algorithm is very sensitive to noise and in-homogeneities in the image, moreover, it depends on cluster centers initialization. To improve the outlier rejection and to reduce the noise sensitivity of conventional FCM clustering algorithm, a novel extended FCM algorithm for image segmentation is presented. In general, in the FCM algorithm the initial cluster centers are chosen randomly, with the help of PSO algorithm the clusters centers are chosen optimally. Our algorithm takes also into consideration the spatial neighborhood information. These a priori are used in the cost function to be optimized. For MR images, the resulting fuzzy clustering is used to set the initial level set contour. The results confirm the effectiveness of the proposed algorithm.
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for microcalcification clusters detection is proposed. At the first time, we make a short analysis of mammographic images with microcalcification lesions to confirm these lesions have much greater gray values than normal regions. After summarizing the specific feature of microcalcification clusters in mammographic screening, we make more focus on preprocessing step including eliminating the background, image enhancement and eliminating the pectoral muscle. In detail, Chan-Vese Model is used for eliminating background. Then, we do the application of combining morphology method and edge detection method. After the AND operation and Sobel filter, we use Hough Transform, it can be seen that the result have outperformed for eliminating the pectoral muscle which is approximately the gray of microcalcification. Additionally, the enhancement step is achieved by morphology. We make effort on mammographic image preprocessing to achieve lower computational complexity. As well known, it is difficult to robustly achieve mammograms analysis due to low contrast between normal and lesion tissues, there are also much noise in such images. After a serious preprocessing algorithm, a method based on blob detection is performed to microcalcification clusters according their specific features. The proposed algorithm has employed Laplace operator to improve Difference of Gaussians (DoG) function in terms of low contrast images. A preliminary evaluation of the proposed method performs on a known public database namely MIAS, rather than synthetic images. The comparison experiments and Cohen's kappa coefficients all demonstrate that our proposed approach can potentially obtain better microcalcification clusters detection results in terms of accuracy, sensitivity and specificity.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-meams (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
Graphical representations and cluster algorithms I. Discrete spin systems
NASA Astrophysics Data System (ADS)
Chayes, L.; Machta, J.
1997-02-01
Graphical representations similar to the FK representation are developed for a variety of spin-systems. In several cases, it is established that these representations have (FKG) monotonicity properties which enables characterization theorems for the uniqueness phase and the low-temperature phase of the spin system. Certain systems with intermediate phases and/or first-order transitions are also described in terms of the percolation properties of the representations. In all cases, these representations lead, in a natural fashion, to Swendsen-Wang-type algorithms. Hence, at least in the above-mentioned instances, these algorithms realize the program described by Kandel and Domany, Phys. Rev. B 43 (1991) 8539-8548. All of the algorithms are shown to satisfy a Li-Sokal bound which (at least for systems with a divergent specific heat) implies critical slowing down. However, the representations also give rise to invaded cluster algorithms which may allow for the rapid simulation of some of these systems at their transition points.
Dynamic and static properties of the invaded cluster algorithm
NASA Astrophysics Data System (ADS)
Moriarty, K.; Machta, J.; Chayes, L. Y.
1999-02-01
Simulations of the two-dimensional Ising and three-state Potts models at their critical points are performed using the invaded cluster (IC) algorithm. It is argued that observables measured on a sublattice of size l should exhibit a crossover to Swendsen-Wang (SW) behavior for l sufficiently less than the lattice size L, and a scaling form is proposed to describe the crossover phenomenon. It is found that the energy autocorrelation time τɛ(l,L) for an l×l sublattice attains a maximum in the crossover region, and a dynamic exponent zIC for the IC algorithm is defined according to τɛ,max~LzIC. Simulation results for the three-state model yield zIC=0.346+/-0.002, which is smaller than values of the dynamic exponent found for the SW and Wolff algorithms and also less than the Li-Sokal bound. The results are less conclusive for the Ising model, but it appears that zIC<0.21 and possibly that τɛ,max~ln L so that zIC=0-similar to previous results for the SW and Wolff algorithms.
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.
Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang
2016-12-01
This paper is aimed to study the clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm had shortcomings such as dependence of results on the selection of initial value, trapping in local optimum when processing prescriptions form CM medical cases. Therefore, a new clustering method based on the collaboration of firefly algorithm and simulated annealing algorithm was proposed. This algorithm dynamically determined the iteration of firefly algorithm and simulates sampling of annealing algorithm by fitness changes, and increased the diversity of swarm through expansion of the scope of the sudden jump, thereby effectively avoiding premature problem. The results from confirmatory experiments for CM medical cases suggested that, comparing with traditional K-means clustering algorithms, this method was greatly improved in the individual diversity and the obtained clustering results, the computing results from this method had a certain reference value for cluster analysis on CM prescriptions.
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
jClustering, an open framework for the development of 4D clustering algorithms.
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary.
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that also for the film geometry a substantial reduction of the statistical error can achieved. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L(0) of the film and study the scaling of this temperature with L(0). In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model.
Ternary alloy material prediction using genetic algorithm and cluster expansion
Chen, Chong
2015-12-01
This thesis summarizes our study on the crystal structures prediction of Fe-V-Si system using genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the current ten known experimental phases, and calculated formation energies of those compounds using density functional theory (DFT) package, namely, VASP. The convex hull was generated based on the DFT calculations of the experimental known phases. Then we did random search on some metal rich (Fe and V) compositions and found that the lowest energy structures were body centered cube (bcc) underlying lattice, under which we did our computational systematic searches using genetic algorithm and cluster expansion. Among hundreds of the searched compositions, thirteen were selected and DFT formation energies were obtained by VASP. The stability checking of those thirteen compounds was done in reference to the experimental convex hull. We found that the composition, 24-8-16, i.e., Fe_{3}VSi_{2} is a new stable phase and it can be very inspiring to the future experiments.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro
2015-09-01
Cluster-labeling algorithms that use a single GPU can be roughly divided into direct and two-stage approaches. To date, both types use an iterative method to compare the labels of nearest-neighbor sites. In this paper, I present a GPU-based cluster-labeling algorithm that does not use conventional iteration. The proposed method is applicable to both direct algorithms and two-stage approaches. Under the proposed approach, only one comparison with the nearest-neighbor site is needed for a two-dimensional (2D) system, and just two comparisons are needed for three-dimensional (3D) systems. As an application of the new cluster-labeling algorithm, I consider the Swendsen-Wang (SW) multi-cluster spin flip algorithm. The performance of the proposed method is compared with that of other cluster-labeling algorithms for the SW multi-cluster spin flip problem using the 2D and 3D Ising models. As a result, the computation time of the new algorithm is shown to be 40% faster than that of the previous algorithm for the 2D Ising model, and 20% faster than that of the previous algorithm for the 3D Ising model at the critical temperature.
Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady
2017-02-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
Poole, William; Leinonen, Kalle; Shmulevich, Ilya
2017-01-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
NASA Astrophysics Data System (ADS)
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
The traditional microblog recommendation algorithm has the problems of low efficiency and modest effect in the era of big data. In the aim of solving these issues, this paper proposed a mixed recommendation algorithm with user clustering. This paper first introduced the situation of microblog marketing industry. Then, this paper elaborates the user interest modeling process and detailed advertisement recommendation methods. Finally, this paper compared the mixed recommendation algorithm with the traditional classification algorithm and mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering has good accuracy and recall rate in the microblog advertisements promotion.
GX-Means: A model-based divide and merge algorithm for geospatial image clustering
Vatsavai, Raju; Symons, Christopher T; Chandola, Varun; Jun, Goo
2011-01-01
One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
A Cluster Algorithm for the 2-D SU(3) × SU(3) Chiral Model
NASA Astrophysics Data System (ADS)
Ji, Da-ren; Zhang, Jian-bo
1996-07-01
To extend the cluster algorithm to SU(N) × SU(N) chiral models, a variant version of Wolff's cluster algorithm is proposed and tested for the 2-dimensional SU(3) × SU(3) chiral model. The results show that the new method can reduce the critical slowing down in SU(3) × SU(3) chiral model.
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.
NASA Technical Reports Server (NTRS)
Lambeck, P. F.; Rice, D. P.
1976-01-01
Signature extension is intended to increase the space-time range over which a set of training statistics can be used to classify data without significant loss of recognition accuracy. A first cluster matching algorithm MASC (Multiplicative and Additive Signature Correction) was developed at the Environmental Research Institute of Michigan to test the concept of using associations between training and recognition area cluster statistics to define an average signature transformation. A more recent signature extension module CROP-A (Cluster Regression Ordered on Principal Axis) has shown evidence of making significant associations between training and recognition area cluster statistics, with the clusters to be matched being selected automatically by the algorithm.
Comparison and evaluation of network clustering algorithms applied to genetic interaction networks.
Hou, Lin; Wang, Lin; Berg, Arthur; Qian, Minping; Zhu, Yunping; Li, Fangting; Deng, Minghua
2012-01-01
The goal of network clustering algorithms detect dense clusters in a network, and provide a first step towards the understanding of large scale biological networks. With numerous recent advances in biotechnologies, large-scale genetic interactions are widely available, but there is a limited understanding of which clustering algorithms may be most effective. In order to address this problem, we conducted a systematic study to compare and evaluate six clustering algorithms in analyzing genetic interaction networks, and investigated influencing factors in choosing algorithms. The algorithms considered in this comparison include hierarchical clustering, topological overlap matrix, bi-clustering, Markov clustering, Bayesian discriminant analysis based community detection, and variational Bayes approach to modularity. Both experimentally identified and synthetically constructed networks were used in this comparison. The accuracy of the algorithms is measured by the Jaccard index in comparing predicted gene modules with benchmark gene sets. The results suggest that the choice differs according to the network topology and evaluation criteria. Hierarchical clustering showed to be best at predicting protein complexes; Bayesian discriminant analysis based community detection proved best under epistatic miniarray profile (EMAP) datasets; the variational Bayes approach to modularity was noticeably better than the other algorithms in the genome-scale networks.
Parallelization of the Wolff single-cluster algorithm
NASA Astrophysics Data System (ADS)
Kaupužs, J.; Rimšāns, J.; Melnik, R. V. N.
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with a great accuracy. For the lattice with linear size L=1024 , we have reached the speedup about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimation, the speedup about three times on four processors is reachable for the O(n) models with n≥2 . Furthermore, the application of the developed OpenMP code allows us to simulate larger lattices due to greater operative (shared) memory available.
Parallelization of the Wolff single-cluster algorithm.
Kaupuzs, J; Rimsāns, J; Melnik, R V N
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with a great accuracy. For the lattice with linear size L=1024, we have reached the speedup about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimation, the speedup about three times on four processors is reachable for the O(n) models with n> or =2. Furthermore, the application of the developed OpenMP code allows us to simulate larger lattices due to greater operative (shared) memory available.
NASA Astrophysics Data System (ADS)
Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf
2014-12-01
In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adoptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k = 2 , 3 , … clusters. Therefore, it can also be used for the estimation of the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by a moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry fits the faults better. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
C-element: a new clustering algorithm to find high quality functional modules in PPI networks.
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms whose focuses are on finding functional modules try either to find a clique like sub networks or to grow clusters starting from vertices with high degrees as seeds. These algorithms do not make any difference between a biological network and any other networks. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high throughput PPI networks from Sacchromyces Cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue specific networks are used.
A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease
Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.
2010-01-01
Clustering is the grouping of similar objects into a class. Local clustering feature refers to the phenomenon whereby one group of data is separated from another, and the data from these different groups are clustered locally. A compact class is defined as one cluster in which all similar elements cluster tightly within the cluster. Herein, the essence of the local clustering feature, revealed by mathematical manipulation, results in a novel clustering algorithm termed as the special local clustering (SLC) algorithm that was used to process gene microarray data related to Alzheimer’s disease (AD). SLC algorithm was able to group together genes with similar expression patterns and identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate and/or severe AD gene microarray data, this gene is possibly associated with AD. Application of a clustering algorithm in disease-associated gene identification such as in AD is rarely reported. PMID:20089478
Block clustering based on difference of convex functions (DC) programming and DC algorithms.
Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai
2013-10-01
We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM.
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
Ab initio study on (CO2)n clusters via electrostatics- and molecular tailoring-based algorithm
NASA Astrophysics Data System (ADS)
Jovan Jose, K. V.; Gadre, Shridhar R.
An algorithm based on molecular electrostatic potential (MESP) and molecular tailoring approach (MTA) for building energetically favorable molecular clusters is presented. This algorithm is tested on prototype (CO2)n clusters with n = 13, 20, and 25 to explore their structure, energetics, and properties. The most stable clusters in this series are seen to show more number of triangular motifs. Many-body energy decomposition analysis performed on the most stable clusters reveals that the 2-body is the major contributor (>96%) to the total interaction energy. Vibrational frequencies and molecular electrostatic potentials are also evaluated for these large clusters through MTA. The MTA-based MESPs of these clusters show a remarkably good agreement with the corresponding actual ones. The most intense MTA-based normal mode frequencies are in fair agreement with the actual ones for smaller clusters. These calculated asymmetric stretching frequencies are blue-shifted with reference to the CO2 monomer.
NASA Astrophysics Data System (ADS)
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and maximum participation cluster mean value employment in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 histological or Fibroscan verified Chronic Liver Disease). An experienced Radiologist performed SWE measurement placing a region of interest (ROI) on each subject's right liver lobe providing a SWE image for each patient. Subsequently Fuzzy C-Means clustering was performed on every SWE image utilizing 5 clusters. Mean Stiffness value and pixels number of each cluster were calculated. The mean stiffness value feature of the cluster with maximum pixels number was then fed as input for ROC analysis. The selected Mean Stiffness value feature an Area Under the Curve (AUC) of 0.8633 with Optimum Cut-off value of 7.5 kPa with sensitivity and specificity values of 0.8438 and 0.875 and balanced accuracy of 0.8594. Examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy value of 0.8125 with 7.1 kPa cutoff value. A new promising automatic algorithm was implemented with more objective criteria of defining optimum elasticity cut-off values for discriminating fibrosis stages for SWE. More subjects are needed in order to define if this algorithm is an objective tool to outperform manual ROI selection.
NASA Astrophysics Data System (ADS)
Ju, Ying; Zhang, Songming; Ding, Ningxiang; Zeng, Xiangxiang; Zhang, Xingyi
2016-09-01
The field of complex network clustering is gaining considerable attention in recent years. In this study, a multi-objective evolutionary algorithm based on membranes is proposed to solve the network clustering problem. Population are divided into different membrane structures on average. The evolutionary algorithm is carried out in the membrane structures. The population are eliminated by the vector of membranes. In the proposed method, two evaluation objectives termed as Kernel J-means and Ratio Cut are to be minimized. Extensive experimental studies comparison with state-of-the-art algorithms proves that the proposed algorithm is effective and promising.
Ju, Ying; Zhang, Songming; Ding, Ningxiang; Zeng, Xiangxiang; Zhang, Xingyi
2016-01-01
The field of complex network clustering is gaining considerable attention in recent years. In this study, a multi-objective evolutionary algorithm based on membranes is proposed to solve the network clustering problem. Population are divided into different membrane structures on average. The evolutionary algorithm is carried out in the membrane structures. The population are eliminated by the vector of membranes. In the proposed method, two evaluation objectives termed as Kernel J-means and Ratio Cut are to be minimized. Extensive experimental studies comparison with state-of-the-art algorithms proves that the proposed algorithm is effective and promising. PMID:27670156
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors create flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
A new clustering algorithm for scanning electron microscope images
NASA Astrophysics Data System (ADS)
Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad
2016-04-01
A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster scan pattern, and the beam's position is combined with the detected signal to produce an image. The most common configuration for an SEM produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may be produced with insufficient brightness, anomalous contrast, jagged edges, and poor quality due to low signal-to-noise ratio, grained topography and poor surface details. The segmentation of the SEM images is a tackling problems in the presence of the previously mentioned distortions. In this paper, we are stressing on the clustering of these type of images. In that sense, we evaluate the performance of the well-known unsupervised clustering and classification techniques such as connectivity based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of images and compare its results against these regular techniques in terms of clustering validation metrics.
A novel harmony search-K means hybrid algorithm for clustering gene expression data.
Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong
2013-01-01
This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias. PMID:23690875
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
NASA Astrophysics Data System (ADS)
Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen
2014-08-01
The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm, which can predict the information of unlabeled nodes by a few of labeled nodes. It is a community detection method in the field of complex networks. This algorithm is easy to implement with low complexity and the effect is remarkable. It is widely applied in various fields. However, the randomness of the label propagation leads to the poor robustness of the algorithm, and the classification result is unstable. This paper proposes a LPA based on edge clustering coefficient. The node in the network selects a neighbor node whose edge clustering coefficient is the highest to update the label of node rather than a random neighbor node, so that we can effectively restrain the random spread of the label. The experimental results show that the LPA based on edge clustering coefficient has made improvement in the stability and accuracy of the algorithm.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm
Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong
2016-01-01
In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.
Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong
2016-01-01
In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis.
An Efficient Document Clustering Algorithm and Its Application to a Document Browser.
ERIC Educational Resources Information Center
Tanaka, Hideki; Kumano, Tadashi; Uratani, Noriyoshi; Ehara, Terumasa
1999-01-01
Presents a document-clustering algorithm that uses a term frequency vector for each document in a Japanese collection to produce a hierarchy in the form of a document classification tree. Introduces an application of this algorithm to a Japanese-to-English translation-aid system. (Author/LRW)
GenClust: A genetic algorithm for clustering gene expression data
Di Gesú, Vito; Giancarlo, Raffaele; Lo Bosco, Giosué; Raimondi, Alessandra; Scaturro, Davide
2005-01-01
Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology. PMID:16336639
The loop-cluster algorithm for the case of the 6 vertex model
NASA Astrophysics Data System (ADS)
Evertz, Hans Gerd; Marcu, Mihai
1993-03-01
We present the loop algorithm, a new type of cluster algorithm that we recently introduced for the F model. Using the framework of Kandel and Domany, we show how to generalize the algorithm to the arrow flip symmetric 6 vertex model. We propose the principle of least possible freezing as the guide to choosing the values of free parameters in the algorithm. Finally, we briefly discuss the application of our algorithm to simulations of quantum spin systems. In particular, all necessary information is provided for the simulation of spin {1}/{2} Heisenberg and ¢x¢x z models.
Semi-supervised clustering algorithm for haplotype assembly problem based on MEC model.
Xu, Xin-Shun; Li, Ying-Xin
2012-01-01
Haplotype assembly is to infer a pair of haplotypes from localized polymorphism data. In this paper, a semi-supervised clustering algorithm-SSK (semi-supervised K-means) is proposed for it, which, to our knowledge, is the first semi-supervised clustering method for it. In SSK, some positive information is firstly extracted. The information is then used to help k-means to cluster all SNP fragments into two sets from which two haplotypes can be reconstructed. The performance of SSK is tested on both real data and simulated data. The results show that it outperforms several state-of-the-art algorithms on minimum error correction (MEC) model.
CHRONICLE: A Two-Stage Density-Based Clustering Algorithm for Dynamic Networks
NASA Astrophysics Data System (ADS)
Kim, Min-Soo; Han, Jiawei
Information networks, such as social networks and that extracted from bibliographic data, are changing dynamically over time. It is crucial to discover time-evolving communities in dynamic networks. In this paper, we study the problem of finding time-evolving communities such that each community freely forms, evolves, and dissolves for any time period. Although the previous t-partite graph based methods are quite effective for discovering such communities from large-scale dynamic networks, they have some weak points such as finding only stable clusters of single path type and not being scalable w.r.t. the time period. We propose CHRONICLE, an efficient clustering algorithm that discovers not only clusters of single path type but also clusters of path group type. In order to find clusters of both types and also control the dynamicity of clusters, CHRONICLE performs the two-stage density-based clustering, which performs the 2nd-stage density-based clustering for the t-partite graph constructed from the 1st-stage density-based clustering result for each timestamp network. For a given data set, CHRONICLE finds all clusters in a fixed time by using a fixed amount of memory, regardless of the number of clusters and the length of clusters. Experimental results using real data sets show that CHRONICLE finds a wider range of clusters in a shorter time with a much smaller amount of memory than the previous method.
A clustering algorithm based on two distance functions for MEC model.
Wang, Ying; Feng, Enmin; Wang, Ruisheng
2007-04-01
Haplotype reconstruction, based on aligned single nucleotide polymorphism (SNP) fragments, is to infer a pair of haplotypes from localized polymorphism data gathered through short genome fragment assembly. This paper first presents two distance functions, which are used to measure the difference degree and similarity degree between SNP fragments. Based on the two distance functions, a clustering algorithm is proposed in order to solve MEC model. The algorithm involves two sections. One is to determine the initial haplotype pair, the other concerns with inferring true haplotype pair by re-clustering. The comparison results prove that our algorithm utilizing two distance functions is effective and feasible.
Empirical relations between static and dynamic exponents for Ising model cluster algorithms
NASA Astrophysics Data System (ADS)
Coddington, Paul D.; Baillie, Clive F.
1992-02-01
We have measured the autocorrelations for the Swendsen-Wang and the Wolff cluster update algorithms for the Ising model in two, three, and four dimensions. The data for the Wolff algorithm suggest that the autocorrelations are linearly related to the specific heat, in which case the dynamic critical exponent is zint,EW=α/ν. For the Swendsen-Wang algorithm, scaling the autocorrelations by the average maximum cluster size gives either a constant or a logarithm, which implies that zint,ESW=β/ν for the Ising model.
An improved clustering algorithm of tunnel monitoring data for cloud computing.
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption.
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
GPU-based single-cluster algorithm for the simulation of the Ising model
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2012-02-01
We present the GPU calculation with the common unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation speed for the two-dimensional Ising model at the critical temperature with the linear size L = 4096 is 5.60 times as fast as the calculation speed on a current CPU core. For the three-dimensional Ising model with the linear size L = 256, the GPU calculation speed is 7.90 times as fast as the CPU calculation speed. The idea of quasi-block synchronization can be used not only in the cluster algorithm but also in many fields where the synchronization of all threads is required.
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or {open_quotes}blobs{close_quotes} based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuszzyfier, a binary smoothing filter, and a median filter. The processed image data is then inputed to the image clusters detection and identification program. The program employed the concept of {open_quotes}elastic rectangle{close_quotes}that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C-program is develop to test the algorithm. The algorithm is tested only on image data of 8x8 sizes with different number of blobs in them. The algorithm works very in detecting and identifying image clusters.
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
Efficient cluster Monte Carlo algorithm for Ising spin glasses in more than two space dimensions
NASA Astrophysics Data System (ADS)
Ochoa, Andrew J.; Zhu, Zheng; Katzgraber, Helmut G.
2015-03-01
A cluster algorithm that speeds up slow dynamics in simulations of nonplanar Ising spin glasses away from criticality is urgently needed. In theory, the cluster algorithm proposed by Houdayer poses no advantage over local moves in systems with a percolation threshold below 50%, such as cubic lattices. However, we show that the frustration present in Ising spin glasses prevents the growth of system-spanning clusters at temperatures roughly below the characteristic energy scale J of the problem. Adding Houdayer cluster moves to simulations of Ising spin glasses for T ~ J produces a speedup that grows with the system size over conventional local moves. We show results for the nonplanar quasi-two-dimensional Chimera graph of the D-Wave Two quantum annealer, as well as conventional three-dimensional Ising spin glasses, where in both cases the addition of cluster moves speeds up thermalization visibly in the physically-interesting low temperature regime.
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.
Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong
2015-12-01
Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.
Zhong, Wei; Altun, Gulsah; Harrison, Robert; Tai, Phang C; Pan, Yi
2005-09-01
Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformation and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm on a new dataset. The structural similarity of these recurring sequence clusters to produce sequence motifs is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used by our research is the most updated dataset among similar studies for sequence motifs. A new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved K-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs, which are undetectable by the traditional K-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved K-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. The satisfactory result of the experiment suggests that this new K-means algorithm may be applied to other areas of bioinformatics
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
NASA Astrophysics Data System (ADS)
Ball, R. C.; Lee, J. R.
1996-03-01
We prove that a new, irreversible growth algorithm, Non-Deletion Reaction-Limited Cluster-cluster Aggregation (NDRLCA), produces equilibrium Branched Polymers, expected to exhibit Lattice Animal statistics [1]. We implement NDRLCA, off-lattice, as a computer simulation for embedding dimension d=2 and 3, obtaining values for critical exponents, fractal dimension D and cluster mass distribution exponent tau: d=2, D≈ 1.53± 0.05, tau = 1.09± 0.06; d=3, D=1.96± 0.04, tau =1.50± 0.04 in good agreement with theoretical LA values. The simulation results do not support recent suggestions [2] that BPs may be in the same universality class as percolation. We also obtain values for a model-dependent critical “fugacity”, z_c and investigate the finite-size effects of our simulation, quantifying notions of “inbreeding” that occur in this algorithm. Finally we use an extension of the NDRLCA proof to show that standard Reaction-Limited Cluster-cluster Aggregation is very unlikely to be in the same universality class as Branched Polymers/Lattice Animals unless the backnone dimension for the latter is considerably less than the published value.
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. This kind of methods has two inadequacies: one is that the input matrixes they used cannot provide sufficient structural information for community detection and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experiments results showed that the new algorithm gave excellent performance on artificial networks and real world networks and outperforms other community detection methods. PMID:25147846
Salem, Sameh A; Salem, Nancy M; Nandi, Asoke K
2007-03-01
In this paper, segmentation of blood vessels from colour retinal images using a novel clustering algorithm with a partial supervision strategy is proposed. The proposed clustering algorithm, which is a RAdius based Clustering ALgorithm (RACAL), uses a distance based principle to map the distributions of the data by utilising the premise that clusters are determined by a distance parameter, without having to specify the number of clusters. Additionally, the proposed clustering algorithm is enhanced with a partial supervision strategy and it is demonstrated that it is able to segment blood vessels of small diameters and low contrasts. Results are compared with those from the KNN classifier and show that the proposed RACAL performs better than the KNN in case of abnormal images as it succeeds in segmenting small and low contrast blood vessels, while it achieves comparable results for normal images. For automation process, RACAL can be used as a classifier and results show that it performs better than the KNN classifier in both normal and abnormal images.
A pairwise alignment algorithm which favors clusters of blocks.
Nédélec, Elodie; Moncion, Thomas; Gassiat, Elisabeth; Bossard, Bruno; Duchateau-Nguyen, Guillemette; Denise, Alain; Termier, Michel
2005-01-01
Pairwise sequence alignments aim to decide whether two sequences are related and, if so, to exhibit their related domains. Recent works have pointed out that a significant number of true homologous sequences are missed when using classical comparison algorithms. This is the case when two homologous sequences share several little blocks of homology, too small to lead to a significant score. On the other hand, classical alignment algorithms, when detecting homologies, may fail to recognize all the significant biological signals. The aim of the paper is to give a solution to these two problems. We propose a new scoring method which tends to increase the score of an alignment when "blocks" are detected. This so-called Block-Scoring algorithm, which makes use of dynamic programming, is worth being used as a complementary tool to classical exact alignments methods. We validate our approach by applying it on a large set of biological data. Finally, we give a limit theorem for the score statistics of the algorithm.
Longitudinally-invariant k⊥-clustering algorithms for hadron-hadron collisions
NASA Astrophysics Data System (ADS)
Catani, S.; Dokshitzer, Yu. L.; Seymour, M. H.; Webber, B. R.
1993-09-01
We propose a version of the QCD-motivated " k⊥" jet-clustering algorithm for hadron-hadron collisions which is invariant under boosts along the beam directions. This leads to improved factorization properties and closer correspondence to experimental practice at hadron colliders. We examine alternative definitions of the resolution variables and cluster recombination scheme, and show that the algorithm can be implemented efficiently on a computer to provide a full clustering history of each event. Using simulated data at √ S = 1.8 TeV, we study the effects of calorimeter segmentation, hadronization and the soft underlying event, and compare the results with those obtained using a conventional cone-type algorithm.
NASA Astrophysics Data System (ADS)
Cruz, S. M. A.; Marques, J. M. C.; Pereira, F. B.
2016-10-01
We propose improvements to our evolutionary algorithm (EA) [J. M. C. Marques and F. B. Pereira, J. Mol. Liq. 210, 51 (2015)] in order to avoid dissociative solutions in the global optimization of clusters with competing attractive and repulsive interactions. The improved EA outperforms the original version of the method for charged colloidal clusters in the size range 3 ≤ N ≤ 25, which is a very stringent test for global optimization algorithms. While the Bernal spiral is the global minimum for clusters in the interval 13 ≤ N ≤ 18, the lowest-energy structure is a peculiar, so-called beaded-necklace, motif for 19 ≤ N ≤ 25. We have also applied the method for larger sizes and unusual quasi-linear and branched clusters arise as low-energy structures.
Tuning a Major Part of a Clustering Algorithm.
1988-02-01
with core points identified by the local density algorithm TOTAL MINORITES t-4.5* t.A4* Median 8 Ilb uHinge 10 20 Max 14 44 (Mean) 8.1 15.4 *Actual...2.7) curent* average-linkage" coniplete-linkage*** Median 25h 47h 35 u~linge 33 65 39 Max 73 97 56 (Mean) 29.7 51.1 36.0 *Actual total minorites
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-02-19
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
The Development of FPGA-Based Pseudo-Iterative Clustering Algorithms
NASA Astrophysics Data System (ADS)
Drueke, Elizabeth; Fisher, Wade; Plucinski, Pawel
2016-03-01
The Large Hadron Collider (LHC) in Geneva, Switzerland, is set to undergo major upgrades in 2025 in the form of the High-Luminosity Large Hadron Collider (HL-LHC). In particular, several hardware upgrades are proposed to the ATLAS detector, one of the two general purpose detectors. These hardware upgrades include, but are not limited to, a new hardware-level clustering algorithm, to be performed by a field programmable gate array, or FPGA. In this study, we develop that clustering algorithm and compare the output to a Python-implemented topoclustering algorithm developed at the University of Oregon. Here, we present the agreement between the FPGA output and expected output, with particular attention to the time required by the FPGA to complete the algorithm and other limitations set by the FPGA itself.
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data.
Saeed, Fahad; Pisitkun, Trairak; Knepper, Mark A; Hoffert, Jason D
2012-10-04
High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may get selected multiple times in a LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra) for clustering mass spectrometry data which increases both the sensitivity and confidence of spectral assignment. CAMS utilizes a novel metric, called F-set, that allows accurate identification of the spectra that are similar. A graph theoretic framework is defined that allows the use of F-set metric efficiently for accurate cluster identifications. The accuracy of the algorithm is tested on real HCD and CID data sets with varying amounts of peptides. Our experiments show that the proposed algorithm is able to cluster spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm is able to decrease the computational time by compressing the data sets while increasing the throughput of the data by interpreting low S/N spectra.
Critical slowing down of cluster algorithms for Ising models coupled to 2-d gravity
NASA Astrophysics Data System (ADS)
Bowick, Mark; Falcioni, Marco; Harris, Geoffrey; Marinari, Enzo
1994-02-01
We simulate single and multiple Ising models coupled to 2-d gravity using both the Swendsen-Wang and Wolff algorithms to update the spins. We study the integrated autocorrelation time and find that there is considerable critical slowing down, particularly in the magnetization. We argue that this is primarily due to the local nature of the dynamical triangulation algorithm and to the generation of a distribution of baby universes which inhibits cluster growth.
An improved K-means clustering algorithm in agricultural image segmentation
NASA Astrophysics Data System (ADS)
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.
A multilevel gamma-clustering layout algorithm for visualization of biological networks.
Hruz, Tomas; Wyss, Markus; Lucas, Christoph; Laule, Oliver; von Rohr, Peter; Zimmermann, Philip; Bleuler, Stefan
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ -clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs.
The Geometric Cluster Algorithm: Rejection-Free Monte Carlo Simulation of Complex Fluids
NASA Astrophysics Data System (ADS)
Luijten, Erik
2005-03-01
The study of complex fluids is an area of intense research activity, in which exciting and counter-intuitive behavior continue to be uncovered. Ironically, one of the very factors responsible for such interesting properties, namely the presence of multiple relevant time and length scales, often greatly complicates accurate theoretical calculations and computer simulations that could explain the observations. We have recently developed a new Monte Carlo simulation methodootnotetextJ. Liu and E. Luijten, Phys. Rev. Lett.92, 035504 (2004); see also Physics Today, March 2004, pp. 25--27. that overcomes this problem for several classes of complex fluids. Our approach can accelerate simulations by orders of magnitude by introducing nonlocal, collective moves of the constituents. Strikingly, these cluster Monte Carlo moves are proposed in such a manner that the algorithm is rejection-free. The identification of the clusters is based upon geometric symmetries and can be considered as the off-latice generalization of the widely-used Swendsen--Wang and Wolff algorithms for lattice spin models. While phrased originally for complex fluids that are governed by the Boltzmann distribution, the geometric cluster algorithm can be used to efficiently sample configurations from an arbitrary underlying distribution function and may thus be applied in a variety of other areas. In addition, I will briefly discuss various extensions of the original algorithm, including methods to influence the size of the clusters that are generated and ways to introduce density fluctuations.
Cluster algorithm for two-dimensional U(1) lattice gauge theory
NASA Astrophysics Data System (ADS)
Sinclair, R.
1992-03-01
We use gauge fixing to rewrite the two-dimensional U(1) pure gauge model with Wilson action and periodic boundary conditions as a nonfrustrated XY model on a closed chain. The Wolff single-cluster algorithm is then applied, eliminating critical slowing down of topological modes and Polyakov loops.
Solving the depth of the repeated texture areas based on the clustering algorithm
NASA Astrophysics Data System (ADS)
Xiong, Zhang; Zhang, Jun; Tian, Jinwen
2015-12-01
The reconstruction of the 3D scene in the monocular stereo vision needs to get the depth of the field scenic points in the picture scene. But there will inevitably be error matching in the process of image matching, especially when there are a large number of repeat texture areas in the images, there will be lots of error matches. At present, multiple baseline stereo imaging algorithm is commonly used to eliminate matching error for repeated texture areas. This algorithm can eliminate the ambiguity correspond to common repetition texture. But this algorithm has restrictions on the baseline, and has low speed. In this paper, we put forward an algorithm of calculating the depth of the matching points in the repeat texture areas based on the clustering algorithm. Firstly, we adopt Gauss Filter to preprocess the images. Secondly, we segment the repeated texture regions in the images into image blocks by using spectral clustering segmentation algorithm based on super pixel and tag the image blocks. Then, match the two images and solve the depth of the image. Finally, the depth of the image blocks takes the median in all depth values of calculating point in the bock. So the depth of repeated texture areas is got. The results of a lot of image experiments show that the effect of our algorithm for calculating the depth of repeated texture areas is very good.
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
Performance evaluation of simple linear iterative clustering algorithm on medical image processing.
Cong, Jinyu; Wei, Benzheng; Yin, Yilong; Xi, Xiaoming; Zheng, Yuanjie
2014-01-01
Simple Linear Iterative Clustering (SLIC) algorithm is increasingly applied to different kinds of image processing because of its excellent perceptually meaningful characteristics. In order to better meet the needs of medical image processing and provide technical reference for SLIC on the application of medical image segmentation, two indicators of boundary accuracy and superpixel uniformity are introduced with other indicators to systematically analyze the performance of SLIC algorithm, compared with Normalized cuts and Turbopixels algorithm. The extensive experimental results show that SLIC is faster and less sensitive to the image type and the setting superpixel number than other similar algorithms such as Turbopixels and Normalized cuts algorithms. And it also has a great benefit to the boundary recall, the robustness of fuzzy boundary, the setting superpixel size and the segmentation performance on medical image segmentation.
Unsupervised unstained cell detection by SIFT keypoint clustering and self-labeling algorithm.
Muallal, Firas; Schöll, Simon; Sommerfeldt, Björn; Maier, Andreas; Steidl, Stefan; Buchholz, Rainer; Hornegger, Joachim
2014-01-01
We propose a novel unstained cell detection algorithm based on unsupervised learning. The algorithm utilizes the scale invariant feature transform (SIFT), a self-labeling algorithm, and two clustering steps in order to achieve high performance in terms of time and detection accuracy. Unstained cell imaging is dominated by phase contrast and bright field microscopy. Therefore, the algorithm was assessed on images acquired using these two modalities. Five cell lines having in total 37 images and 7250 cells were considered for the evaluation: CHO, L929, Sf21, HeLa, and Bovine cells. The obtained F-measures were between 85.1 and 89.5. Compared to the state-of-the-art, the algorithm achieves very close F-measure to the supervised approaches in much less time.
BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster
NASA Astrophysics Data System (ADS)
Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi
2007-12-01
This paper deals with the global optimization algorithm of the Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, analyzing the structure of the BMIs, the existence of the typical difficult structures is confirmed. Then, in order to improve the performance of algorithm, based on results of the problem structures analysis and consideration of BMIs characteristic properties, we proposed the algorithm using primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, in these algorithms, we propose two types of evaluation methods for GA individuals based on LMI calculation considering BMI characteristic properties more. In addition, in order to reduce computational time, we proposed parallelization of RCGA algorithm, Master-Worker paradigm with cluster computing technique.
NASA Astrophysics Data System (ADS)
You, Tao; Cheng, Hui-Min; Ning, Yi-Zi; Shia, Ben-Chang; Zhang, Zhong-Yuan
2016-12-01
Like clustering analysis, community detection aims at assigning nodes in a network into different communities. Fdp is a recently proposed density-based clustering algorithm which does not need the number of clusters as prior input and the result is insensitive to its parameter. However, Fdp cannot be directly applied to community detection due to its inability to recognize the community centers in the network. To solve the problem, a new community detection method (named IsoFdp) is proposed in this paper. First, we use IsoMap technique to map the network data into a low dimensional manifold which can reveal diverse pair-wised similarity. Then Fdp is applied to detect the communities in the network. An improved partition density function is proposed to select the proper number of communities automatically. We test our method on both synthetic and real-world networks, and the results demonstrate the effectiveness of our algorithm over the state-of-the-art methods.
Hybridization of evolutionary algorithms and local search by means of a clustering method.
Martínez-Estudillo, Alfonso C; Hervás-Martínez, César; Martínez-Estudillo, Francisco J; García-Pedrajas, Nicolás
2006-06-01
This paper presents a hybrid evolutionary algorithm (EA) to solve nonlinear-regression problems. Although EAs have proven their ability to explore large search spaces, they are comparatively inefficient in fine tuning the solution. This drawback is usually avoided by means of local optimization algorithms that are applied to the individuals of the population. The algorithms that use local optimization procedures are usually called hybrid algorithms. On the other hand, it is well known that the clustering process enables the creation of groups (clusters) with mutually close points that hopefully correspond to relevant regions of attraction. Local-search procedures can then be started once in every such region. This paper proposes the combination of an EA, a clustering process, and a local-search procedure to the evolutionary design of product-units neural networks. In the methodology presented, only a few individuals are subject to local optimization. Moreover, the local optimization algorithm is only applied at specific stages of the evolutionary process. Our results show a favorable performance when the regression method proposed is compared to other standard methods.
Dong, Feng; Pierpaoli, Elena; Gunn, James E.; Wechsler, Risa H.
2007-10-29
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is {approx} 85% complete and over 90% pure for clusters with masses above 1.0 x 10{sup 14}h{sup -1} M and redshifts up to z = 0.45. The errors of estimated cluster redshifts from maximum likelihood method are shown to be small (typically less that 0.01) over the whole redshift range with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensity of {Delta} = 200, we find the derived cluster richness {Lambda}{sub 200} a roughly linear indicator of its virial mass M{sub 200}, which well recovers the relation between total luminosity and cluster mass of the input simulation.
Numerical convergence and interpretation of the fuzzy c-shells clustering algorithm.
Bezdek, J C; Hathaway, R J
1992-01-01
R. N. Dave's (1990) version of fuzzy c-shells is an iterative clustering algorithm which requires the application of Newton's method or a similar general optimization technique at each half step in any sequence of iterates for minimizing the associated objective function. An important computational question concerns the accuracy of the solution required at each half step within the overall iteration. The general convergence theory for grouped coordination minimization is applied to this question to show that numerically exact solution of the half-step subproblems in Dave's algorithm is not necessary. One iteration of Newton's method in each coordinate minimization half step yields a sequence obtained using the fuzzy c-shells algorithm with numerically exact coordinate minimization at each half step. It is shown that fuzzy c-shells generates hyperspherical prototypes to the clusters it finds for certain special cases of the measure of dissimilarity used.
Haplotype-based quantitative trait mapping using a clustering algorithm
Li, Jing; Zhou, Yingyao; Elston, Robert C
2006-01-01
Background With the availability of large-scale, high-density single-nucleotide polymorphism (SNP) markers, substantial effort has been made in identifying disease-causing genes using linkage disequilibrium (LD) mapping by haplotype analysis of unrelated individuals. In addition to complex diseases, many continuously distributed quantitative traits are of primary clinical and health significance. However the development of association mapping methods using unrelated individuals for quantitative traits has received relatively less attention. Results We recently developed an association mapping method for complex diseases by mining the sharing of haplotype segments (i.e., phased genotype pairs) in affected individuals that are rarely present in normal individuals. In this paper, we extend our previous work to address the problem of quantitative trait mapping from unrelated individuals. The method is non-parametric in nature, and statistical significance can be obtained by a permutation test. It can also be incorporated into the one-way ANCOVA (analysis of covariance) framework so that other factors and covariates can be easily incorporated. The effectiveness of the approach is demonstrated by extensive experimental studies using both simulated and real data sets. The results show that our haplotype-based approach is more robust than two statistical methods based on single markers: a single SNP association test (SSA) and the Mann-Whitney U-test (MWU). The algorithm has been incorporated into our existing software package called HapMiner, which is available from our website at . Conclusion For QTL (quantitative trait loci) fine mapping, to identify QTNs (quantitative trait nucleotides) with realistic effects (the contribution of each QTN less than 10% of total variance of the trait), large samples sizes (≥ 500) are needed for all the methods. The overall performance of HapMiner is better than that of the other two methods. Its effectiveness further depends on other
NASA Astrophysics Data System (ADS)
Othman, Khairulnizam; Ahmad, Afandi
2016-11-01
In this research we explore the application of normalize denoted new techniques in advance fast c-mean in to the problem of finding the segment of different breast tissue regions in mammograms. The goal of the segmentation algorithm is to see if new denotes fuzzy c- mean algorithm could separate different densities for the different breast patterns. The new density segmentation is applied with multi-selection of seeds label to provide the hard constraint, whereas the seeds labels are selected based on user defined. New denotes fuzzy c- mean have been explored on images of various imaging modalities but not on huge format digital mammograms just yet. Therefore, this project is mainly focused on using normalize denoted new techniques employed in fuzzy c-mean to perform segmentation to increase visibility of different breast densities in mammography images. Segmentation of the mammogram into different mammographic densities is useful for risk assessment and quantitative evaluation of density changes. Our proposed methodology for the segmentation of mammograms on the basis of their region into different densities based categories has been tested on MIAS database and Trueta Database.
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision.
An adaptive enhancement algorithm for infrared video based on modified k-means clustering
NASA Astrophysics Data System (ADS)
Zhang, Linze; Wang, Jingqi; Wu, Wen
2016-09-01
In this paper, we have proposed a video enhancement algorithm to improve the output video of the infrared camera. Sometimes the video obtained by infrared camera is very dark since there is no clear target. In this case, infrared video should be divided into frame images by frame extraction, in order to carry out the image enhancement. For the first frame image, which can be divided into k sub images by using K-means clustering according to the gray interval it occupies before k sub images' histogram equalization according to the amount of information per sub image, we used a method to solve a problem that final cluster centers close to each other in some cases; and for the other frame images, their initial cluster centers can be determined by the final clustering centers of the previous ones, and the histogram equalization of each sub image will be carried out after image segmentation based on K-means clustering. The histogram equalization can make the gray value of the image to the whole gray level, and the gray level of each sub image is determined by the ratio of pixels to a frame image. Experimental results show that this algorithm can improve the contrast of infrared video where night target is not obvious which lead to a dim scene, and reduce the negative effect given by the overexposed pixels adaptively in a certain range.
Density-based cluster algorithms for the identification of core sets
NASA Astrophysics Data System (ADS)
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
NASA Astrophysics Data System (ADS)
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.
K-Means Re-Clustering-Algorithmic Options with Quantifiable Performance Comparisons
Meyer, A W; Paglieroni, D; Asteneh, C
2002-12-17
This paper presents various architectural options for implementing a K-Means Re-Clustering algorithm suitable for unsupervised segmentation of hyperspectral images. Performance metrics are developed based upon quantitative comparisons of convergence rates and segmentation quality. A methodology for making these comparisons is developed and used to establish K values that produce the best segmentations with minimal processing requirements. Convergence rates depend on the initial choice of cluster centers. Consequently, this same methodology may be used to evaluate the effectiveness of different initialization techniques.
What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm.
Raykov, Yordan P; Boukouvalas, Alexis; Baig, Fahd; Little, Max A
The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm
Baig, Fahd; Little, Max A.
2016-01-01
The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism. PMID:27669525
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets is always being a most important problem of data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology to process big datasets on Hadoop Cluster. To mine big datasets it is essential to re-design the data mining algorithm on this new paradigm. In this paper, we implement three variations of Apriori algorithm using data structures hash tree, trie and hash table trie i.e. trie with hash technique on MapReduce paradigm. We emphasize and investigate the significance of these three data structures for Apriori algorithm on Hadoop cluster, which has not been given attention yet. Experiments are carried out on both real life and synthetic datasets which shows that hash table trie data structures performs far better than trie and hash tree in terms of execution time. Moreover the performance in case of hash tree becomes worst.
Climatological analyses of LMA data with an open-source lightning flash-clustering algorithm
NASA Astrophysics Data System (ADS)
Fuchs, Brody R.; Bruning, Eric C.; Rutledge, Steven A.; Carey, Lawrence D.; Krehbiel, Paul R.; Rison, William
2016-07-01
Approximately 63 million lightning flashes have been identified and analyzed from multiple years of Washington, D. C., northern Alabama, and northeast Colorado lightning mapping array (LMA) data using an open-source flash-clustering algorithm. LMA networks detect radiation produced by lightning breakdown processes, allowing for high-resolution mapping of lightning flashes. Similar to other existing clustering algorithms, the algorithm described herein groups lightning-produced radiation sources by space and time to estimate total flash counts and information about each detected flash. Various flash characteristics and their sensitivity to detection efficiency are investigated to elucidate biases in the algorithm, detail detection efficiencies of various LMAs, and guide future improvements. Furthermore, flash density values in each region are compared to corresponding satellite estimates. While total flash density values produced by the algorithm in Washington, D. C. ( 20 flashes km-2 yr-1), and Alabama ( 35 flashes km-2 yr-1) are within 50% of satellite estimates, LMA-based estimates are approximately a factor of 3 larger (50 flashes km-2 yr-1) than satellite estimates in northeast Colorado. Accordingly, estimates of the ratio of in-cloud to cloud-to-ground flashes near the LMA network ( 20) are approximately a factor of 3 larger than satellite estimates in Colorado. These large differences between estimates may be related to the distinct environment conducive to intense convection, low-altitude flashes, and unique charge structures in northeast Colorado.
An Auto-Recognizing System for Dice Games Using a Modified Unsupervised Grey Clustering Algorithm
Huang, Kuo-Yi
2008-01-01
In this paper, a novel identification method based on a machine vision system is proposed to recognize the score of dice. The system employs image processing techniques, and the modified unsupervised grey clustering algorithm (MUGCA) to estimate the location of each die and identify the spot number accurately and effectively. The proposed algorithms are substituted for manual recognition. From the experimental results, it is found that this system is excellent due to its good capabilities which include flexibility, high speed, and high accuracy. PMID:27879761
Phillips, W E; Velthuizen, R P; Phuphanich, S; Hall, L O; Clarke, L P; Silbiger, M L
1995-01-01
The application of a raw data-based, operator-independent MR segmentation technique to differentiate boundaries of tumor from edema or hemorrhage is demonstrated. A case of a glioblastoma multiforme with gross and histopathologic correlation is presented. The MR image data set was segmented into tissue classes based on three different MR weighted image parameters (T1-, proton density-, and T2-weighted) using unsupervised fuzzy c-means (FCM) clustering algorithm technique for pattern recognition. A radiological examination of the MR images and correlation with fuzzy clustering segmentations was performed. Results were confirmed by gross and histopathology which, to the best of our knowledge, reports the first application of this demanding approach. Based on the results of neuropathologic correlation, the application of FCM MR image segmentation to several MR images of a glioblastoma multiforme represents a viable technique for displaying diagnostically relevant tissue contrast information used in 3D volume reconstruction. With this technique, it is possible to generate segmentation images that display clinically important neuroanatomic and neuropathologic tissue contrast information from raw MR image data.
Tong, Zhen; Pu, Lixin; Dong, Fangjie
2013-08-01
As a common malignant tumor, breast cancer has seriously affected women's physical and psychological health even threatened their lives. Breast cancer has even begun to show a gradual trend of high incidence in some places in the world. As a kind of common pathological assist diagnosis technique, immunohistochemical technique plays an important role in the diagnosis of breast cancer. Usually, Pathologists isolate positive cells from the stained specimen which were processed by immunohistochemical technique and calculate the ratio of positive cells which is a core indicator of breast cancer in diagnosis. In this paper, we present a new algorithm which was based on modified watershed algorithm and concavity points searching to identify the positive cells and segment the clustered cells automatically, and then realize automatic counting. By comparison of the results of our experiments with those of other methods, our method can exactly segment the clustered cells without losing any geometrical cell features and give the exact number of separating cells.
Ishii, Satoshi; Kadota, Koji; Senoo, Keishi
2009-09-01
DNA fingerprinting analysis such as amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic PCR (rep-PCR), ribosomal intergenic spacer analysis (RISA), and denaturing gradient gel electrophoresis (DGGE) are frequently used in various fields of microbiology. The major difficulty in DNA fingerprinting data analysis is the alignment of multiple peak sets. We report here an R program for a clustering-based peak alignment algorithm, and its application to analyze various DNA fingerprinting data, such as ARDRA, rep-PCR, RISA, and DGGE data. The results obtained by our clustering algorithm and by BioNumerics software showed high similarity. Since several R packages have been established to statistically analyze various biological data, the distance matrix obtained by our R program can be used for subsequent statistical analyses, some of which were not previously performed but are useful in DNA fingerprinting studies.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-05-21
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.
Numerical linked-cluster algorithms. I. Spin systems on square, triangular, and kagomé lattices.
Rigol, Marcos; Bryant, Tyler; Singh, Rajiv R P
2007-06-01
We discuss recently introduced numerical linked-cluster (NLC) algorithms that allow one to obtain temperature-dependent properties of quantum lattice models, in the thermodynamic limit, from exact diagonalization of finite clusters. We present studies of thermodynamic observables for spin models on square, triangular, and kagomé lattices. Results for several choices of clusters and extrapolations methods, that accelerate the convergence of NLCs, are presented. We also include a comparison of NLC results with those obtained from exact analytical expressions (where available), high-temperature expansions (HTE), exact diagonalization (ED) of finite periodic systems, and quantum Monte Carlo simulations. For many models and properties NLC results are substantially more accurate than HTE and ED.
A spectral clustering search algorithm for predicting shallow landslide size and location
NASA Astrophysics Data System (ADS)
Bellugi, Dino; Milledge, David G.; Dietrich, William E.; McKean, Jim A.; Perron, J. Taylor; Sudderth, Erik B.; Kazian, Brian
2015-02-01
The potential hazard and geomorphic significance of shallow landslides depend on their location and size. Commonly applied one-dimensional stability models do not include lateral resistances and cannot predict landslide size. Multidimensional models must be applied to specific geometries, which are not known a priori, and testing all possible geometries is computationally prohibitive. We present an efficient deterministic search algorithm based on spectral graph theory and couple it with a multidimensional stability model to predict discrete landslides in applications at scales broader than a single hillslope using gridded spatial data. The algorithm is general, assuming only that instability results when driving forces acting on a cluster of cells exceed the resisting forces on its margins and that clusters behave as rigid blocks with a failure plane at the soil-bedrock interface. This algorithm recovers predefined clusters of unstable cells of varying shape and size on a synthetic landscape, predicts the size, location, and shape of an observed shallow landslide using field-measured physical parameters, and is robust to modest changes in input parameters. The search algorithm identifies patches of potential instability within large areas of stable landscape. Within these patches will be many different combinations of cells with a Factor of Safety less than one, suggesting that subtle variations in local conditions (e.g., pore pressure and root strength) may determine the ultimate form and exact location at a specific site. Nonetheless, the tests presented here suggest that the search algorithm enables the prediction of shallow landslide size as well as location across landscapes.
Evaluation of particle clustering algorithms in the prediction of brownout dust clouds
NASA Astrophysics Data System (ADS)
Govindarajan, Bharath Madapusi
2011-07-01
A study of three Lagrangian particle clustering methods has been conducted with application to the problem of predicting brownout dust clouds that develop when rotorcraft land over surfaces covered with loose sediment. A significant impediment in performing such particle modeling simulations is the extremely large number of particles needed to obtain dust clouds of acceptable fidelity. Computing the motion of each and every individual sediment particle in a dust cloud (which can reach into tens of billions per cubic meter) is computationally prohibitive. The reported work involved the development of computationally efficient clustering algorithms that can be applied to the simulation of dilute gas-particle suspensions at low Reynolds numbers of the relative particle motion. The Gaussian distribution, k-means and Osiptsov's clustering methods were studied in detail to highlight the nuances of each method for a prototypical flow field that mimics the highly unsteady, two-phase vortical particle flow obtained when rotorcraft encounter brownout conditions. It is shown that although clustering algorithms can be problem dependent and have bounds of applicability, they offer the potential to significantly reduce computational costs while retaining the overall accuracy of a brownout dust cloud solution.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
NASA Astrophysics Data System (ADS)
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.
Rong, Zhou; Tianyu, Ma; Yongjie, Jin
2005-01-01
In order to improve the computation speed of ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on message passing interface (MPI) and tested it with combinations of different number of calculating processors and different size of voxel grid in reconstruction (64×64×64 and 128×128×128). Performance of parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to be helpful to make fully 3-D OSEM algorithms more feasible in clinical SPECT studies.
A new concept of wildland-urban interface based on city clustering algorithm
NASA Astrophysics Data System (ADS)
Kanevski, M.; Champendal, A.; Vega Orozco, C.; Tonini, M.; Conedera, M.
2012-04-01
Wildland-Urban-Interface (WUI) is a widely used term in the context of wild and forest fires to indicate areas where human infrastructures interact with wildland/forest areas. Many complex problems are associated to the WUI; but the most relevant ones are those related to forest fire hazard and management in dense populated areas where fire regime is dominated by anthropogenic-induced ignition fires. This coexistence enhances both anthropogenic-ignition sources and flammable fuels. Furthermore, the growing trend of the WUI and global change effects may even worsening the situation in the near future. Therefore, many studies are dedicated to the WUI problem, focusing on refinement of its definition, development of mapping methods, implementation of measures into specific fire management plans and the validation of the proposed approaches. The present study introduces a new concept of WUI based on city clustering algorithm (CCA) introduced in Rosenfeld et al., 2008. CCA was proposed as an automatic tool for studying the definition of cities and their distribution. The algorithm uses demographic data - either on a regular or non-regular grid in space - where a city (urban zone) is detected as a cluster of connected populated cells with maximal size. In the present study the CCA is proposed as a tool to develop a new concept of population dynamic analysis crucial to define and to localise WUI. The real case study is based on demographic/census data - organised in a regular grid with a resolution of 100 m and the forest fire ignition points database from canton Ticino, Switzerland. By changing spatial scales of demographic cells the relationships between urban zones (demographic clusters) and forest fire events were statistically analyzed. Corresponding scaling laws were used to understand the interaction between urban zones and forest fires. The first results are good and indicate that the method can be applied to define WUI in an innovative way. Keywords: forest fires
NASA Astrophysics Data System (ADS)
Tramacere, A.; Paraficz, D.; Dubath, P.; Kneib, J.-P.; Courbin, F.
2016-12-01
We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of pattern of local maxima, through an iterative algorithm, which associates each pixel to the closest local maximum. Our main classification goal is to take apart elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional groups related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of ≃24 000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, and we test our classification against the Galaxy Zoo 2 catalogue. We find that features extracted from our pipeline give, on average, an accuracy of ≃93 per cent, when testing on a test set with a size of 20 per cent of our full data set, with features deriving from the angular distribution of density attractor ranking at the top of the discrimination power.
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familarity with the behavior of the Tethered Satellite System.
Detection and clustering of features in aerial images by neuron network-based algorithm
NASA Astrophysics Data System (ADS)
Vozenilek, Vit
2015-12-01
The paper presents the algorithm for detection and clustering of feature in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general features analysis and their use for clustering and backward projection of clusters to aerial image. The basis of the algorithm is a calculation of the total error of the network and a change of weights of the network to minimize the error. A classic bipolar sigmoid was used for the activation function of the neurons and the basic method of backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, the web application was compiled (ASP.NET on the Microsoft .NET platform). The main achievements include the knowledge that man-made objects in aerial images can be successfully identified by detection of shapes and anomalies. It was also found that the appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
Probability-changing cluster algorithm for two-dimensional XY and clock models
NASA Astrophysics Data System (ADS)
Tomita, Yusuke; Okabe, Yutaka
2002-05-01
We extend the newly proposed probability-changing cluster (PCC) Monte Carlo algorithm to the study of systems with the vector order parameter. Wolff's idea of the embedded cluster formalism is used for assigning clusters. The Kosterlitz-Thouless (KT) transitions for the two-dimensional (2D) XY and q-state clock models are studied by using the PCC algorithm. Combined with the finite-size scaling analysis based on the KT form of the correlation length, ξ~exp(c/(T/TKT-1)), we determine the KT transition temperature and the decay exponent η as TKT=0.8933(6) and η=0.243(4) for the 2D XY model. We investigate two transitions of the KT type for the 2D q-state clock models with q=6,8,12 and confirm the prediction of η=4/q2 at T1, the low-temperature critical point between the ordered and XY-like phases, systematically.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; ...
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores inmore » Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.« less
A fuzzy clustering based segmentation system as support to diagnosis in medical imaging.
Masulli, F; Schenone, A
1999-06-01
In medical imaging uncertainty is widely present in data, because of the noise in acquisition and of the partial volume effects originating from the low resolution of sensors. In particular, borders between tissues are not exactly defined and memberships in the boundary regions are intrinsically fuzzy. Therefore, computer assisted unsupervised fuzzy clustering methods turn out to be particularly suitable for handling a decision making process concerning segmentation of multimodal medical images. By using the possibilistic c-means algorithm as a refinement of a neural network based clustering algorithm named capture effect neural network, we developed the possibilistic neuro fuzzy c-means algorithm (PNFCM). In this paper the PNFCM has been applied to two different multimodal data sets and the results have been compared to those obtained by using the classical fuzzy c-means algorithm. Furthermore, a discussion is presented about the role of fuzzy clustering as a support to diagnosis in medical imaging.
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
An improved scheduling algorithm for 3D cluster rendering with platform LSF
NASA Astrophysics Data System (ADS)
Xu, Wenli; Zhu, Yi; Zhang, Liping
2013-10-01
High-quality photorealistic rendering of 3D modeling needs powerful computing systems. On this demand highly efficient management of cluster resources develops fast to exert advantages. This paper is absorbed in the aim of how to improve the efficiency of 3D rendering tasks in cluster. It focuses research on a dynamic feedback load balance (DFLB) algorithm, the work principle of load sharing facility (LSF) and optimization of external scheduler plug-in. The algorithm can be applied into match and allocation phase of a scheduling cycle. Candidate hosts is prepared in sequence in match phase. And the scheduler makes allocation decisions for each job in allocation phase. With the dynamic mechanism, new weight is assigned to each candidate host for rearrangement. The most suitable one will be dispatched for rendering. A new plugin module of this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate the ability of improved plugin module is superior to the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput and improve system utilization.
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix
NASA Technical Reports Server (NTRS)
Rogers, James L.; Korte, John J.; Bilardo, Vincent J.
2006-01-01
Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
NASA Astrophysics Data System (ADS)
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a pro portion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertak ing for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-08-13
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach.
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm
NASA Astrophysics Data System (ADS)
Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel
2014-05-01
Project OASE is the one of 5 work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at Universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and follow then through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as Satellite imagery, weather Radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of Scale-Space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. In spite of the special application to be demonstrated here (like convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-Band Radar (2D) and Satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-Band Radar composite.
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-12-10
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network's running and the degree of candidate nodes' effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime.
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
MSClust: A Multi-Seeds Based Clustering Algorithm for microbiome profiling using 16S rRNA Sequence
Chen, Wei; Cheng, Yongmei; Zhang, Clarence; Zhang, Shaowu; Zhao, Hongyu
2013-01-01
Recent developments of next generation sequencing technologies have led to rapid accumulation of 16s rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inferences, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust enjoys less memory usage, and better biological accuracy compared to existing heuristic clustering methods while preserving efficiency and scalability. PMID:23899776
Yousefi-Banaem, Hossein; Rabbani, Hossein; Adibi, Peyman
2016-01-01
Barrett's mucosa is one of the most important diseases in upper gastrointestinal system that caused by gastro-esophagus reflux. If left untreated, the disease will cause distal esophagus and gastric cardia adenocarcinoma. The malignancy risk is very high in short segment Barrett's mucosa. Therefore, lesion area segmentation can improve specialist decision for treatment. In this paper, we proposed a combined fuzzy method with active models for Barrett's mucosa segmentation. In this study, we applied three methods for special area segmentation and determination. For whole disease area segmentation, we applied the hybrid fuzzy based level set method (LSM). Morphological algorithms were used for gastroesophageal junction determination, and we discriminated Barrett's mucosa from break by applying Chan-Vase method. Fuzzy c-mean and LSMs fail to segment this type of medical image due to weak boundaries. In contrast, the full automatic hybrid method with correlation approach that has used in this paper segmented the metaplasia area in the endoscopy image with desirable accuracy. The presented approach omits the manually desired cluster selection step that needed the operator manipulation. Obtained results convinced us that this approach is suitable for esophagus metaplasia segmentation. PMID:28028499
Yousefi-Banaem, Hossein; Rabbani, Hossein; Adibi, Peyman
2016-01-01
Barrett's mucosa is one of the most important diseases in upper gastrointestinal system that caused by gastro-esophagus reflux. If left untreated, the disease will cause distal esophagus and gastric cardia adenocarcinoma. The malignancy risk is very high in short segment Barrett's mucosa. Therefore, lesion area segmentation can improve specialist decision for treatment. In this paper, we proposed a combined fuzzy method with active models for Barrett's mucosa segmentation. In this study, we applied three methods for special area segmentation and determination. For whole disease area segmentation, we applied the hybrid fuzzy based level set method (LSM). Morphological algorithms were used for gastroesophageal junction determination, and we discriminated Barrett's mucosa from break by applying Chan-Vase method. Fuzzy c-mean and LSMs fail to segment this type of medical image due to weak boundaries. In contrast, the full automatic hybrid method with correlation approach that has used in this paper segmented the metaplasia area in the endoscopy image with desirable accuracy. The presented approach omits the manually desired cluster selection step that needed the operator manipulation. Obtained results convinced us that this approach is suitable for esophagus metaplasia segmentation.
Chen, Deng-kai; Gu, Rong; Gu, Yu-feng; Yu, Sui-huai
2016-01-01
Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design. PMID:27630709
Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai
2016-01-01
Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.
Rodríguez-Ramilo, Silvia T; Wang, Jinliang
2012-09-01
The inference of population genetic structures is essential in many research areas in population genetics, conservation biology and evolutionary biology. Recently, unsupervised Bayesian clustering algorithms have been developed to detect a hidden population structure from genotypic data, assuming among others that individuals taken from the population are unrelated. Under this assumption, markers in a sample taken from a subpopulation can be considered to be in Hardy-Weinberg and linkage equilibrium. However, close relatives might be sampled from the same subpopulation, and consequently, might cause Hardy-Weinberg and linkage disequilibrium and thus bias a population genetic structure analysis. In this study, we used simulated and real data to investigate the impact of close relatives in a sample on Bayesian population structure analysis. We also showed that, when close relatives were identified by a pedigree reconstruction approach and removed, the accuracy of a population genetic structure analysis can be greatly improved. The results indicate that unsupervised Bayesian clustering algorithms cannot be used blindly to detect genetic structure in a sample with closely related individuals. Rather, when closely related individuals are suspected to be frequent in a sample, these individuals should be first identified and removed before conducting a population structure analysis.
Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun
2016-01-01
The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks. PMID:27754380
Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun
2016-10-13
The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks.
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm
de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP—Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.
Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C
2014-06-01
KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition.
A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
Theiler, J.; Gisler, G.
1997-07-01
The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
Smith, Edward M; Littrell, Jack; Olivier, Michael
2007-12-01
High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Oña, Ofelia B.; Ferraro, Marta B.; Facelli, Julio C.
2010-01-01
The characterization and prediction of the structures of metal silicon clusters is important for nanotechnology research because these clusters can be used as building blocks for nano devices, integrated circuits and solar cells. Several authors have postulated that there is a transition between exo to endo absorption of Cu in Sin clusters and showed that for n larger than 9 it is possible to find endohedral clusters. Unfortunately, no global searchers have confirmed this observation, which is based on local optimizations of plausible structures. Here we use parallel Genetic Algorithms (GA), as implemented in our MGAC software, directly coupled with DFT energy calculations to show that the global search of CuSin cluster structures does not find endohedral clusters for n < 8 but finds them for n ≥ 10. PMID:21785526
`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being multidisciplinary field, involves applications of various methods from allied areas of Science for data mining using computational approaches. Clustering and molecular phylogeny is one of the key areas in Bioinformatics, which help in study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance based and character based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. `Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. The distance function based on statistical parameters of IATDs is proposed and distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which are obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD based method in molecular phylogenetic analysis in particular and data mining in general.
NASA Astrophysics Data System (ADS)
Cherba, David M.
A set of Genetic Algorithm (GA) operators based on spatial location concepts will provide improved performance for a class of NP hard search problems in N dimensional spaces. A set of spatial operators for use with genetic algorithms is proposed for a class of problems with real-valued genes that consist of N-dimensional homogeneous vectors. Evolutionary computation is capable of providing solutions to problems that would be intractable using more conventional methods. A subset of these problems is represented in real-valued three dimensional spaces or other more complex vector spaces. This thesis addresses a number of issues related to the natural influences that adjacent locations in these spaces have on the fitness functions used in genetic algorithms. A subset of building blocks (schema) will be utilized based on these natural influences. It will be shown that these operators can be described by a building block style of theory that supports the experiment results. Further, the spatial base operators naturally preserve the interactions between genes for this class of problems. Genes have a natural influence on each other based on proximity. To be an effective genetic algorithm, operators need to take these proximity effects into consideration in order to preserve good contributions to fitness. Failure to utilize these spatial relationships will lead to very poor performance of the genetic algorithm or require statistical methods to try to capture the relationships. As a demonstration of these spatial operators, this dissertation will focus on the conformation of molecular clusters, where each atom's location represents a gene with real-valued coordinates. Further, the algorithm presented will work from unlabeled distance information available from experiments with limited preparation. A set of theories will be presented that form the basis for prediction of operator effectiveness, population size and convergence for this class of problems. The theory will be
NASA Astrophysics Data System (ADS)
Bitter, Ingmar; Brown, John E.; Brickman, Daniel; Summers, Ronald M.
2004-04-01
The presented method significantly reduces the time necessary to validate a computed tomographic colonography (CTC) computer aided detection (CAD) algorithm of colonic polyps applied to a large patient database. As the algorithm is being developed on Windows PCs and our target, a Beowulf cluster, is running on Linux PCs, we made the application dual platform compatible using a single source code tree. To maintain, share, and deploy source code, we used CVS (concurrent versions system) software. We built the libraries from their sources for each operating system. Next, we made the CTC CAD algorithm dual-platform compatible and validate that both Windows and Linux produced the same results. Eliminating system dependencies was mostly achieved using the Qt programming library, which encapsulates most of the system dependent functionality in order to present the same interface on either platform. Finally, we wrote scripts to execute the CTC CAD algorithm in parallel. Running hundreds of simultaneous copies of the CTC CAD algorithm on a Beowulf cluster computing network enables execution in less than four hours on our entire collection of over 2400 CT scans, as compared to a month a single PC. As a consequence, our complete patient database can be processed daily, boosting research productivity. Large scale validation of a computer aided polyp detection algorithm for CT colonography using cluster computing significantly improves the round trip time of algorithm improvement and revalidation.
Blood vessel extraction and optic disc removal using curvelet transform and kernel fuzzy c-means.
Kar, Sudeshna Sil; Maity, Santi P
2016-03-01
This paper proposes an automatic blood vessel extraction method on retinal images using matched filtering in an integrated system design platform that involves curvelet transform and kernel based fuzzy c-means. Since curvelet transform represents the lines, the edges and the curvatures very well and in compact form (by less number of coefficients) compared to other multi-resolution techniques, this paper uses curvelet transform for enhancement of the retinal vasculature. Matched filtering is then used to intensify the blood vessels' response which is further employed by kernel based fuzzy c-means algorithm that extracts the vessel silhouette from the background through non-linear mapping. For pathological images, in addition to matched filtering, Laplacian of Gaussian filter is also employed to distinguish the step and the ramp like signal from that of vessel structure. To test the efficacy of the proposed method, the algorithm has also been applied to images in presence of additive white Gaussian noise where the curvelet transform has been used for image denoising. Performance is evaluated on publicly available DRIVE, STARE and DIARETDB1 databases and is compared with the large number of existing blood vessel extraction methodologies. Simulation results demonstrate that the proposed method is very much efficient in detecting the long and the thick as well as the short and the thin vessels with an average accuracy of 96.16% for the DRIVE and 97.35% for the STARE database wherein the existing methods fail to extract the tiny and the thin vessels.
NASA Astrophysics Data System (ADS)
Altarelli, F.; Monasson, R.; Zamponi, F.
2008-01-01
We study the performances of stochastic heuristic search algorithms on Uniquely Extendible Constraint Satisfaction Problems with random inputs. We show that, for any heuristic preserving the Poissonian nature of the underlying instance, the (heuristic-dependent) largest ratio αa of constraints per variables for which a search algorithm is likely to find solutions is smaller than the critical ratio αd above which solutions are clustered and highly correlated. In addition we show that the clustering ratio can be reached when the number k of variables per constraints goes to infinity by the so-called Generalized Unit Clause heuristic.
Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Yang, Bing; Zhang, Bing
2014-02-01
Based on the data mining methods of association rules and clustering algorithm, the 188 prescriptions for cough that built by Yan Zhenghua were collected and analyzed to get the frequency of drug usage and the relationship between drugs. From which we could conclude the experiences of Yan Zhenghua for the treatment of cough. The results of the analysis were that 20 core combinations were dig out, such as Bambusae Caulis in Taenias-Almond-Sactmarsh Aster. And there were 10 new prescriptions were found out, such as Sactmarsh Aster-Scutellariae Radix-Album Viscum-Bambusae Caulis in Taenian-Eriobotryae Folium. The results of the analysis were proved that Yan Zhenghua was good at curing cough by using the traditional Chinese medicine that can dispel wind and heat from the body, and remove heat from the lung to relieve cough.
2016-01-01
The early diagnosis of breast cancer is an important step in a fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying of these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data. PMID:27022504
OpenACC programs of the Swendsen-Wang multi-cluster spin flip algorithm
NASA Astrophysics Data System (ADS)
Komura, Yukihiro
2015-12-01
We present sample OpenACC programs of the Swendsen-Wang multi-cluster spin flip algorithm. OpenACC is a directive-based programming model for accelerators without requiring modification to the underlying CPU code itself. In this paper, we deal with the classical spin models as with the sample CUDA programs (Komura and Okabe, 2014), that is, two-dimensional (2D) Ising model, three-dimensional (3D) Ising model, 2D Potts model, 3D Potts model, 2D XY model and 3D XY model. We explain the details of sample OpenACC programs and compare the performance of the present OpenACC implementations with that of the CUDA implementations for the 2D and 3D Ising models and the 2D and 3D XY models.
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set called B-ANFIS. In order to increase accuracy of the model, the following issues are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. Crucial reason of this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually happen when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
Adham, Manal T; Bentley, Peter J
2016-08-01
This paper proposes and evaluates a solution to the truck redistribution problem prominent in London's Santander Cycle scheme. Due to the complexity of this NP-hard combinatorial optimisation problem, no efficient optimisation techniques are known to solve the problem exactly. This motivates our use of the heuristic Artificial Ecosystem Algorithm (AEA) to find good solutions in a reasonable amount of time. The AEA is designed to take advantage of highly distributed computer architectures and adapt to changing problems. In the AEA a problem is first decomposed into its relative sub-components; they then evolve solution building blocks that fit together to form a single optimal solution. Three variants of the AEA centred on evaluating clustering methods are presented: the baseline AEA, the community-based AEA which groups stations according to journey flows, and the Adaptive AEA which actively modifies clusters to cater for changes in demand. We applied these AEA variants to the redistribution problem prominent in bike share schemes (BSS). The AEA variants are empirically evaluated using historical data from Santander Cycles to validate the proposed approach and prove its potential effectiveness.
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
NASA Astrophysics Data System (ADS)
Guo, Yue; Wang, Liang-Liang; Ju, Xu-Dong; Wu, Ling-Hui; Xiu, Qing-Lei; Wang, Hai-Xia; Dong, Ming-Yi; Hu, Jing-Ran; Li, Wei-Dong; Li, Wei-Guo; Liu, Huai-Min; Qun, Ou-Yang; Shen, Xiao-Yan; Yuan, Ye; Zhang, Yao
2016-01-01
Considering the effects of aging on the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker, the Cylindrical-GEM Inner Tracker (CGEM-IT), is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package for the CGEM-IT with a simplified digitization model, and describes the development of software for cluster reconstruction and track fitting, using a track fitting algorithm based on the Kalman filter method. Preliminary results for the reconstruction algorithms which are obtained using a Monte Carlo sample of single muon events in the CGEM-IT, show that the CGEM-IT has comparable momentum resolution and transverse vertex resolution to the IDC, and a better z-direction resolution than the IDC. Supported by National Key Basic Research Program of China (2015CB856700), National Natural Science Foundation of China (11205184, 11205182) and Joint Funds of National Natural Science Foundation of China (U1232201)
NASA Astrophysics Data System (ADS)
Wu, Tin-Yu; Chang, Tse; Chu, Teng-Hao
2017-02-01
Many data mining adopts the form of Artificial Neural Network (ANN) to solve many problems, many problems will be involved in the process of training Artificial Neural Network, such as the number of samples with volume label, the time and performance of training, the number of hidden layers and Transfer function, if the compared data results are not expected, it cannot be known clearly that which dimension causes the deviation, the main reason is that Artificial Neural Network trains compared results through the form of modifying weight, and it is not a kind of training to improve the original algorithm for the extraction algorithm of image, but tend to obtain correct value aimed at the result plus the weigh; in terms of these problems, this paper will mainly put forward a method to assist in the image data analysis of Artificial Neural Network; normally, a parameter will be set as the value to extract feature vector during processing the image, which will be considered by us as weight, the experiment will use the value extracted from feature point of Speeded Up Robust Features (SURF) Image as the basis for training, SURF itself can extract different feature points according to extracted values, we will make initial semi-supervised clustering according to these values, and use Modified K - on his Neighbors (MFKNN) as training and classification, the matching mode of unknown images is not one-to-one complete comparison, but only compare group Centroid, its main purpose is to save its efficiency and speed up, and its retrieved data results will be observed and analyzed eventually; the method is mainly to make clustering and classification with the use of the nature of image feature point to give values to groups with high error rate to produce new feature points and put them into Input Layer of Artificial Neural Network for training, and finally comparative analysis is made with Back-Propagation Neural Network (BPN) of Genetic Algorithm-Artificial Neural Network
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Wang, Xueyi
2011-01-01
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 106 records and 104 dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces. PMID:22247818
Rough-fuzzy clustering for grouping functionally similar genes from microarray data.
Maji, Pradipta; Paul, Sushmita
2013-01-01
Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in gene clustering problem. In this regard, a gene clustering algorithm, termed as robust rough-fuzzy c-means, is proposed judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environment. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near optimum solutions and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
Liu, Song; Zhu, Lizhe; Sheong, Fu Kit; Wang, Wei; Huang, Xuhui
2017-01-30
We present an efficient density-based adaptive-resolution clustering method APLoD for analyzing large-scale molecular dynamics (MD) trajectories. APLoD performs the k-nearest-neighbors search to estimate the density of MD conformations in a local fashion, which can group MD conformations in the same high-density region into a cluster. APLoD greatly improves the popular density peaks algorithm by reducing the running time and the memory usage by 2-3 orders of magnitude for systems ranging from alanine dipeptide to a 370-residue Maltose-binding protein. In addition, we demonstrate that APLoD can produce clusters with various sizes that are adaptive to the underlying density (i.e., larger clusters at low-density regions, while smaller clusters at high-density regions), which is a clear advantage over other popular clustering algorithms including k-centers and k-medoids. We anticipate that APLoD can be widely applied to split ultra-large MD datasets containing millions of conformations for subsequent construction of Markov State Models. © 2016 Wiley Periodicals, Inc.
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in the recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we would present our GSAM algorithm based on a DFT exploration of the PES to find the low lying isomers of such compounds. This algorithm includes the generation of an intial set of structure from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the non physically reasonnable configurations have been implemented to reduce the computational cost of this algorithm. Structural properties of Ga{sub n}Asm clusters will be presented as an illustration of the method.
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.
An Improved Source-Scanning Algorithm for Locating Earthquake Clusters or Aftershock Sequences
NASA Astrophysics Data System (ADS)
Liao, Y.; Kao, H.; Hsu, S.
2010-12-01
The Source-scanning Algorithm (SSA) was originally introduced in 2004 to locate non-volcanic tremors. Its application was later expanded to the identification of earthquake rupture planes and the near-real-time detection and monitoring of landslides and mud/debris flows. In this study, we further improve SSA for the purpose of locating earthquake clusters or aftershock sequences when only a limited number of waveform observations are available. The main improvements include the application of a ground motion analyzer to separate P and S waves, the automatic determination of resolution based on the grid size and time step of the scanning process, and a modified brightness function to utilize constraints from multiple phases. Specifically, the improved SSA (named as ISSA) addresses two major issues related to locating earthquake clusters/aftershocks. The first one is the massive amount of both time and labour to locate a large number of seismic events manually. And the second one is to efficiently and correctly identify the same phase across the entire recording array when multiple events occur closely in time and space. To test the robustness of ISSA, we generate synthetic waveforms consisting of 3 separated events such that individual P and S phases arrive at different stations in different order, thus making correct phase picking nearly impossible. Using these very complicated waveforms as the input, the ISSA scans all model space for possible combination of time and location for the existence of seismic sources. The scanning results successfully associate various phases from each event at all stations, and correctly recover the input. To further demonstrate the advantage of ISSA, we apply it to the waveform data collected by a temporary OBS array for the aftershock sequence of an offshore earthquake southwest of Taiwan. The overall signal-to-noise ratio is inadequate for locating small events; and the precise arrival times of P and S phases are difficult to
NASA Astrophysics Data System (ADS)
Huang, Zhipeng; Gao, Lihong; Wang, Yangwei; Wang, Fuchi
2016-09-01
The Johnson-Cook (J-C) constitutive model is widely used in the finite element simulation, as this model shows the relationship between stress and strain in a simple way. In this paper, a cluster global optimization algorithm is proposed to determine the J-C constitutive model parameters of materials. A set of assumed parameters is used for the accuracy verification of the procedure. The parameters of two materials (401 steel and 823 steel) are determined. Results show that the procedure is reliable and effective. The relative error between the optimized and assumed parameters is no more than 4.02%, and the relative error between the optimized and assumed stress is 0.2% × 10-5. The J-C constitutive parameters can be determined more precisely and quickly than the traditional manual procedure. Furthermore, all the parameters can be simultaneously determined using several curves under different experimental conditions. A strategy is also proposed to accurately determine the constitutive parameters.
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes,Sylvain V.
2006-03-08
Centrosomes are small organelles that organize the mitoticspindle during cell division and are also involved in cell shape andpolarity. Within epithelial tumors, such as breast cancer, and somehematological tumors, centrosome abnormalities (CA) are common, occurearly in disease etiology, and correlate with chromosomal instability anddisease stage. In situ quantification of CA by optical microscopy ishampered by overlap and clustering of these organelles, which appear asfocal structures. CA has been frequently associated with Tp53 status inpremalignant lesions and tumors. Here we describe an approach toaccurately quantify centrosomes in tissue sections and tumors.Considering proliferation and baseline amplification rate the resultingpopulation based ratio of centrosomes per nucleus allow the approximationof the proportion of cells with CA. Using this technique we show that20-30 percent of cells have amplified centrosomes in Tp53 null mammarytumors. Combining fluorescence detection, deconvolution microscopy and amathematical algorithm applied to a maximum intensity projection we showthat this approach is superior to traditional investigator based visualanalysis or threshold-based techniques.
BoCluSt: Bootstrap Clustering Stability Algorithm for Community Detection.
Garcia, Carlos
2016-01-01
The identification of modules or communities in sets of related variables is a key step in the analysis and modeling of biological systems. Procedures for this identification are usually designed to allow fast analyses of very large datasets and may produce suboptimal results when these sets are of a small to moderate size. This article introduces BoCluSt, a new, somewhat more computationally intensive, community detection procedure that is based on combining a clustering algorithm with a measure of stability under bootstrap resampling. Both computer simulation and analyses of experimental data showed that BoCluSt can outperform current procedures in the identification of multiple modules in data sets with a moderate number of variables. In addition, the procedure provides users with a null distribution of results to evaluate the support for the existence of community structure in the data. BoCluSt takes individual measures for a set of variables as input, and may be a valuable and robust exploratory tool of network analysis, as it provides 1) an estimation of the best partition of variables into modules, 2) a measure of the support for the existence of modular structures, and 3) an overall description of the whole structure, which may reveal hierarchical modular situations, in which modules are composed of smaller sub-modules.
Yu, Shanen; Liu, Shuai; Jiang, Peng
2016-01-01
Most existing deployment algorithms for event coverage in underwater wireless sensor networks (UWSNs) usually do not consider that network communication has non-uniform characteristics on three-dimensional underwater environments. Such deployment algorithms ignore that the nodes are distributed at different depths and have different probabilities for data acquisition, thereby leading to imbalances in the overall network energy consumption, decreasing the network performance, and resulting in poor and unreliable late network operation. Therefore, in this study, we proposed an uneven cluster deployment algorithm based network layered for event coverage. First, according to the energy consumption requirement of the communication load at different depths of the underwater network, we obtained the expected value of deployment nodes and the distribution density of each layer network after theoretical analysis and deduction. Afterward, the network is divided into multilayers based on uneven clusters, and the heterogeneous communication radius of nodes can improve the network connectivity rate. The recovery strategy is used to balance the energy consumption of nodes in the cluster and can efficiently reconstruct the network topology, which ensures that the network has a high network coverage and connectivity rate in a long period of data acquisition. Simulation results show that the proposed algorithm improves network reliability and prolongs network lifetime by significantly reducing the blind movement of overall network nodes while maintaining a high network coverage and connectivity rate. PMID:27973448
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran
2017-03-01
In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on state of the art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demand of compute time, memory, storage and I/O along with the need of their effective management. The most resource intensive modules of the algorithm are traveltime calculations and migration summation which exhibit an inherent trade off between compute time and other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and its feeding mechanism to the migration process. The presented work is an extension of our previous work, wherein a 3D Kirchhoff depth migration application for multicore CPU based parallel system had been developed. Recently, we have worked on improving parallel performance of this application by re-designing the parallelization approach. The new algorithm is capable to efficiently migrate both prestack and poststack 3D data. It exhibits flexibility for migrating large number of traces within the available node memory and with minimal requirement of storage, I/O and inter-node communication. The resultant application is tested using 3D Overthrust data on PARAM Yuva II, which is a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. Parallel performance of the algorithm is studied using different numerical experiments and the scalability results show striking improvement over its previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm with high scalability and efficiency on a multicore CPU cluster.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Cloud classification from satellite data using a fuzzy sets algorithm - A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1989-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine like areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X.
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X
2015-12-26
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols.
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which was analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
DeMaere, Matthew Z.
2016-01-01
Background Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occuring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713
NASA Astrophysics Data System (ADS)
Nonomura, Yoshihiko
2014-11-01
Nonequilibrium relaxation behaviors in the Ising model on a square lattice based on the Wolff algorithm are totally different from those based on local-update algorithms. In particular, the critical relaxation is described by the stretched-exponential decay. We propose a novel scaling procedure to connect nonequilibrium and equilibrium behaviors continuously, and find that the stretched-exponential scaling region in the Wolff algorithm is as wide as the power-law scaling region in local-update algorithms. We also find that relaxation to the spontaneous magnetization in the ordered phase is characterized by the exponential decay, not the stretched-exponential decay based on local-update algorithms.
Li, Weizhong [San Diego Supercomputer Center
2016-07-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Single-cluster algorithm for the site-bond-correlated Ising model
NASA Astrophysics Data System (ADS)
Campos, P. R. A.; Onody, R. N.
1997-12-01
We extend the Wolff algorithm to include correlated spin interactions in diluted magnetic systems. This algorithm is applied to study the site-bond-correlated Ising model on a two-dimensional square lattice. We use a finite-size scaling procedure to obtain the phase diagram in the temperature-concentration space. We also have verified that the autocorrelation time diminishes in the presence of dilution and correlation, showing that the Wolff algorithm performs even better in such situations.
A pixel-based color image segmentation using support vector machine and fuzzy C-means.
Wang, Xiang-Yang; Zhang, Xian-Jin; Yang, Hong-Ying; Bu, Juan
2012-09-01
Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. In this paper, we present a pixel-based color image segmentation using Support Vector Machine (SVM) and Fuzzy C-Means (FCM). Firstly, the pixel-level color feature and texture feature of the image, which is used as input of the SVM model (classifier), are extracted via the local spatial similarity measure model and Steerable filter. Then, the SVM model (classifier) is trained by using FCM with the extracted pixel-level features. Finally, the color image is segmented with the trained SVM model (classifier). This image segmentation can not only take full advantage of the local information of the color image but also the ability of the SVM classifier. Experimental evidence shows that the proposed method has a very effective computational behavior and effectiveness, and decreases the time and increases the quality of color image segmentation in comparison with the state-of-the-art segmentation methods recently proposed in the literature.
Tsai, Ming-Hui; Huang, Yueh-Min
2014-11-18
Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Typically, their limited power capability, i.e., battery powered, make WSNs encounter the challenge of extension of network lifetime. Many hierarchical protocols show better ability of energy efficiency in the literature. Besides, data reduction based on the correlation of sensed readings can efficiently reduce the amount of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further separate the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) is composed of two procedures: the prediction model construction procedure and the sub-clustering procedure. The energy conservation benefits by the reduced transmissions, which are dependent on the prediction model. Also, the energy can be further conserved because of the representative mechanism of sub-clustering. As presented by simulation results, it shows that 2TC-cor can effectively conserve energy and monitor accurately the environment within an acceptable level.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
Chen, Wei-Chen; Ostrouchov, George; Pugmire, Dave; Prabhat,; Wehner, Michael
2013-01-01
We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate data set. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data (SPMD) programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger data sets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high resolution climate dataset produced by the community atmosphere model (CAM5). This article has supplementary material online.
Abedini, Mohammad; Moradi, Mohammad H; Hosseinian, S M
2016-03-01
This paper proposes a novel method to address reliability and technical problems of microgrids (MGs) based on designing a number of self-adequate autonomous sub-MGs via adopting MGs clustering thinking. In doing so, a multi-objective optimization problem is developed where power losses reduction, voltage profile improvement and reliability enhancement are considered as the objective functions. To solve the optimization problem a hybrid algorithm, named HS-GA, is provided, based on genetic and harmony search algorithms, and a load flow method is given to model different types of DGs as droop controller. The performance of the proposed method is evaluated in two case studies. The results provide support for the performance of the proposed method.
Pillió, Zoltán; Tajti, Attila; Szalay, Péter G
2012-09-11
A new algorithm is presented for the calculation of the ladder-type term of the coupled cluster singles and doubles (CCSD) equations using two-electron integrals in atomic orbital (AO) basis. The method is based on an orbital grouping scheme, which results in an optimal partitioning of the AO integral matrix into sparse and dense blocks allowing efficient matrix multiplication. Carefully chosen numerical tests have been performed to analyze the performance of all aspects of the new algorithm. It is shown that the suggested scheme allows an efficient utilization of modern highly parallel architectures and devices in CCSD calculations. Details of the implementation in the development version of CFOUR quantum chemical program package are also presented.
A new algorithm to find earthquake clusters using neighboring cell connection and rate analysis.
NASA Astrophysics Data System (ADS)
Peng, W.; Toda, S.
2015-12-01
To study earthquake interaction, it is important to objectively find a group of earthquakes occurred closely in space and time. Earthquake clusters are chosen with previous techniques that characterize them as mainshock-aftershock sequences or swarm sequences by empirical laws (e.g., Omori-Utsu; ETAS) or direct assumptions about physical processes such as stress transfer, transient stress loading, and fluid migration. Recent papers instead proposed non-parameterized techniques such as a kernel-based smoothing method. The cumulative rate clustering method (CURATE, Jacobs et al., 2013) is one of the approaches without any direct assumptions. The CURATE method was applied in New Zealand and provided a good result for selecting the swarm sequences comparing with the ETAS model. However, it is still difficult to choose a proper confined area and a time interval for extracting sequences from the catalog. To avoid arbitrariness in space and time parameters, here we propose a new method modifying the CURATE approach. We first identify the spatial clusters by looking into the spatial distribution in a 2-D cell-gridded map. The spatial clusters defined as multiple neighboring cells, each of which contains at least one earthquake in a time period T. From the selected spatial clusters, we then evaluate temporal clustering which is defined as a transient increase of seismicity rate comparing to the rate before the target event. We tested this method focusing on shallow crustal seismicity, northern Honshu, Japan. We chose the parameter range from T = 1 to 100 days and cell size = 0.01°to 0.1°. As a result, the number of the clusters increase with longer T and larger cell size. By choosing the T = 30 days and cell size = 0.05°, we successfully selected the long-lasting aftershock sequences associated with the 2004 M6.8 Chuetsu and 2007 M6.8 Chuetsu-oki earthquakes, while other empirical models and CURATE method failed to decluster.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2013-01-01
We present multiple GPU computing with the common unified device architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of two-dimensional (2D) q-state Potts model. Extending our algorithm for single GPU computing [Y. Komura, Y. Okabe, GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems, Comput. Phys. Comm. 183 (2012) 1155-1161], we realize the GPU computation of the Swendsen-Wang multi-cluster algorithm for multiple GPUs. We implement our code on the large-scale open science supercomputer TSUBAME 2.0, and test the performance and the scalability of the simulation of the 2D Potts model. The performance on Tesla M2050 using 256 GPUs is obtained as 37.3 spin flips per a nano second for the q=2 Potts model (Ising model) at the critical temperature with the linear system size L=65536.
NASA Astrophysics Data System (ADS)
Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus
2015-01-01
The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus
2015-01-14
The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
NASA Astrophysics Data System (ADS)
Best, Andrew; Kapalo, Katelynn A.; Warta, Samantha F.; Fiore, Stephen M.
2016-05-01
Human-robot teaming largely relies on the ability of machines to respond and relate to human social signals. Prior work in Social Signal Processing has drawn a distinction between social cues (discrete, observable features) and social signals (underlying meaning). For machines to attribute meaning to behavior, they must first understand some probabilistic relationship between the cues presented and the signal conveyed. Using data derived from a study in which participants identified a set of salient social signals in a simulated scenario and indicated the cues related to the perceived signals, we detail a learning algorithm, which clusters social cue observations and defines an "N-Most Likely States" set for each cluster. Since multiple signals may be co-present in a given simulation and a set of social cues often maps to multiple social signals, the "N-Most Likely States" approach provides a dramatic improvement over typical linear classifiers. We find that the target social signal appears in a "3 most-likely signals" set with up to 85% probability. This results in increased speed and accuracy on large amounts of data, which is critical for modeling social cognition mechanisms in robots to facilitate more natural human-robot interaction. These results also demonstrate the utility of such an approach in deployed scenarios where robots need to communicate with human teammates quickly and efficiently. In this paper, we detail our algorithm, comparative results, and offer potential applications for robot social signal detection and machine-aided human social signal detection.
piClust: a density based piRNA clustering algorithm.
Jung, Inuk; Park, Jong Chan; Kim, Sun
2014-06-01
Piwi-interacting RNAs (piRNAs) are recently discovered, endogenous small non-coding RNAs. piRNAs protect the genome from invasive transposable elements (TE) and sustain integrity of the genome in germ cell lineages. Small RNA-sequencing data can be used to detect piRNA activations in a cell under a specific condition. However, identification of cell specific piRNA activations requires sophisticated computational methods. As of now, there is only one computational method, proTRAC, to locate activated piRNAs from the sequencing data. proTRAC detects piRNA clusters based on a probabilistic analysis with assumption of a uniform distribution. Unfortunately, we were not able to locate activated piRNAs from our proprietary sequencing data in chicken germ cells using proTRAC. With a careful investigation on data sets, we found that a uniform or any statistical distribution for detecting piRNA clusters may not be assumed. Furthermore, small RNA-seq data contains many different types of RNAs which was not carefully taken into account in previous studies. To improve piRNA cluster identification, we developed piClust that uses a density based clustering approach without assumption of any parametric distribution. In previous studies, it is known that piRNAs exhibit a strong tendency of forming piRNA clusters in syntenic regions of the genome. Thus, the density based clustering approach is effective and robust to the existence of non-piRNAs or noise in the data. In experiments with piRNA data from human, mouse, rat and chicken, piClust was able to detect piRNA clusters from total small RNA-seq data from germ cell lines, while proTRAC was not successful. piClust outperformed proTRAC in terms of sensitivity and running time (up to 200 folds). piClust is currently available as a web service at http://epigenomics.snu.ac.kr/piclustweb.
Madsen, Niels K; Godtliebsen, Ian H; Christiansen, Ove
2017-04-07
Vibrational coupled-cluster (VCC) theory provides an accurate method for calculating vibrational spectra and properties of small to medium-sized molecules. Obtaining these properties requires the solution of the non-linear VCC equations which can in some cases be hard to converge depending on the molecule, the basis set, and the vibrational state in question. We present and compare a range of different algorithms for solving the VCC equations ranging from a full Newton-Raphson method to approximate quasi-Newton models using an array of different convergence-acceleration schemes. The convergence properties and computational cost of the algorithms are compared for the optimization of VCC states. This includes both simple ground-state problems and difficult excited states with strong non-linearities. Furthermore, the effects of using tensor-decomposed solution vectors and residuals are investigated and discussed. The results show that for standard ground-state calculations, the conjugate residual with optimal trial vectors algorithm has the shortest time-to-solution although the full Newton-Raphson method converges in fewer macro-iterations. Using decomposed tensors does not affect the observed convergence rates in our test calculations as long as the tensors are decomposed to sufficient accuracy.
Kandalla, Krishna; Subramoni, Hari; Vishnu, Abhinav; Panda, Dhabaleswar K.
2010-04-01
Modern high performance computing systems are being increasingly deployed in a hierarchical fashion with multi-core computing platforms forming the base of the hierarchy. These systems are usually comprised of multiple racks, with each rack consisting of a finite number of chassis, with each chassis having multiple compute nodes or blades, based on multi-core architectures. The networks are also hierarchical with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We have analyzed the performance characteristics of two collectives, MPI_Gather and MPI_Scatter on such systems and we have proposed topology-aware algorithms for these operations. Our experimental results have shown that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2016-06-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of robust features vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain after the feature extraction and before any classification scheme. Feature selection reduces the feature space that improves the performance of classifier and decreases the computational burden imposed by using many features on classifier. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total numbers of possible subsets of features are 2n. Thus, selection of an optimal subset of features problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCCs features from all possible subsets of features using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCCs samples have been selected from mammogram images of DDSM database. A total of 50 features extracted from benign and malignant MCCs samples are used in this study. In these algorithms, fitness function is correct classification rate of classifier. Support vector machine is used as a classifier. From experimental results, it is also observed that the performance of PSO-based and BBO-based algorithms to select an optimal subset of features for classifying MCCs as benign or malignant is better as compared to GA-based algorithm.
Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering
NASA Astrophysics Data System (ADS)
Kataoka, Shun; Kobayashi, Takuto; Yasuda, Muneki; Tanaka, Kazuyuki
2016-11-01
We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our approach is based on the Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real
Classification of excessive domestic water consumption using Fuzzy Clustering Method
NASA Astrophysics Data System (ADS)
Zairi Zaidi, A.; Rasmani, Khairul A.
2016-08-01
Demand for clean and treated water is increasing all over the world. Therefore it is crucial to conserve water for better use and to avoid unnecessary, excessive consumption or wastage of this natural resource. Classification of excessive domestic water consumption is a difficult task due to the complexity in determining the amount of water usage per activity, especially as the data is known to vary between individuals. In this study, classification of excessive domestic water consumption is carried out using a well-known Fuzzy C-Means (FCM) clustering algorithm. Consumer data containing information on daily, weekly and monthly domestic water usage was employed for the purpose of classification. Using the same dataset, the result produced by the FCM clustering algorithm is compared with the result obtained from a statistical control chart. The finding of this study demonstrates the potential use of the FCM clustering algorithm for the classification of domestic consumer water consumption data.
NASA Astrophysics Data System (ADS)
Hazenberg, P.; Torfs, P. J. J. F.; Leijnse, H.; Uijlenhoet, R.
2012-04-01
Over the last decades the amount of spatial geographic data obtained from satellite and radar remote sensing, geographical and other types of spatial information has increased tremendously, making it impossible for a user to examine all in detail. Therefore, a considerable amount of research has focused on smart and efficient solutions to segment a spatial image into its dominant regions, extracting most essential information. The current research presents a new spatial image cluster identification method. The delineation of clusters is performed in two separate steps. First, we identify a regions outer contour using the properties of a rotating carpenter square. Secondly, we define all inner pixels belonging to a cluster based on the same principle, excluding inner contour regions if necessary. As such, a cluster identification method will be presented which has considerable similarity to some of the tracing type and connected component image segmentation algorithms developed in the literature during the last decade. However, since the characteristic shape of a carpenter square can easily be extended, the algorithm presented here does not strictly label neighboring pixels to the same component only. On the contrary, our algorithm is able to connect non-neighboring pixels for varying pixel distances as well. In addition, since our algorithm takes a continuous grid as input, it is possible to define transition pixels, that connect pixels that belong to a given cluster. Therefore, this newly developed algorithm presents a link between the traditional image segmentation methods implemented on binary grids and the partitional density and grid-based cluster identification methods that use continuous datasets. We will demonstrate the impact of this new cluster identification method for a number of typical geophysical cases ranging from global drought identification to weather radar based precipitation cell delineation.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2016-03-01
We present new versions of sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. In this update, we add the method of GPU-based cluster-labeling algorithm without the use of conventional iteration (Komura, 2015) to those programs. For high-precision calculations, we also add a random-number generator in the cuRAND library. Moreover, we fix several bugs and remove the extra usage of shared memory in the kernel functions.
Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Huang, Xiu-Qin; Yang, Bing
2014-02-01
Analyzed the prescriptions for phlegm retention syndrome that built by Ma Peizhi by the association rules and clustering algorithm, the frequency of drug usage and the relationship between drugs could be get. And from that we could conclude the experiences for phlegm retention syndrome of Ma Peizhi of menghe medical genre. The results of the analysis were that 18 core combinations were dig out, such as Citri Exocarpium Rubrum-Eriobotryae Folium-Citri Reticulatae Pericarpium. And there were 9 new prescriptions were found out such as Aurantii Fructus-Citri Exocarpium Rubium-Eriobotryae Folium-Citri Reticulatae Pericarpium. The results of the analysis were proved that Ma Peizhi of Menghe Medical Genre was good at curing phlegm retention syndrome by using the traditional Chinese medicine of mild and light, such as the medicines of mild tonification, and clearing damp and promoting diuresis.
Parallel SOR Iterative Algorithms and Performance Evaluation on a Linux Cluster
2005-06-01
Red - Black two-color SOR implementation. Two other iterative methods , Jacobi method is preferred. Yanheh [4] showed that the and Gauss - Seidel (G-S...The optimal value of co lies in (0, 2). The choice 40 J +" of co = 1 corresponds to the Gauss - Seidel - j.1)( - 11 iteration. 2.2 Red - Black SOR...paper, a parallel algorithm for the structure of a matrix or a grid. However, the red - black SOR method with domain decomposition is multi-color
A new method based on Dempster-Shafer theory and fuzzy c-means for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitiveness to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation especially when only one brain image is available. The method is approached with considering spatial neighborhood information by fusing the information of pixels with their neighbors with Dempster-Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function of applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Then multiple hypotheses are generated according to the single hypothesis. Then we update the objective pixel’s BPA by fusing the BPA of the objective pixel and those of its neighbors to get the final result. Some examples in MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than other methods in motion-blurred MRI segmentation.
Cickovski, Trevor; Flor, Tiffany; Irving-Sachs, Galen; Novikov, Philip; Parda, James; Narasimhan, Giri
2015-01-01
In order to make multiple copies of a target sequence in the laboratory, the technique of Polymerase Chain Reaction (PCR) requires the design of "primers", which are short fragments of nucleotides complementary to the flanking regions of the target sequence. If the same primer is to amplify multiple closely related target sequences, then it is necessary to make the primers "degenerate", which would allow it to hybridize to target sequences with a limited amount of variability that may have been caused by mutations. However, the PCR technique can only allow a limited amount of degeneracy, and therefore the design of degenerate primers requires the identification of reasonably well-conserved regions in the input sequences. We take an existing algorithm for designing degenerate primers that is based on clustering and parallelize it in a web-accessible software package GPUDePiCt, using a shared memory model and the computing power of Graphics Processing Units (GPUs). We test our implementation on large sets of aligned sequences from the human genome and show a multi-fold speedup for clustering using our hybrid GPU/CPU implementation over a pure CPU approach for these sequences, which consist of more than 7,500 nucleotides. We also demonstrate that this speedup is consistent over larger numbers and longer lengths of aligned sequences.
A new time dependent density functional algorithm for large systems and plasmons in metal clusters
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-14
A new algorithm to solve the Time Dependent Density Functional Theory (TDDFT) equations in the space of the density fitting auxiliary basis set has been developed and implemented. The method extracts the spectrum from the imaginary part of the polarizability at any given photon energy, avoiding the bottleneck of Davidson diagonalization. The original idea which made the present scheme very efficient consists in the simplification of the double sum over occupied-virtual pairs in the definition of the dielectric susceptibility, allowing an easy calculation of such matrix as a linear combination of constant matrices with photon energy dependent coefficients. The method has been applied to very different systems in nature and size (from H{sub 2} to [Au{sub 147}]{sup −}). In all cases, the maximum deviations found for the excitation energies with respect to the Amsterdam density functional code are below 0.2 eV. The new algorithm has the merit not only to calculate the spectrum at whichever photon energy but also to allow a deep analysis of the results, in terms of transition contribution maps, Jacob plasmon scaling factor, and induced density analysis, which have been all implemented.
A new time dependent density functional algorithm for large systems and plasmons in metal clusters
NASA Astrophysics Data System (ADS)
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-01
A new algorithm to solve the Time Dependent Density Functional Theory (TDDFT) equations in the space of the density fitting auxiliary basis set has been developed and implemented. The method extracts the spectrum from the imaginary part of the polarizability at any given photon energy, avoiding the bottleneck of Davidson diagonalization. The original idea which made the present scheme very efficient consists in the simplification of the double sum over occupied-virtual pairs in the definition of the dielectric susceptibility, allowing an easy calculation of such matrix as a linear combination of constant matrices with photon energy dependent coefficients. The method has been applied to very different systems in nature and size (from H2 to [Au147]-). In all cases, the maximum deviations found for the excitation energies with respect to the Amsterdam density functional code are below 0.2 eV. The new algorithm has the merit not only to calculate the spectrum at whichever photon energy but also to allow a deep analysis of the results, in terms of transition contribution maps, Jacob plasmon scaling factor, and induced density analysis, which have been all implemented.
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Finding pathogenic factors and how they are spread in the environment has become a global demand, recently. Cutaneous Leishmaniasis (CL) created by Leishmania is a special parasitic disease which can be passed on to human through phlebotomus of vector-born. Studies show that economic situation, cultural issues, as well as environmental and ecological conditions can affect the prevalence of this disease. In this study, Data Mining is utilized in order to predict CL prevalence rate and obtain a risk map. This case is based on effective environmental parameters on CL and a Neuro-Fuzzy system was also used. Learning capacity of Neuro-Fuzzy systems in neural network on one hand and reasoning power of fuzzy systems on the other, make it very efficient to use. In this research, in order to predict CL prevalence rate, an adaptive Neuro-fuzzy inference system with fuzzy inference structure of fuzzy C Means clustering was applied to determine the initial membership functions. Regarding to high incidence of CL in Ilam province, counties of Ilam, Mehran, and Dehloran have been examined and evaluated. The CL prevalence rate was predicted in 2012 by providing effective environmental map and topography properties including temperature, moisture, annual, rainfall, vegetation and elevation. Results indicate that the model precision with fuzzy C Means clustering structure rises acceptable RMSE values of both training and checking data and support our analyses. Using the proposed data mining technology, the pattern of disease spatial distribution and vulnerable areas become identifiable and the map can be used by experts and decision makers of public health as a useful tool in management and optimal decision-making.
NASA Astrophysics Data System (ADS)
Ni’am, M.; Sitanggang, I. S.; Nuryanto, D. E.
2017-01-01
Peat fires in Indonesia could have a negative impact for human life such as the emergence of haze. Therefore, this research aims to analyze concentration of CO and CO2 due to peat fires in Sumatra in 2015 using the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) and K-Means algorithm. The results of this study indicate that HYSPLIT can be used to obtain the concentration of CO and CO2 from the haze. Clustering using K-Means algorithm produces an average of the highest concentration of CO is 11.1471 μg/m3 and CO2 is 88.5882 μg/m3. Generally, pollutants have an average concentration of 0.0487 μg/m3 for CO and 0.3687 μg/m3 for CO2. The pollutant concentration is contained in 45 525 (95%) positions in haze trajectory and the haze spread starting from Riau to Nanggroe Aceh Darussalam.
NASA Astrophysics Data System (ADS)
Thomsen, Bo; Hansen, Mikkel Bo; Seidler, Peter; Christiansen, Ove
2012-03-01
We report the theory and implementation of vibrational coupled cluster (VCC) damped response functions. From the imaginary part of the damped VCC response function the absorption as function of frequency can be obtained, requiring formally the solution of the now complex VCC response equations. The absorption spectrum can in this formulation be seen as a matrix function of the characteristic VCC Jacobian response matrix. The asymmetric matrix version of the Lanczos method is used to generate a tridiagonal representation of the VCC response Jacobian. Solving the complex response equations in the relevant Lanczos space provides a method for calculating the VCC damped response functions and thereby subsequently the absorption spectra. The convergence behaviour of the algorithm is discussed theoretically and tested for different levels of completeness of the VCC expansion. Comparison is made with results from the recently reported [P. Seidler, M. B. Hansen, W. Györffy, D. Toffoli, and O. Christiansen, J. Chem. Phys. 132, 164105 (2010)] vibrational configuration interaction damped response function calculated using a symmetric Lanczos algorithm. Calculations of IR spectra of oxazole, cyclopropene, and uracil illustrate the usefulness of the new VCC based method.
Palma, G; Zambrano, D
2008-12-01
In this paper we propose a method to study critical systems numerically, which combines collective-mode algorithms and renormalization group on the lattice. This method is an improved version of the Monte Carlo renormalization group in the sense that it has all the advantages of cluster algorithms. As an application we considered the 2D Ising model and studied whether scale invariance or universality are possible underlying mechanisms responsible for the approximate "universal fluctuations" close to a so-called bulk temperature T(L) . "Universal fluctuations" were first proposed in the work of Bramwell, Holdsworth, and Pinton [Nature (London) 396, 552 (1998)] and stated that the probability density function of a global quantity for very dissimilar systems, such as a confined turbulent flow and a two-dimensional (2D) magnetic system, properly normalized to the first two moments, becomes similar to the "universal distribution," originally obtained for magnetization in the 2D XY model in the low-temperature region. The results for the critical exponents and the renormalization-group flow of the probability density function are very accurate and show no evidence to support that the approximate common shape of the PDF should be related to both scale invariance or universal behavior.
Thomsen, Bo; Hansen, Mikkel Bo; Seidler, Peter; Christiansen, Ove
2012-03-28
We report the theory and implementation of vibrational coupled cluster (VCC) damped response functions. From the imaginary part of the damped VCC response function the absorption as function of frequency can be obtained, requiring formally the solution of the now complex VCC response equations. The absorption spectrum can in this formulation be seen as a matrix function of the characteristic VCC Jacobian response matrix. The asymmetric matrix version of the Lanczos method is used to generate a tridiagonal representation of the VCC response Jacobian. Solving the complex response equations in the relevant Lanczos space provides a method for calculating the VCC damped response functions and thereby subsequently the absorption spectra. The convergence behaviour of the algorithm is discussed theoretically and tested for different levels of completeness of the VCC expansion. Comparison is made with results from the recently reported [P. Seidler, M. B. Hansen, W. Györffy, D. Toffoli, and O. Christiansen, J. Chem. Phys. 132, 164105 (2010)] vibrational configuration interaction damped response function calculated using a symmetric Lanczos algorithm. Calculations of IR spectra of oxazole, cyclopropene, and uracil illustrate the usefulness of the new VCC based method.
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
Bandow, Bernhard; Hartke, Bernd
2006-05-04
For the difficult task of finding global minimum energy structures for molecular clusters of nontrivial size, we present a highly efficient parallel implementation of an evolutionary algorithm. By completely abandoning the traditional concept of generations and by replacing it with a less rigid pool concept, we have managed to eliminate serial bottlenecks completely and can operate the algorithm efficiently on an arbitrary number of parallel processes. Nevertheless, our new algorithm still realizes all of the main features of our old, successful implementation. First tests of the new algorithm are shown for the highly demanding problem of water clusters modeled by a potential with flexible, polarizable monomers (TTM2-F). For this problem, our new algorithm not only reproduces all of the global minima proposed previously in considerably less CPU time but also leads to improved proposals in several cases. These, in turn, qualitatively change our earlier predictions concerning the transitions from all-surface structures to cages with a single interior molecule, and from one to two interior molecules. Furthermore, we compare preliminary results up to n = 105 with locally optimized cuts from several ice modifications. This comparison indicates that relaxed ice structures may start to be competitive already at cluster sizes above n = 90.
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining
NASA Astrophysics Data System (ADS)
Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.
The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories, methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to their anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information can not be exploited to successfully attack the privacy of data subjects data refer to. Based on Kohonen Self Organizing Feature Maps (SOMs), we firstly organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extend required so that both the data disclosure probability and the information loss are possibly kept negligible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
A binned clustering algorithm to detect high-Z material using cosmic muons
NASA Astrophysics Data System (ADS)
Thomay, C.; Velthuis, J. J.; Baesso, P.; Cussans, D.; Morris, P. A. W.; Steer, C.; Burns, J.; Quillin, S.; Stapleton, M.
2013-10-01
We present a novel approach to the detection of special nuclear material using cosmic rays. Muon Scattering Tomography (MST) is a method for using cosmic muons to scan cargo containers and vehicles for special nuclear material. Cosmic muons are abundant, highly penetrating, not harmful for organic tissue, cannot be screened against, and can easily be detected, which makes them highly suited to the use of cargo scanning. Muons undergo multiple Coulomb scattering when passing through material, and the amount of scattering is roughly proportional to the square of the atomic number Z of the material. By reconstructing incoming and outgoing tracks, we can obtain variables to identify high-Z material. In a real life application, this has to happen on a timescale of 1 min and thus with small numbers of muons. We have built a detector system using resistive plate chambers (RPCs): 12 layers of RPCs allow for the readout of 6 x and 6 y positions, by which we can reconstruct incoming and outgoing tracks. In this work we detail the performance of an algorithm by which we separate high-Z targets from low-Z background, both for real data from our prototype setup and for MC simulation of a cargo container-sized setup. (c) British Crown Owned Copyright 2013/AWE
Jiang, Joe-Air; Chen, Chia-Pang; Chuang, Cheng-Long; Lin, Tzu-Shiang; Tseng, Chwan-Lu; Yang, En-Cheng; Wang, Yung-Chung
2009-01-01
Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy for sensor nodes, it is critical to implement WSNs with energy efficiency designs. Sensing coverage in networks, on the other hand, may degrade gradually over time after WSNs are activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Usually, coverage-controlling strategies present some challenging problems: (1) resolving the conflicts while determining which nodes should be turned off to conserve energy; (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement an energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve the challenging problems. The CoCMA contains two optimization strategies: a MA-based schedule for sensor nodes and a wake-up scheme, which are responsible to prolong the network lifetime while maintaining coverage preservation. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by the redundant nodes. During the network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage hole caused by dead nodes. The performance evaluation of the proposed CoCMA was conducted on a cluster-based WSN (CWSN) under either a random or a uniform deployment of sensor nodes. Simulation results show that the performance yielded by the combination of MA and wake-up scheme is better than that in some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area. PMID:22408561
A graph-based watershed merging using fuzzy C-means and simulated annealing for image segmentation
NASA Astrophysics Data System (ADS)
Vadiveloo, Mogana; Abdullah, Rosni; Rajeswari, Mandava
2015-12-01
In this paper, we have addressed the issue of over-segmented regions produced in watershed by merging the regions using global feature. The global feature information is obtained from clustering the image in its feature space using Fuzzy C-Means (FCM) clustering. The over-segmented regions produced by performing watershed on the gradient of the image are then mapped to this global information in the feature space. Further to this, the global feature information is optimized using Simulated Annealing (SA). The optimal global feature information is used to derive the similarity criterion to merge the over-segmented watershed regions which are represented by the region adjacency graph (RAG). The proposed method has been tested on digital brain phantom simulated dataset to segment white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) soft tissues regions. The experiments showed that the proposed method performs statistically better, with average of 95.242% regions are merged, than the immersion watershed and average accuracy improvement of 8.850% in comparison with RAG-based immersion watershed merging using global and local features.
Zheng, Ying; Yeh, Chen-Wei; Yang, Chi-Da; Jang, Shi-Shang; Chu, I-Ming
2007-08-31
Biological information generated by high-throughput technology has made systems approach feasible for many biological problems. By this approach, optimization of metabolic pathway has been successfully applied in the amino acid production. However, in this technique, gene modifications of metabolic control architecture as well as enzyme expression levels are coupled and result in a mixed integer nonlinear programming problem. Furthermore, the stoichiometric complexity of metabolic pathway, along with strong nonlinear behaviour of the regulatory kinetic models, directs a highly rugged contour in the whole optimization problem. There may exist local optimal solutions wherein the same level of production through different flux distributions compared with global optimum. The purpose of this work is to develop a novel stochastic optimization approach-information guided genetic algorithm (IGA) to discover the local optima with different levels of modification of the regulatory loop and production rates. The novelties of this work include the information theory, local search, and clustering analysis to discover the local optima which have physical meaning among the qualified solutions.
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-01-01
Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained
NASA Technical Reports Server (NTRS)
Werth, L. F. (Principal Investigator)
1981-01-01
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
NASA Astrophysics Data System (ADS)
Lerner, Michael G.; Meagher, Kristin L.; Carlson, Heather A.
2008-10-01
Use of solvent mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick (JP) method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein "hot spots." Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages.
Alvarez, Daniel; Hornero, Roberto; Marcos, J Víctor; del Campo, Félix; López, Miguel
2007-01-01
This study is focused on the classification of patients suspected of suffering from obstructive sleep apnea (OSA) by means of cluster analysis. We assessed the diagnostic ability of three clustering algorithms: k-means, hierarchical and fuzzy c-means (FCM). Nonlinear features of blood oxygen saturation (SaO2) from nocturnal oximetry were used as inputs to the clustering methods. Three nonlinear methods were used: approximate entropy (ApEn), central tendency measure (CTM) and Lempel-Ziv (LZ) complexity. A population of 74 subjects (44 OSA positive and 30 OSA negative) was studied. 90.5%, 87.8% and 86.5% accuracies were reached with k-means, hierarchical and FCM algorithms, respectively. The diagnostic accuracy values improved those obtained with each nonlinear method individually. Our results suggest that nonlinear analysis and clustering classification could provide useful information to help in the diagnosis of OSA syndrome.
Generalized fuzzy clustering for segmentation of multi-spectral magnetic resonance images.
He, Renjie; Datta, Sushmita; Sajja, Balasrinivasa Rao; Narayana, Ponnada A
2008-07-01
An integrated approach for multi-spectral segmentation of MR images is presented. This method is based on the fuzzy c-means (FCM) and includes bias field correction and contextual constraints over spatial intensity distribution and accounts for the non-spherical cluster's shape in the feature space. The bias field is modeled as a linear combination of smooth polynomial basis functions for fast computation in the clustering iterations. Regularization terms for the neighborhood continuity of intensity are added into the FCM cost functions. To reduce the computational complexity, the contextual regularizations are separated from the clustering iterations. Since the feature space is not isotropic, distance measure adopted in Gustafson-Kessel (G-K) algorithm is used instead of the Euclidean distance, to account for the non-spherical shape of the clusters in the feature space. These algorithms are quantitatively evaluated on MR brain images using the similarity measures.
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's pairedt-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation ofr = 0
NASA Astrophysics Data System (ADS)
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.
2010-06-01
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Arn+2, Krn+2, and Xen+2, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as function of cluster size. In Arn2+, for example, fissionlike fragmentation is predicted for n =55 while for n =43, the predicted outcome is nonfission fragmentation in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].
Ganguly Neogi, Soumya; Chaudhury, Pinaki
2014-01-05
In this article, we propose a stochastic search-based method, namely genetic algorithm (GA) and simulated annealing (SA) in conjunction with density functional theory (DFT) to evaluate global and local minimum structures of (TiO2)n clusters with n = 1-12. Once the structures are established, we evaluate the infrared spectroscopic modes, cluster formation energy, vertical excitation energy, vertical ionization potential, vertical electron affinity, highest occupied molecular orbital (HOMO)-lowest unoccupied molecular orbital (LUMO) gaps, and so forth. We show that an initial determination of structure using stochastic techniques (GA/SA), also popularly known as natural algorithms as their working principle mimics certain natural processes, and following it up with density functional calculations lead to high-quality structures for these systems. We have shown that the clusters tend to form three-dimensional networks. We compare our results with the available experimental and theoretical results. The results obtained from SA/GA-DFT technique agree well with available theoretical and experimental data of literature.
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-07
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r(4)), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T).
Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus; Zheng, Wenwei; Clementi, Cecilia; Prada-Gracia, Diego; Rao, Francesco
2015-01-14
The ligand migration network for O{sub 2}–diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k–means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k–means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi
2016-01-01
The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transactions information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-05-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems.
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-01-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. © 2014 The Authors. Published by Wiley Periodicals Inc. PMID:24677621
Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi
2016-01-01
The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transactions information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers.
Baldauf, Tobias; Smith, Robert E.; Seljak, Uros; Mandelbaum, Rachel
2010-03-15
The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5h{sup -1} Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k{approx}1h Mpc{sup -1}. We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
Clustering method for estimating principal diffusion directions
Nazem-Zadeh, Mohammad-Reza; Jafari-Khouzani, Kourosh; Davoodi-Bojd, Esmaeil; Jiang, Quan; Soltanian-Zadeh, Hamid
2012-01-01
Diffusion tensor magnetic resonance imaging (DTMRI) is a non-invasive tool for the investigation of white matter structure within the brain. However, the traditional tensor model is unable to characterize anisotropies of orders higher than two in heterogeneous areas containing more than one fiber population. To resolve this issue, high angular resolution diffusion imaging (HARDI) with a large number of diffusion encoding gradients is used along with reconstruction methods such as Q-ball. Using HARDI data, the fiber orientation distribution function (ODF) on the unit sphere is calculated and used to extract the principal diffusion directions (PDDs). Fast and accurate estimation of PDDs is a prerequisite for tracking algorithms that deal with fiber crossings. In this paper, the PDDs are defined as the directions around which the ODF data is concentrated. Estimates of the PDDs based on this definition are less sensitive to noise in comparison with the previous approaches. A clustering approach to estimate the PDDs is proposed which is an extension of fuzzy c-means clustering developed for orientation of points on a sphere. MDL (Minimum description length) principle is proposed to estimate the number of PDDs. Using both simulated and real diffusion data, the proposed method has been evaluated and compared with some previous protocols. Experimental results show that the proposed clustering algorithm is more accurate, more resistant to noise, and faster than some of techniques currently being utilized. PMID:21642005
Lyoo, Chul Hyoung; Zanotti-Fregonara, Paolo; Zoghbi, Sami S; Liow, Jeih-San; Xu, Rong; Pike, Victor W; Zarate, Carlos A; Fujita, Masahiro; Innis, Robert B
2014-01-01
Image-derived input function (IDIF) obtained by manually drawing carotid arteries (manual-IDIF) can be reliably used in [(11)C](R)-rolipram positron emission tomography (PET) scans. However, manual-IDIF is time consuming and subject to inter- and intra-operator variability. To overcome this limitation, we developed a fully automated technique for deriving IDIF with a supervised clustering algorithm (SVCA). To validate this technique, 25 healthy controls and 26 patients with moderate to severe major depressive disorder (MDD) underwent T1-weighted brain magnetic resonance imaging (MRI) and a 90-minute [(11)C](R)-rolipram PET scan. For each subject, metabolite-corrected input function was measured from the radial artery. SVCA templates were obtained from 10 additional healthy subjects who underwent the same MRI and PET procedures. Cluster-IDIF was obtained as follows: 1) template mask images were created for carotid and surrounding tissue; 2) parametric image of weights for blood were created using SVCA; 3) mask images to the individual PET image were inversely normalized; 4) carotid and surrounding tissue time activity curves (TACs) were obtained from weighted and unweighted averages of each voxel activity in each mask, respectively; 5) partial volume effects and radiometabolites were corrected using individual arterial data at four points. Logan-distribution volume (V T/f P) values obtained by cluster-IDIF were similar to reference results obtained using arterial data, as well as those obtained using manual-IDIF; 39 of 51 subjects had a V T/f P error of <5%, and only one had error >10%. With automatic voxel selection, cluster-IDIF curves were less noisy than manual-IDIF and free of operator-related variability. Cluster-IDIF showed widespread decrease of about 20% [(11)C](R)-rolipram binding in the MDD group. Taken together, the results suggest that cluster-IDIF is a good alternative to full arterial input function for estimating Logan-V T/f P in [(11)C
Zoghbi, Sami S.; Liow, Jeih-San; Xu, Rong; Pike, Victor W.; Zarate, Carlos A.; Fujita, Masahiro; Innis, Robert B.
2014-01-01
Image-derived input function (IDIF) obtained by manually drawing carotid arteries (manual-IDIF) can be reliably used in [11C](R)-rolipram positron emission tomography (PET) scans. However, manual-IDIF is time consuming and subject to inter- and intra-operator variability. To overcome this limitation, we developed a fully automated technique for deriving IDIF with a supervised clustering algorithm (SVCA). To validate this technique, 25 healthy controls and 26 patients with moderate to severe major depressive disorder (MDD) underwent T1-weighted brain magnetic resonance imaging (MRI) and a 90-minute [11C](R)-rolipram PET scan. For each subject, metabolite-corrected input function was measured from the radial artery. SVCA templates were obtained from 10 additional healthy subjects who underwent the same MRI and PET procedures. Cluster-IDIF was obtained as follows: 1) template mask images were created for carotid and surrounding tissue; 2) parametric image of weights for blood were created using SVCA; 3) mask images to the individual PET image were inversely normalized; 4) carotid and surrounding tissue time activity curves (TACs) were obtained from weighted and unweighted averages of each voxel activity in each mask, respectively; 5) partial volume effects and radiometabolites were corrected using individual arterial data at four points. Logan-distribution volume (VT/fP) values obtained by cluster-IDIF were similar to reference results obtained using arterial data, as well as those obtained using manual-IDIF; 39 of 51 subjects had a VT/fP error of <5%, and only one had error >10%. With automatic voxel selection, cluster-IDIF curves were less noisy than manual-IDIF and free of operator-related variability. Cluster-IDIF showed widespread decrease of about 20% [11C](R)-rolipram binding in the MDD group. Taken together, the results suggest that cluster-IDIF is a good alternative to full arterial input function for estimating Logan-VT/fP in [11C](R)-rolipram PET clinical
An improved FCM medical image segmentation algorithm based on MMTD.
Zhou, Ningning; Yang, Tingting; Zhang, Shaobai
2014-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) is one of the popular clustering algorithms for medical image segmentation. But FCM is highly vulnerable to noise due to not considering the spatial information in image segmentation. This paper introduces medium mathematics system which is employed to process fuzzy information for image segmentation. It establishes the medium similarity measure based on the measure of medium truth degree (MMTD) and uses the correlation of the pixel and its neighbors to define the medium membership function. An improved FCM medical image segmentation algorithm based on MMTD which takes some spatial features into account is proposed in this paper. The experimental results show that the proposed algorithm is more antinoise than the standard FCM, with more certainty and less fuzziness. This will lead to its practicable and effective applications in medical image segmentation.
Hall, L O; Bensaid, A M; Clarke, L P; Velthuizen, R P; Silbiger, M S; Bezdek, J C
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzz-c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared.
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands.
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most used method to separate DNA, RNA, protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry, microbiology. The main way to separate each molecule is to find borders of each molecule fragment. This paper presents a software application that show columns edges of DNA fragments in 3 steps. In the first step the application obtains lane histograms of agarose gel electrophoresis images by doing projection based on x-axis. In the second step, it utilizes k-means clustering algorithm to classify point values of lane histogram such as left side values, right side values and undesired values. In the third step, column edges of DNA fragments is shown by using mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition to this, the application presents locations of DNA fragments and how many DNA fragments exist on images captured by a scientific camera.
Shrimankar, D. D.; Sathe, S. R.
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
Shrimankar, D D; Sathe, S R
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
Kidney segmentation in CT sequences using SKFCM and improved GrowCut algorithm
2015-01-01
Background Organ segmentation is an important step in computer-aided diagnosis and pathology detection. Accurate kidney segmentation in abdominal computed tomography (CT) sequences is an essential and crucial task for surgical planning and navigation in kidney tumor ablation. However, kidney segmentation in CT is a substantially challenging work because the intensity values of kidney parenchyma are similar to those of adjacent structures. Results In this paper, a coarse-to-fine method was applied to segment kidney from CT images, which consists two stages including rough segmentation and refined segmentation. The rough segmentation is based on a kernel fuzzy C-means algorithm with spatial information (SKFCM) algorithm and the refined segmentation is implemented with improved GrowCut (IGC) algorithm. The SKFCM algorithm introduces a kernel function and spatial constraint into fuzzy c-means clustering (FCM) algorithm. The IGC algorithm makes good use of the continuity of CT sequences in space which can automatically generate the seed labels and improve the efficiency of segmentation. The experimental results performed on the whole dataset of abdominal CT images have shown that the proposed method is accurate and efficient. The method provides a sensitivity of 95.46% with specificity of 99.82% and performs better than other related methods. Conclusions Our method achieves high accuracy in kidney segmentation and considerably reduces the time and labor required for contour delineation. In addition, the method can be expanded to 3D segmentation directly without modification. PMID:26356850
NASA Astrophysics Data System (ADS)
Micheletti, Natan; Tonini, Marj; Lane, Stuart N.
2017-02-01
Acquisition of high density point clouds using terrestrial laser scanners (TLSs) has become commonplace in geomorphic science. The derived point clouds are often interpolated onto regular grids and the grids compared to detect change (i.e. erosion and deposition/advancement movements). This procedure is necessary for some applications (e.g. digital terrain analysis), but it inevitably leads to a certain loss of potentially valuable information contained within the point clouds. In the present study, an alternative methodology for geomorphological analysis and feature detection from point clouds is proposed. It rests on the use of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN), applied to TLS data for a rock glacier front slope in the Swiss Alps. The proposed methods allowed the detection and isolation of movements directly from point clouds which yield to accuracies in the following computation of volumes that depend only on the actual registered distance between points. We demonstrated that these values are more conservative than volumes computed with the traditional DEM comparison. The results are illustrated for the summer of 2015, a season of enhanced geomorphic activity associated with exceptionally high temperatures.
Ahmadi, Katayoon; Fouladi Nia, Babak
2016-01-01
Automatic segmentation of medical CT scan images is one of the most challenging fields in digital image processing. The goal of this paper is to discuss the automatic segmentation of CT scan images to detect and separate vessels in the liver. The segmentation of liver vessels is very important in the liver surgery planning and identifying the structure of vessels and their relationship to tumors. Fuzzy C-means (FCM) method has already been proposed for segmentation of liver vessels. Due to classical optimization process, this method suffers lack of sensitivity to the initial values of class centers and segmentation of local minima. In this article, a method based on FCM in conjunction with genetic algorithms (GA) is applied for segmentation of liver's blood vessels. This method was simulated and validated using 20 CT scan images of the liver. The results showed that the accuracy, sensitivity, specificity, and CPU time of new method in comparison with FCM algorithm reaching up to 91%, 83.62, 94.11%, and 27.17 were achieved, respectively. Moreover, selection of optimal and robust parameters in the initial step led to rapid convergence of the proposed method. The outcome of this research assists medical teams in estimating disease progress and selecting proper treatments. PMID:28044090
Ahmadi, Katayoon; Karimi, Abbas; Fouladi Nia, Babak
2016-01-01
Automatic segmentation of medical CT scan images is one of the most challenging fields in digital image processing. The goal of this paper is to discuss the automatic segmentation of CT scan images to detect and separate vessels in the liver. The segmentation of liver vessels is very important in the liver surgery planning and identifying the structure of vessels and their relationship to tumors. Fuzzy C-means (FCM) method has already been proposed for segmentation of liver vessels. Due to classical optimization process, this method suffers lack of sensitivity to the initial values of class centers and segmentation of local minima. In this article, a method based on FCM in conjunction with genetic algorithms (GA) is applied for segmentation of liver's blood vessels. This method was simulated and validated using 20 CT scan images of the liver. The results showed that the accuracy, sensitivity, specificity, and CPU time of new method in comparison with FCM algorithm reaching up to 91%, 83.62, 94.11%, and 27.17 were achieved, respectively. Moreover, selection of optimal and robust parameters in the initial step led to rapid convergence of the proposed method. The outcome of this research assists medical teams in estimating disease progress and selecting proper treatments.
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
Adaptive Clustering of Hypermedia Documents.
ERIC Educational Resources Information Center
Johnson, Andrew; Fotouhi, Farshad
1996-01-01
Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than…
Matlab Cluster Ensemble Toolbox
Sapio, Vincent De; Kegelmeyer, Philip
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-21
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2014-03-01
We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K
NASA Astrophysics Data System (ADS)
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-01
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model
Lu, Jing; Chen, Lei; Yin, Jun; Huang, Tao; Bi, Yi; Kong, Xiangyin; Zheng, Mingyue; Cai, Yu-Dong
2016-01-01
Lung cancer, characterized by uncontrolled cell growth in the lung tissue, is the leading cause of global cancer deaths. Until now, effective treatment of this disease is limited. Many synthetic compounds have emerged with the advancement of combinatorial chemistry. Identification of effective lung cancer candidate drug compounds among them is a great challenge. Thus, it is necessary to build effective computational methods that can assist us in selecting for potential lung cancer drug compounds. In this study, a computational method was proposed to tackle this problem. The chemical-chemical interactions and chemical-protein interactions were utilized to select candidate drug compounds that have close associations with approved lung cancer drugs and lung cancer-related genes. A permutation test and K-means clustering algorithm were employed to exclude candidate drugs with low possibilities to treat lung cancer. The final analysis suggests that the remaining drug compounds have potential anti-lung cancer activities and most of them have structural dissimilarity with approved drugs for lung cancer.
Delineation of river bed-surface patches by clustering high-resolution spatial grain size data
NASA Astrophysics Data System (ADS)
Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.
2014-01-01
The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment.All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences
NASA Astrophysics Data System (ADS)
Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza
2014-01-01
Quantitative PET image reconstruction requires an accurate map of attenuation coefficients of the tissue under investigation at 511 keV (μ-map), and in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received lots of attention in the scientific literature. One of the major difficulties facing MRAC has been observed in the areas where bone and air collide, e.g. ethmoidal sinuses in the head area. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, development of more versatile MR sequences to label the bone structure, e.g. ultra-short echo-time (UTE) sequences, certainly plays a significant role in novel methodological developments. However, long acquisition time and complexity of UTE sequences limit its clinical applications. To overcome this problem, we developed a novel combination of Short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy clustering C-means (FCM) to segment the head area into four classes of air, bone, soft tissue and adipose tissue. The imaging protocol was set on a clinical 3 T Tim Trio and also 1.5 T Avanto (Siemens Medical Solution, Erlangen, Germany) employing a triple echo time pulse sequence in the head area. The acquisition parameters were as follows: TE1/TE2/TE3=0.98/4.925/6.155 ms, TR=8 ms, FA=25 on the 3 T system, and TE1/TE2/TE3=1.1/2.38/4.76 ms, TR=16 ms, FA=18 for the 1.5 T system. The second and third echo-times belonged to the Dixon decomposition to distinguish soft and adipose tissues. To quantify accuracy, sensitivity and specificity of the bone segmentation algorithm, resulting classes of MR-based segmented bone were compared with the manual segmented one by our expert neuro-radiologist. Results for both 3 T and 1.5 T systems show that bone segmentation applied in several
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.
Haplotyping Problem, A Clustering Approach
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-06
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms.
Haplotyping Problem, A Clustering Approach
NASA Astrophysics Data System (ADS)
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-01
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms.
Devignes, Marie-Dominique; Benabderrahmane, Sidahmed; Smaïl-Tabbone, Malika; Napoli, Amedeo; Poch, Olivier
2012-01-01
Functional classification aims at grouping genes according to their molecular function or the biological process they participate in. Evaluating the validity of such unsupervised gene classification remains a challenge given the variety of distance measures and classification algorithms that can be used. We evaluate here functional classification of genes with the help of reference sets: KEGG (Kyoto Encyclopaedia of Genes and Genomes) pathways and Pfam clans. These sets represent ground truth for any distance based on GO (Gene Ontology) biological process and molecular function annotations respectively. Overlaps between clusters and reference sets are estimated by the F-score method. We test our previously described IntelliGO semantic distance with hierarchical and fuzzy C-means clustering and we compare results with the state-of-the-art DAVID (Database for Annotation Visualisation and Integrated Discovery) functional classification method. Finally, study of best matching clusters to reference sets leads us to propose a set-difference method for discovering missing information.
Fuzzy clustering of large satellite images using high performance computing
NASA Astrophysics Data System (ADS)
Petcu, Dana; Zaharie, Daniela; Panica, Silviu; Hussein, Ashraf S.; Sayed, Ahmed; El-Shishiny, Hisham
2011-11-01
Fuzzy clustering is one of the most frequently used methods for identifying homogeneous regions in remote sensing images. In the case of large images, the computational costs of fuzzy clustering can be prohibitive unless high performance computing is used. Therefore, efficient parallel implementations are highly desirable. This paper presents results on the efficiency of a parallelization strategy for the Fuzzy c-Means (FCM) algorithm. In addition, the parallelization strategy has been extended in the case of two FCM variants, which incorporates spatial information (Spatial FCM and Gaussian Kernel-based FCM with spatial bias correction). The high-level requirements that guided the formulation of the proposed parallel implementations are: (i) find appropriate partitioning of large images in order to ensure a balanced load of processors; (ii) use as much as possible the collective computations; (iii) reduce the cost of communications between processors. The parallel implementations were tested through several test cases including multispectral images and images having a large number of pixels. The experiments were conducted on both a computational cluster and a BlueGene/P supercomputer with up to 1024 processors. Generally, good scalability was obtained both with respect to the number of clusters and the number of spectral bands.
Overview on techniques in cluster analysis.
Frades, Itziar; Matthiesen, Rune
2010-01-01
Clustering is the unsupervised, semisupervised, and supervised classification of patterns into groups. The clustering problem has been addressed in many contexts and disciplines. Cluster analysis encompasses different methods and algorithms for grouping objects of similar kinds into respective categories. In this chapter, we describe a number of methods and algorithms for cluster analysis in a stepwise framework. The steps of a typical clustering analysis process include sequentially pattern representation, the choice of the similarity measure, the choice of the clustering algorithm, the assessment of the output, and the representation of the clusters.
Single-cluster dynamics for the random-cluster model
NASA Astrophysics Data System (ADS)
Deng, Youjin; Qian, Xiaofeng; Blöte, Henk W. J.
2009-09-01
We formulate a single-cluster Monte Carlo algorithm for the simulation of the random-cluster model. This algorithm is a generalization of the Wolff single-cluster method for the q -state Potts model to noninteger values q>1 . Its results for static quantities are in a satisfactory agreement with those of the existing Swendsen-Wang-Chayes-Machta (SWCM) algorithm, which involves a full-cluster decomposition of random-cluster configurations. We explore the critical dynamics of this algorithm for several two-dimensional Potts and random-cluster models. For integer q , the single-cluster algorithm can be reduced to the Wolff algorithm, for which case we find that the autocorrelation functions decay almost purely exponentially, with dynamic exponents zexp=0.07 (1), 0.521 (7), and 1.007 (9) for q=2 , 3, and 4, respectively. For noninteger q , the dynamical behavior of the single-cluster algorithm appears to be very dissimilar to that of the SWCM algorithm. For large critical systems, the autocorrelation function displays a range of power-law behavior as a function of time. The dynamic exponents are relatively large. We provide an explanation for this peculiar dynamic behavior.
Single-cluster dynamics for the random-cluster model.
Deng, Youjin; Qian, Xiaofeng; Blöte, Henk W J
2009-09-01
We formulate a single-cluster Monte Carlo algorithm for the simulation of the random-cluster model. This algorithm is a generalization of the Wolff single-cluster method for the q-state Potts model to noninteger values q>1. Its results for static quantities are in a satisfactory agreement with those of the existing Swendsen-Wang-Chayes-Machta (SWCM) algorithm, which involves a full-cluster decomposition of random-cluster configurations. We explore the critical dynamics of this algorithm for several two-dimensional Potts and random-cluster models. For integer q, the single-cluster algorithm can be reduced to the Wolff algorithm, for which case we find that the autocorrelation functions decay almost purely exponentially, with dynamic exponents z(exp)=0.07 (1), 0.521 (7), and 1.007 (9) for q=2, 3, and 4, respectively. For noninteger q, the dynamical behavior of the single-cluster algorithm appears to be very dissimilar to that of the SWCM algorithm. For large critical systems, the autocorrelation function displays a range of power-law behavior as a function of time. The dynamic exponents are relatively large. We provide an explanation for this peculiar dynamic behavior.
A Linear Algebra Measure of Cluster Quality.
ERIC Educational Resources Information Center
Mather, Laura A.
2000-01-01
Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perception trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
Cross-domain, soft-partition clustering with diversity measure and knowledge reference
Qian, Pengjiang; Sun, Shouwei; Jiang, Yizhang; Su, Kuan-Hao; Ni, Tongguang; Wang, Shitong; Muzic, Raymond F.
2016-01-01
Conventional, soft-partition clustering approaches, such as fuzzy c-means (FCM), maximum entropy clustering (MEC) and fuzzy clustering by quadratic regularization (FC-QR), are usually incompetent in those situations where the data are quite insufficient or much polluted by underlying noise or outliers. In order to address this challenge, the quadratic weights and Gini-Simpson diversity based fuzzy clustering model (QWGSD-FC), is first proposed as a basis of our work. Based on QWGSD-FC and inspired by transfer learning, two types of cross-domain, soft-partition clustering frameworks and their corresponding algorithms, referred to as type-I/type-II knowledge-transfer-oriented c-means (TI-KT-CM and TII-KT-CM), are subsequently presented, respectively. The primary contributions of our work are four-fold: (1) The delicate QWGSD-FC model inherits the most merits of FCM, MEC and FC-QR. With the weight factors in the form of quadratic memberships, similar to FCM, it can more effectively calculate the total intra-cluster deviation than the linear form recruited in MEC and FC-QR. Meanwhile, via Gini-Simpson diversity index, like Shannon entropy in MEC, and equivalent to the quadratic regularization in FC-QR, QWGSD-FC is prone to achieving the unbiased probability assignments, (2) owing to the reference knowledge from the source domain, both TI-KT-CM and TII-KT-CM demonstrate high clustering effectiveness as well as strong parameter robustness in the target domain, (3) TI-KT-CM refers merely to the historical cluster centroids, whereas TII-KT-CM simultaneously uses the historical cluster centroids and their associated fuzzy memberships as the reference. This indicates that TII-KT-CM features more comprehensive knowledge learning capability than TI-KT-CM and TII-KT-CM consequently exhibits more perfect cross-domain clustering performance and (4) neither the historical cluster centroids nor the historical cluster centroid based fuzzy memberships involved in TI-KT-CM or TII
Weigend, Florian
2014-10-07
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf{sub 12} and [LaPb{sub 7}Bi{sub 7}]{sup 4−}. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the “pure” genetic algorithm.
Weigend, Florian
2014-10-07
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7](4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.
NASA Astrophysics Data System (ADS)
Zhou, Pu-cheng; Liu, Cun-chao
2013-08-01
Camouflaged targets detection in complex background is a challenging problem. Spectral-polarimetric imaging can offers spectral information and polarization information from the objects in the scene. Fusion of the spectral and polarization information in the images will result in better camouflaged target identification and recognition. In this paper a novel spectral-polarimetric image fusion algorithm based on Shearlet transform is proposed. Firstly, every polarimetric image in each wave band is decomposed into images of low frequency components and high frequency components by Shearlet transform. Then, the fused low frequency approximate coefficients are obtained with weighted average method, and the fused high frequency coefficients are obtained with area-based feature selection method, so features and details from different spectral-polarimetric images are fused successfully. After that, the kernel fuzzy c-means clustering algorithm is used for camouflaged object separation from its background. Experimental results have shown that better identification performance was achieved.
NASA Astrophysics Data System (ADS)
Borgelt, Christian
In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea to transfer an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).
Unconventional methods for clustering
NASA Astrophysics Data System (ADS)
Kotyrba, Martin
2016-06-01
Cluster analysis or clustering is a task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering namely SOM (Self Organising Map). The paper describes the theory needed to understand the principle of clustering and descriptions of algorithm used with clustering in our experiments.
Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering.
Gong, Maoguo; Zhou, Zhiqiang; Ma, Jingjing
2012-04-01
This paper presents an unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm. The image fusion technique is introduced to generate a difference image by using complementary information from a mean-ratio image and a log-ratio image. In order to restrain the background information and enhance the information of changed regions in the fused difference image, wavelet fusion rules based on an average operator and minimum local area energy are chosen to fuse the wavelet coefficients for a low-frequency band and a high-frequency band, respectively. A reformulated fuzzy local-information C-means clustering algorithm is proposed for classifying changed and unchanged regions in the fused difference image. It incorporates the information about spatial context in a novel fuzzy way for the purpose of enhancing the changed information and of reducing the effect of speckle noise. Experiments on real SAR images show that the image fusion strategy integrates the advantages of the log-ratio operator and the mean-ratio operator and gains a better performance. The change detection results obtained by the improved fuzzy clustering algorithm exhibited lower error than its preexistences.
Brightest Cluster Galaxy Identification
NASA Astrophysics Data System (ADS)
Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team
2011-01-01
Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in the cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others.We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.
NASA Astrophysics Data System (ADS)
Pereira, Sebastián; Campusano, Luis E.; Hitschfeld-Kahler, Nancy; Pizarro, Daniel; Haines, Christopher P.; Clowes, Roger G.; Marinello, Gabriel; Söchting, Ilona K.
2017-04-01
This paper is the first in a series, presenting a new galaxy cluster finder based on a three-dimensional Voronoi Tesselation plus a maximum likelihood estimator, followed by gapping-filtering in radial velocity(VoML+G). The scientific aim of the series is a reassessment of the diversity of optical clusters in the local universe. A mock galaxy database mimicking the southern strip of the magnitude(blue)-limited 2dF Galaxy Redshift Survey (2dFGRS), for the redshift range 0.009 < z < 0.22, is built on the basis of the Millennium Simulation of the LCDM cosmology and a reference catalog of “Millennium clusters,” spannning across the 1.0 × 1012–1.0 × 1015 M ⊙ h ‑1 dark matter (DM) halo mass range, is recorded. The validation of VoML+G is performed through its application to the mock data and the ensuing determination of the completeness and purity of the cluster detections by comparison with the reference catalog. The execution of VoML+G over the 2dFGRS mock data identified 1614 clusters, 22% with N g ≥ 10, 64 percent with 10 > N g ≥ 5, and 14% with N g < 5. The ensemble of VoML+G clusters has a ∼59% completeness and a ∼66% purity, whereas the subsample with N g ≥ 10, to z ∼ 0.14, has greatly improved mean rates of ∼75% and ∼90%, respectively. The VoML+G cluster velocity dispersions are found to be compatible with those corresponding to “Millennium clusters” over the 300–1000 km s‑1 interval, i.e., for cluster halo masses in excess of ∼3.0 × 1013 M ⊙ h ‑1.
Algorithms and Algorithmic Languages.
ERIC Educational Resources Information Center
Veselov, V. M.; Koprov, V. M.
This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…
A patch-based tensor decomposition algorithm for M-FISH image classification.
Wang, Min; Huang, Ting-Zhu; Li, Jingyao; Wang, Yu-Ping
2016-05-03
Multiplex-fluorescence in situ hybridization (M-FISH) is a chromosome imaging technique which can be used to detect chromosomal abnormalities such as translocations, deletions, duplications, and inversions. Chromosome classification from M-FISH imaging data is a key step to implement the technique. In the classified M-FISH image, each pixel in a chromosome is labeled with a class index and drawn with a pseudo-color so that geneticists can easily conduct diagnosis, for example, identifying chromosomal translocations by examining color changes between chromosomes. However, the information of pixels in a neighborhood is often overlooked by existing approaches. In this work, we assume that the pixels in a patch belong to the same class and use the patch to represent the center pixel's class information, by which we can use the correlations of neighboring pixels and the structural information across different spectral channels for the classification. On the basis of assumption, we propose a patch-based classification algorithm by using higher order singular value decomposition (HOSVD). The developed method has been tested on a comprehensive M-FISH database that we established, demonstrating improved performance. When compared with other pixel-wise M-FISH image classifiers such as fuzzy c-means clustering (FCM), adaptive fuzzy c-means clustering (AFCM), improved adaptive fuzzy c-means clustering (IAFCM), and sparse representation classification (SparseRC) methods, the proposed method gave the highest correct classification ratio (CCR), which can translate into improved diagnosis of genetic diseases and cancers. © 2016 International Society for Advancement of Cytometry.
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Cluster headache Overview By Mayo Clinic Staff Cluster headaches, which occur in cyclical patterns or clusters, are one of the most painful types of headache. A cluster headache commonly awakens you ...
Slonim, Noam; Atwal, Gurinder Singh; Tkačik, Gašper; Bialek, William
2005-01-01
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
Elaff, Ihab
2016-01-01
Background Brain segmentation from diffusion tensor imaging (DTI) into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) with acceptable results is subjected to many factors. Objectives The most important issue in brain segmentation from DTI images is the selection of suitable scalar indices that best describe the required tissue in the images. Specifying suitable clustering method and suitable number of clusters of the selected method are other factors which affects the segmentation process significantly. Materials and Methods The segmentation process is evaluated using four different clustering methods with different number of clusters where some DTI scalar indices for 10 human brains are processed. Results The aim was to produce results with less segmentation error and a lower computational cost while attempting to minimizing boundary overlapping and minimizing the effect of artifacts due to macroscale scanning. Conclusion The volume ratios of the best produced outputs with respect to the total brain size are 16.7% ± 3.53% for CSF, 35.05% ± 1.13% for WM, and 48.2% ± 2.88% for GM. PMID:27703655
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Metamodel-based global optimization using fuzzy clustering for design space reduction
NASA Astrophysics Data System (ADS)
Li, Yulin; Liu, Li; Long, Teng; Dong, Weili
2013-09-01
High fidelity analysis are utilized in modern engineering design optimization problems which involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. The uniformly distributed initial sample points are generated by Latin hypercube design to construct the radial basis function metamodel, whose accuracy is improved with increasing number of sample points gradually. Fuzzy c-mean method and Gath-Geva clustering method are applied to divide the design space into several small interesting cluster spaces for low and high dimensional problems respectively. Modeling efficiency and accuracy are directly related to the design space, so unconcerned spaces are eliminated by the proposed reduction principle and two pseudo reduction algorithms. The reduction principle is developed to determine whether the current design space should be reduced and which space is eliminated. The first pseudo reduction algorithm improves the speed of clustering, while the second pseudo reduction algorithm ensures the design space to be reduced. Through several numerical benchmark functions, comparative studies with adaptive response surface method, approximated unimodal region elimination method and mode-pursuing sampling are carried out. The optimization results reveal that this method captures the real global optimum for all the numerical benchmark functions. And the number of function evaluations show that the efficiency of this method is favorable especially for high dimensional problems. Based on this global design optimization method, a design optimization of a lifting surface in high speed flow is carried out and this method saves about 10 h compared with genetic algorithms. This method possesses favorable performance on efficiency, robustness
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
NASA Astrophysics Data System (ADS)
Perez, Adrianna; Moreno, Jorge; Naiman, Jill; Ramirez-Ruiz, Enrico; Hopkins, Philip F.
2017-01-01
In this work, we analyze the environments surrounding star clusters of simulated merging galaxies. Our framework employs Feedback In Realistic Environments (FIRE) model (Hopkins et al., 2014). The FIRE project is a high resolution cosmological simulation that resolves star forming regions and incorporates stellar feedback in a physically realistic way. The project focuses on analyzing the properties of the star clusters formed in merging galaxies. The locations of these star clusters are identified with astrodendro.py, a publicly available dendrogram algorithm. Once star cluster properties are extracted, they will be used to create a sub-grid (smaller than the resolution scale of FIRE) of gas confinement in these clusters. Then, we can examine how the star clusters interact with these available gas reservoirs (either by accreting this mass or blowing it out via feedback), which will determine many properties of the cluster (star formation history, compact object accretion, etc). These simulations will further our understanding of star formation within stellar clusters during galaxy evolution. In the future, we aim to enhance sub-grid prescriptions for feedback specific to processes within star clusters; such as, interaction with stellar winds and gas accretion onto black holes and neutron stars.
Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla
2013-12-01
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
NASA Astrophysics Data System (ADS)
Manolopoulou, M.; Plionis, M.
2017-03-01
We study the possible rotation of cluster galaxies, developing, testing, and applying a novel algorithm which identifies rotation, if such does exist, as well as its rotational centre, its axis orientation, rotational velocity amplitude, and, finally, the clockwise or counterclockwise direction of rotation on the plane of the sky. To validate our algorithms we construct realistic Monte Carlo mock rotating clusters and confirm that our method provides robust indications of rotation. We then apply our methodology on a sample of Abell clusters with z ≲ 0.1 with member galaxies selected from the Sloan Digital Sky Survey DR10 spectroscopic data base. After excluding a number of substructured clusters, which could provide erroneous indications of rotation, and taking into account the expected fraction of misidentified coherent substructure velocities for rotation, provided by our Monte Carlo simulation analysis, we find that ∼23 per cent of our clusters are rotating under a set of strict criteria. Loosening the strictness of the criteria, on the expense of introducing spurious rotation indications, we find this fraction increasing to ∼28 per cent. We correlate our rotation indicators with the cluster dynamical state, provided either by their Bautz-Morgan type or by their X-ray isophotal shape and find for those clusters showing rotation within 1.5 h^{-1}_{70} Mpc that the significance of their rotation is related to the dynamically younger phases of cluster formation but after the initial anisotropic accretion and merging has been completed. Finally, finding rotational modes in galaxy clusters could lead to the necessity of correcting the dynamical cluster mass calculations.
Swarm Intelligence in Text Document Clustering
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.
Remote sensing imagery classification using multi-objective gravitational search algorithm
NASA Astrophysics Data System (ADS)
Zhang, Aizhu; Sun, Genyun; Wang, Zhenjie
2016-10-01
Simultaneous optimization of different validity measures can capture different data characteristics of remote sensing imagery (RSI) and thereby achieving high quality classification results. In this paper, two conflicting cluster validity indices, the Xie-Beni (XB) index and the fuzzy C-means (FCM) (Jm) measure, are integrated with a diversity-enhanced and memory-based multi-objective gravitational search algorithm (DMMOGSA) to present a novel multi-objective optimization based RSI classification method. In this method, the Gabor filter method is firstly implemented to extract texture features of RSI. Then, the texture features are syncretized with the spectral features to construct the spatial-spectral feature space/set of the RSI. Afterwards, cluster of the spectral-spatial feature set is carried out on the basis of the proposed method. To be specific, cluster centers are randomly generated initially. After that, the cluster centers are updated and optimized adaptively by employing the DMMOGSA. Accordingly, a set of non-dominated cluster centers are obtained. Therefore, numbers of image classification results of RSI are produced and users can pick up the most promising one according to their problem requirements. To quantitatively and qualitatively validate the effectiveness of the proposed method, the proposed classification method was applied to classifier two aerial high-resolution remote sensing imageries. The obtained classification results are compared with that produced by two single cluster validity index based and two state-of-the-art multi-objective optimization algorithms based classification results. Comparison results show that the proposed method can achieve more accurate RSI classification.
NASA Astrophysics Data System (ADS)
Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei
2015-12-01
Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using response receiver operating (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92 % with a false-positive rate of 2.3 clusters/image, and it is also better than standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
Muetterties, Earl L.
1980-05-01
Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.
Time series clustering analysis of health-promoting behavior
NASA Astrophysics Data System (ADS)
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.
Grande, J A; Andújar, J M; Aroba, J; de la Torre, M L; Beltrán, R
2005-04-01
In the present work, Acid Mine Drainage (AMD) processes in the Chorrito Stream, which flows into the Cobica River (Iberian Pyrite Belt, Southwest Spain) are characterized by means of clustering techniques based on fuzzy logic. Also, pH behavior in contrast to precipitation is clearly explained, proving that the influence of rainfall inputs on the acidity and, as a result, on the metal load of a riverbed undergoing AMD processes highly depends on the moment when it occurs. In general, the riverbed dynamic behavior is the response to the sum of instant stimuli produced by isolated rainfall, the seasonal memory depending on the moment of the target hydrological year and, finally, the own inertia of the river basin, as a result of an accumulation process caused by age-long mining activity.
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle inequality obeying distance metric, the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program and initial performance results of end-to-end a document processing workflow are reported.
The global Minmax k-means algorithm.
Wang, Xiaoyan; Bai, Yanping
2016-01-01
The global k-means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k-means to minimize the sum of the intra-cluster variances. However the global k-means algorithm sometimes results singleton clusters and the initial positions sometimes are bad, after a bad initialization, poor local optimal can be easily obtained by k-means algorithm. In this paper, we modified the global k-means algorithm to eliminate the singleton clusters at first, and then we apply MinMax k-means clustering error method to global k-means algorithm to overcome the effect of bad initialization, proposed the global Minmax k-means algorithm. The proposed clustering method is tested on some popular data sets and compared to the k-means algorithm, the global k-means algorithm and the MinMax k-means algorithm. The experiment results show our proposed algorithm outperforms other algorithms mentioned in the paper.
NASA Technical Reports Server (NTRS)
Gregory, Kyle J.; Hill, Joanne E. (Editor); Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-01-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
Join-Graph Propagation Algorithms
Mateescu, Robert; Kask, Kalev; Gogate, Vibhav; Dechter, Rina
2010-01-01
The paper investigates parameterized approximate message-passing schemes that are based on bounded inference and are inspired by Pearl's belief propagation algorithm (BP). We start with the bounded inference mini-clustering algorithm and then move to the iterative scheme called Iterative Join-Graph Propagation (IJGP), that combines both iteration and bounded inference. Algorithm IJGP belongs to the class of Generalized Belief Propagation algorithms, a framework that allowed connections with approximate algorithms from statistical physics and is shown empirically to surpass the performance of mini-clustering and belief propagation, as well as a number of other state-of-the-art algorithms on several classes of networks. We also provide insight into the accuracy of iterative BP and IJGP by relating these algorithms to well known classes of constraint propagation schemes. PMID:20740057
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively.
Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Onishi, Masaki; Yoda, Ikushi
In recent years, many human tracking researches have been proposed in order to analyze human dynamic trajectory. These researches are general technology applicable to various fields, such as customer purchase analysis in a shopping environment and safety control in a (railroad) crossing. In this paper, we present a new approach for tracking human positions by stereo image. We use the framework of two-stepped clustering with k-means method and fuzzy clustering to detect human regions. In the initial clustering, k-means method makes middle clusters from objective features extracted by stereo vision at high speed. In the last clustering, c-means fuzzy method cluster middle clusters based on attributes into human regions. Our proposed method can be correctly clustered by expressing ambiguity using fuzzy clustering, even when many people are close to each other. The validity of our technique was evaluated with the experiment of trajectories extraction of doctors and nurses in an emergency room of a hospital.
Hierarchical clustering in minimum spanning trees.
Yu, Meichen; Hillebrand, Arjan; Tewarie, Prejaas; Meier, Jil; van Dijk, Bob; Van Mieghem, Piet; Stam, Cornelis Jan
2015-02-01
The identification of clusters or communities in complex networks is a reappearing problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that the clustering of the MST reveals insight in the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulties to define and identify clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network.
Hierarchical clustering in minimum spanning trees
NASA Astrophysics Data System (ADS)
Yu, Meichen; Hillebrand, Arjan; Tewarie, Prejaas; Meier, Jil; van Dijk, Bob; Van Mieghem, Piet; Stam, Cornelis Jan
2015-02-01
The identification of clusters or communities in complex networks is a reappearing problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that the clustering of the MST reveals insight in the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulties to define and identify clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network.
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven
2010-08-01
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; Gerdes, David; Johnston, David E.; Sheldon, Erin; /Brookhaven
2011-08-22
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
NASA Astrophysics Data System (ADS)
Martín-Herrero, J.
2004-10-01
I present a hybrid method for the labelling of clusters in two-dimensional lattices, which combines the recursive approach with iterative scanning to reduce the stack size required by the pure recursive technique, while keeping its benefits: single pass and straightforward cluster characterization and percolation detection parallel to the labelling. While the capacity to hold the entire lattice in memory is usually regarded as the major constraint for the applicability of the recursive technique, the required stack size is the real limiting factor. Resorting to recursion only for the transverse direction greatly reduces the recursion depth and therefore the required stack. It also enhances the overall performance of the recursive technique, as is shown by results on a set of uniform random binary lattices and on a set of samples of the Ising model. I also show how this technique may replace the recursive technique in Wolff's cluster algorithm, decreasing the risk of stack overflow and increasing its speed, and the Hoshen-Kopelman algorithm in the Swendsen-Wang cluster algorithm, allowing effortless characterization during generation of the samples and increasing its speed.
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm.
Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein
2015-01-01
DNA microarray is a powerful approach to study simultaneously, the expression of 1000 of genes in a single experiment. The average value of the fluorescent intensity could be calculated in a microarray experiment. The calculated intensity values are very close in amount to the levels of expression of a particular gene. However, determining the appropriate position of every spot in microarray images is a main challenge, which leads to the accurate classification of normal and abnormal (cancer) cells. In this paper, first a preprocessing approach is performed to eliminate the noise and artifacts available in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned utilizing the mathematical morphology operations. Finally, the position of each spot is exactly determined through applying a novel hybrid model based on the principle component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in SFCM algorithm will lead to improving the quality in complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on the real microarray images, which is available in Stanford Microarray Databases. Results illustrate that the accuracy of microarray cells segmentation in the proposed algorithm reaches to 100% and 98% for noiseless/noisy cells, respectively.
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm
Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein
2015-01-01
DNA microarray is a powerful approach to study simultaneously, the expression of 1000 of genes in a single experiment. The average value of the fluorescent intensity could be calculated in a microarray experiment. The calculated intensity values are very close in amount to the levels of expression of a particular gene. However, determining the appropriate position of every spot in microarray images is a main challenge, which leads to the accurate classification of normal and abnormal (cancer) cells. In this paper, first a preprocessing approach is performed to eliminate the noise and artifacts available in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned utilizing the mathematical morphology operations. Finally, the position of each spot is exactly determined through applying a novel hybrid model based on the principle component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in SFCM algorithm will lead to improving the quality in complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on the real microarray images, which is available in Stanford Microarray Databases. Results illustrate that the accuracy of microarray cells segmentation in the proposed algorithm reaches to 100% and 98% for noiseless/noisy cells, respectively. PMID:26284175
Absolute classification with unsupervised clustering
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, D. A.
1992-01-01
An absolute classification algorithm is proposed in which the class definition through training samples or otherwise is required only for a particular class of interest. The absolute classification is considered as a problem of unsupervised clustering when one cluster is known initially. The definitions and statistics of the other classes are automatically developed through the weighted unsupervised clustering procedure, which is developed to keep the cluster corresponding to the class of interest from losing its identity as the class of interest. Once all the classes are developed, a conventional relative classifier such as the maximum-likelihood classifier is used in the classification.
Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache; Vascular headache - cluster ... be related to the body's sudden release of histamine (chemical in the body released during an allergic ...
Ranking inter-relationships between clusters
NASA Astrophysics Data System (ADS)
Wang, Tingting; Chen, Feng; Phoebe Chen, Yi-Ping
2011-12-01
The evaluation of the relationships between clusters is important to identify vital unknown information in many real-life applications, such as in the fields of crime detection, evolution trees, metallurgical industry and biology engraftment. This article proposes a method called 'mode pattern + mutual information' to rank the inter-relationship between clusters. The idea of the mode pattern is used to find outstanding objects from each cluster, and the mutual information criterion measures the close proximity of a pair of clusters. Our approach is different from the conventional algorithms of classifying and clustering, because our focus is not to classify objects into different clusters, but instead, we aim to rank the inter-relationship between clusters when the clusters are given. We conducted experiments on a wide range of real-life datasets, including image data and cancer diagnosis data. The experimental results show that our algorithm is effective and promising.
Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2004-05-26
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
Splitting Methods for Convex Clustering
Chi, Eric C.; Lange, Kenneth
2016-01-01
Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient. This article has supplemental materials available online. PMID:27087770
The Environmental Technology Innovation Clusters Program advises cluster organizations, encourages collaboration between clusters, tracks U.S. environmental technology clusters, and connects EPA programs to cluster needs.
Structural transitions in clusters.
Hartke, Bernd
2002-05-03
If one adds more particles to a cluster, the energetically optimal structure is neither preserved nor does it change in a continuous fashion. Instead, one finds several cluster size regions where one structural principle dominates almost without exception, and rather narrow boundary regions in-between. The structure of the solid is usually reached only at relatively large sizes, after more than one structural transition. The occurrence of this general phenomenon of size-dependent structural transitions does not seem to depend on the nature of the particles, it is found for atomic, molecular, homogeneous, and heterogeneous clusters alike. Clearly, it is a collective many-body phenomenon which can in principle be calculated but not understood in a fully reductionistic manner. Actual calculations with sufficient accuracy are not feasible today, because of the enormous computational expense, even when unconventional evolutionary algorithms are employed for global geometry optimization. Therefore, simple rules for cluster structures are highly desirable. In fact, we are dealing here not just with the academic quest for linkages between cluster structure and features of the potential energy surface, but structural transitions in clusters are also of immediate relevance for many natural and industrial processes, ranging from crystal growth all the way to nanotechnology. This article provides an exemplary overview of research on this topic, from simple model systems where first qualitative explanations start to be successful, up to more realistic complex systems which are still beyond our understanding.
Clustering Millions of Faces by Identity.
Otto, Charles; Wang, Dayong; Jain, Anil
2017-03-07
Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of million, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0:87 on the LFW benchmark (13K faces of 5; 749 individuals), which drops to 0:27 on the largest dataset considered (13K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0:71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.
ConsensusCluster: a software tool for unsupervised cluster discovery in numerical data.
Seiler, Michael; Huang, C Chris; Szalma, Sandor; Bhanot, Gyan
2010-02-01
We have created a stand-alone software tool, ConsensusCluster, for the analysis of high-dimensional single nucleotide polymorphism (SNP) and gene expression microarray data. Our software implements the consensus clustering algorithm and principal component analysis to stratify the data into a given number of robust clusters. The robustness is achieved by combining clustering results from data and sample resampling as well as by averaging over various algorithms and parameter settings to achieve accurate, stable clustering results. We have implemented several different clustering algorithms in the software, including K-Means, Partition Around Medoids, Self-Organizing Map, and Hierarchical clustering methods. After clustering the data, ConsensusCluster generates a consensus matrix heatmap to give a useful visual representation of cluster membership, and automatically generates a log of selected features that distinguish each pair of clusters. ConsensusCluster gives more robust and more reliable clusters than common software packages and, therefore, is a powerful unsupervised learning tool that finds hidden patterns in data that might shed light on its biological interpretation. This software is free and available from http://code.google.com/p/consensus-cluster .
Discriminative clustering via extreme learning machine.
Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng
2015-10-01
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods.
Clustering granulometric features
NASA Astrophysics Data System (ADS)
Brun, Marcel; Balagurunathan, Yoganand; Barrera, Junior; Dougherty, Edward R.
2002-05-01
Granulometric features have been widely used for classification, segmentation and recently in estimation of parameters in shape models. In this paper we study the inference of clustering based on granulometric features for a collection of structuring probes in the context of random models. We use random Boolean models to represent grains of different shapes and structure. It is known that granulometric features are excellent descriptors of shape and structure of grains. Inference based on clustering these features helps to analyze the consistency of these features and clustering algorithms. This greatly aids in classifier design and feature selection. Features and the order of their addition play a role in reducing the inference errors. We study four different types of feature addition methods and the effect of replication in reducing the inference errors.
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
Clustering gene expression data using a diffraction‐inspired framework
2012-01-01
Background The recent developments in microarray technology has allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction‐based clustering algorithm however is designed to overcome this necessity for user‐defined parameters, as it is able to automatically search the data for any underlying structure. Methods The diffraction‐based clustering algorithm presented in this paper is tested using five well‐known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those results obtained from conventional algorithms such as the k‐means, fuzzy c‐means, self‐organising map, hierarchical clustering algorithm, Gaussian mixture model and density‐based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results The diffraction‐based clustering algorithm is shown to be independent of the number of clusters as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction‐based clustering algorithm performs significantly better on the real biological datasets compared to the other existing algorithms. Conclusion The results of the diffraction‐based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data. PMID:23164195
A compilation of jet finding algorithms
Flaugher, B.; Meier, K.
1992-12-31
Technical descriptions of jet finding algorithms currently in use in p{anti p} collider experiments (CDF, UA1, UA2), e{sup +}e{sup {minus}} experiments and Monte-Carlo event generators (LUND programs, ISAJET) have been collected. For the hadron collider experiments, the clustering methods fall into two categories: cone algorithms and nearest-neighbor algorithms. In addition, UA2 has employed a combination of both methods for some analysis. While there are clearly differences between the cone and nearest-neighbor algorithms, the authors have found that there are also differences among the cone algorithms in the details of how the centroid of a cone cluster is located and how the E{sub T} and P{sub T} of the jet are defined. The most commonly used jet algorithm in electron-positron experiments is the JADE-type cluster algorithm. Five various incarnations of this approach have been described.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.
Segmentation of color images based on the gravitational clustering concept
NASA Astrophysics Data System (ADS)
Lai, Andrew H.; Yung, H. C.
1998-03-01
A new clustering algorithm derived from the Markovian model of the gravitational clustering concept is proposed that works in the RGB measurement space for color image. To enable the model to be applicable in image segmentation, the new algorithm imposes a clustering constraint at each clustering iteration to control and determine the formation of multiple clusters. Using such constraint to limit the attraction between clusters, a termination condition can be easily defined. The new clustering algorithm is evaluated objectively and subjectively on three different images against the K-means clustering algorithm, the recursive histogram clustering algorithm for color, the Hedley-Yan algorithm, and the widely used seed-based region growing algorithm. From the evaluation, it is observed that the new algorithm exhibits the following characteristics: (1) its objective measurement figures are comparable with the best in this group of segmentation algorithms; (2) it generates smoother region boundaries; (3) the segmented boundaries align closely with the original boundaries; and (4) it forms a meaningful number of segmented regions.
Iterative Discovery of Multiple AlternativeClustering Views.
Donglin Niu; Dy, Jennifer G; Jordan, Michael I
2014-07-01
Complex data can be grouped and interpreted in many different ways. Most existing clustering algorithms, however, only find one clustering solution, and provide little guidance to data analysts who may not be satisfied with that single clustering and may wish to explore alternatives. We introduce a novel approach that provides several clustering solutions to the user for the purposes of exploratory data analysis. Our approach additionally captures the notion that alternative clusterings may reside in different subspaces (or views). We present an algorithm that simultaneously finds these subspaces and the corresponding clusterings. The algorithm is based on an optimization procedure that incorporates terms for cluster quality and novelty relative to previously discovered clustering solutions. We present a range of experiments that compare our approach to alternatives and explore the connections between simultaneous and iterative modes of discovery of multiple clusterings.
Geometric Clustering and its Applications
2013-10-31
limited theoretical treatment exists. As a by- product , we present the first R-tree based algorithm for Rfn. Our algorithms do not assume that either P...student, James McClain. On a related subject, we made progress on accurate localization of RFID tags in three dimensions [7]. 4 Clustering on Road...Hekimian-Williams, B. Grant, Xiuwen Liu, Zhenghao Zhang, and P. Kumar. Accurate localization of rfid tags using phase difference. In RFID , 2010 IEEE
A new auroral boundary determination algorithm based on observations from TIMED/GUVI and DMSP/SSUSI
NASA Astrophysics Data System (ADS)
Ding, Guang-Xing; He, Fei; Zhang, Xiao-Xin; Chen, Bo
2017-02-01
An automatic auroral boundary determination algorithm is proposed in this study based on the partial auroral oval images from the Global Ultraviolet Imager (GUVI) aboard the Thermosphere-Ionosphere-Mesosphere Energetics and Dynamics satellite and the Special Sensor Ultraviolet Spectrographic Imager (SSUSI) aboard the Defense Meteorological Satellite Program (DMSP F16). This algorithm based on the fuzzy local information C-means clustering segmentation can be used to extract the auroral oval poleward and equatorward boundaries from merged images with filled gaps from both GUVI and SSUSI. Both extracted poleward and equatorward boundary locations are used to fit the global shape of the auroral oval with a off-center quasi-elliptical fitting technique. Comparison of the extracted auroral oval boundaries with those identified from the DMSP SSJ observations demonstrates that this new proposed algorithm can reliably be used to construct the global configuration of auroral ovals under different geomagnetic activities at different local times. The statistical errors of magnetic latitudes of the fitted auroral oval boundaries were generally less than 3° at 2 sigma and indicate that the the fitted boundaries agree better with b2e and b5e than b1e and b6 boundaries. This proposed algorithm provides us with a useful tool to extract the global shape and position of the auroral oval from the partial auroral images.
The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering
NASA Astrophysics Data System (ADS)
Schaefer, Andreas; Daniell, James; Wenzel, Friedemann
2016-04-01
Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity with usual applications comprising of probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed to address this issue from other researchers. These have differing ranges of complexity ranging from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Hereby, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
Multimodal Estimation of Distribution Algorithms.
Yang, Qiang; Chen, Wei-Neng; Li, Yun; Chen, C L Philip; Xu, Xiang-Min; Zhang, Jun
2016-02-15
Taking the advantage of estimation of distribution algorithms (EDAs) in preserving high diversity, this paper proposes a multimodal EDA. Integrated with clustering strategies for crowding and speciation, two versions of this algorithm are developed, which operate at the niche level. Then these two algorithms are equipped with three distinctive techniques: 1) a dynamic cluster sizing strategy; 2) an alternative utilization of Gaussian and Cauchy distributions to generate offspring; and 3) an adaptive local search. The dynamic cluster sizing affords a potential balance between exploration and exploitation and reduces the sensitivity to the cluster size in the niching methods. Taking advantages of Gaussian and Cauchy distributions, we generate the offspring at the niche level through alternatively using these two distributions. Such utilization can also potentially offer a balance between exploration and exploitation. Further, solution accuracy is enhanced through a new local search scheme probabilistically conducted around seeds of niches with probabilities determined self-adaptively according to fitness values of these seeds. Extensive experiments conducted on 20 benchmark multimodal problems confirm that both algorithms can achieve competitive performance compared with several state-of-the-art multimodal algorithms, which is supported by nonparametric tests. Especially, the proposed algorithms are very promising for complex problems with many local optima.
Dhane, Dhiraj Manohar; Krishna, Vishal; Achar, Arun; Bar, Chittaranjan; Sanyal, Kunal; Chakraborty, Chandan
2016-09-01
Chronic lower extremity wound is a complicated disease condition of localized injury to skin and its tissues which have plagued many elders worldwide. The ulcer assessment and management is expensive and is burden on health establishment. Currently accurate wound evaluation remains a tedious task as it rely on visual inspection. This paper propose a new method for wound-area detection, using images digitally captured by a hand-held, optical camera. The strategy proposed involves spectral approach for clustering, based on the affinity matrix. The spectral clustering (SC) involves construction of similarity matrix of Laplacian based on Ng-Jorden-Weiss algorithm. Starting with a quadratic method, wound photographs were pre-processed for color homogenization. The first-order statistics filter was then applied to extract spurious regions. The filter was selected based on the performance, evaluated on four quality metrics. Then, the spectral method was used on the filtered images for effective segmentation. The segmented regions were post-processed using morphological operators. The performance of spectral segmentation was confirmed by ground-truth pictures labeled by dermatologists. The SC results were additionally compared with the results of k-means and Fuzzy C-Means (FCM) clustering algorithms. The SC approach on a set of 105 images, effectively delineated targeted wound beds yielding a segmentation accuracy of 86.73 %, positive predictive values of 91.80 %, and a sensitivity of 89.54 %. This approach shows the robustness of tool for ulcer perimeter measurement and healing progression. The article elucidates its potential to be incorporated in patient facing medical systems targeting a rapid clinical assistance.
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.
Wang, Haizhou; Song, Mingzhou
2011-12-01
The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
A local search for a graph clustering problem
NASA Astrophysics Data System (ADS)
Navrotskaya, Anna; Il'ev, Victor
2016-10-01
In the clustering problems one has to partition a given set of objects (a data set) into some subsets (called clusters) taking into consideration only similarity of the objects. One of most visual formalizations of clustering is graph clustering, that is grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph whose vertices are objects and edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters and the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial time (2k-1)-approximation algorithm for graph k-clustering. Then we apply a local search procedure to the feasible solution found by this algorithm and hold experimental research of obtained heuristics.
NASA Technical Reports Server (NTRS)
1999-01-01
Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. Having the equivalent mass greater than 10,000 stars like our sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core. But in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from earth it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.
Voting-based consensus clustering for combining multiple clusterings of chemical structures
2012-01-01
Background Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that any individual clustering method will not always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster. Results The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and Quality Partition Index method (QPI) were used to evaluate the clusterings and the results were compared to the Ward’s clustering method. The MDL Drug Data Report (MDDR) dataset was used for experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The performance of voting-based consensus clustering method outperformed the Ward’s method using F-measure and QPI method for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed the Ward’s method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, which give the highest values for both criteria. Conclusions The results of the experiments show that consensus clustering methods can improve the effectiveness of chemical structures clusterings. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among consensus clustering methods. PMID:23244782
SMART: Unique Splitting-While-Merging Framework for Gene Clustering
Fa, Rui; Roberts, David J.; Nandi, Asoke K.
2014-01-01
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to compared existing self-splitting algorithms and traditional algorithms. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. PMID:24714159
NASA Astrophysics Data System (ADS)
Krick, Kessica
This proposal is a specific response to the strategic goal of NASA's research program to "discover how the universe works and explore how the universe evolved into its present form." Towards this goal, we propose to mine the Spitzer archive for all observations of galaxy groups and clusters for the purpose of studying galaxy evolution in clusters, contamination rates for Sunyaev Zeldovich cluster surveys, and to provide a database of Spitzer observed clusters to the broader community. Funding from this proposal will go towards two years of support for a Postdoc to do this work. After searching the Spitzer Heritage Archive, we have found 194 unique galaxy groups and clusters that have data from both the Infrared array camera (IRAC; Fazio et al. 2004) at 3.6 - 8 microns and the multiband imaging photometer for Spitzer (MIPS; Rieke et al. 2004) at 24microns. This large sample will add value beyond the individual datasets because it will be a larger sample of IR clusters than ever before and will have sufficient diversity in mass, redshift, and dynamical state to allow us to differentiate amongst the effects of these cluster properties. An infrared sample is important because it is unaffected by dust extinction while at the same time is an excellent measure of both stellar mass (IRAC wavelengths) and star formation rate (MIPS wavelengths). Additionally, IRAC can be used to differentiate star forming galaxies (SFG) from active galactic nuclei (AGN), due to their different spectral shapes in this wavelength regime. Specifically, we intend to identify SFG and AGN in galaxy groups and clusters. Groups and clusters differ from the field because the galaxy densities are higher, there is a large potential well due mainly to the mass of the dark matter, and there is hot X-ray gas (the intracluster medium; ICM). We will examine the impact of these differences in environment on galaxy formation by comparing cluster properties of AGN and SFG to those in the field. Also, we will
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
Pattern Clustering Using a Swarm Intelligence Approach
NASA Astrophysics Data System (ADS)
Das, Swagatam; Abraham, Ajith
Clustering aims at representing large datasets by a fewer number of prototypes or clusters. It brings simplicity in modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks, in these days, require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This, in turn, imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms, well-known as Swarm Intelligence (SI) has recently emerged that meets these requirements and has successfully been applied to a number of real world clustering problems. This chapter explores the role of SI in clustering different kinds of datasets. It finally describes a new SI technique for partitioning a linearly non-separable dataset into an optimal number of clusters in the kernel- induced feature space. Computer simulations undertaken in this research have also been provided to demonstrate the effectiveness of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Miller, Christopher J. Miller
2012-03-01
There are many examples of clustering in astronomy. Stars in our own galaxy are often seen as being gravitationally bound into tight globular or open clusters. The Solar System's Trojan asteroids cluster at the gravitational Langrangian in front of Jupiter’s orbit. On the largest of scales, we find gravitationally bound clusters of galaxies, the Virgo cluster (in the constellation of Virgo at a distance of ˜50 million light years) being a prime nearby example. The Virgo cluster subtends an angle of nearly 8◦ on the sky and is known to contain over a thousand member galaxies. Galaxy clusters play an important role in our understanding of theUniverse. Clusters exist at peaks in the three-dimensional large-scale matter density field. Their sky (2D) locations are easy to detect in astronomical imaging data and their mean galaxy redshifts (redshift is related to the third spatial dimension: distance) are often better (spectroscopically) and cheaper (photometrically) when compared with the entire galaxy population in large sky surveys. Photometric redshift (z) [Photometric techniques use the broad band filter magnitudes of a galaxy to estimate the redshift. Spectroscopic techniques use the galaxy spectra and emission/absorption line features to measure the redshift] determinations of galaxies within clusters are accurate to better than delta_z = 0.05 [7] and when studied as a cluster population, the central galaxies form a line in color-magnitude space (called the the E/S0 ridgeline and visible in Figure 16.3) that contains galaxies with similar stellar populations [15]. The shape of this E/S0 ridgeline enables astronomers to measure the cluster redshift to within delta_z = 0.01 [23]. The most accurate cluster redshift determinations come from spectroscopy of the member galaxies, where only a fraction of the members need to be spectroscopically observed [25,42] to get an accurate redshift to the whole system. If light traces mass in the Universe, then the locations
On evaluating clustering procedures for use in classification
NASA Technical Reports Server (NTRS)
Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (Principal Investigator)
1979-01-01
The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.
[Cluster analysis and its application].
Půlpán, Zdenĕk
2002-01-01
The study exploits knowledge-oriented and context-based modification of well-known algorithms of (fuzzy) clustering. The role of fuzzy sets is inherently inclined towards coping with linguistic domain knowledge also. We try hard to obtain from rich diverse data and knowledge new information about enviroment that is being explored.
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis.
Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane
2015-06-01
Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. However, in practice, the signals collected during plant operation may be, very often, "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the measured signals evolution. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique, embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also propounded based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
Unbridled growth of spin-glass clusters
NASA Astrophysics Data System (ADS)
Kessler, David A.; Bretz, Michael
1990-03-01
We investigate the application of the recent cluster-based acceleration methods of Wolff and of Kandel et al. to the problem of simulating spin glasses. We find the techniques offer no improvement as the clusters generated by these algorithms are infinitely large or interact infinitely strongly, respectively. We comment on the reasons for this failure.
NASA Astrophysics Data System (ADS)
Labhardt, Lukas; Binggeli, Bruno
Star clusters are at the heart of astronomy, being key objects for our understanding of stellar evolution and galactic structure. Observations with the Hubble Space Telescope and other modern equipment have revealed fascinating new facts about these galactic building blocks. This book provides two comprehensive and up-to-date, pedagogically designed reviews on star clusters by two well-known experts in the field. Bruce Carney presents our current knowledge of the relative and absolute ages of globular clusters and the chemical history of our Galaxy. Bill Harris addresses globular clusters in external galaxies and their use as tracers of galaxy formation and cosmic distance indicators. The book is written for graduate students as well as professionals in astronomy and astrophysics.
ERIC Educational Resources Information Center
Pottawattamie County School System, Council Bluffs, IA.
The 15 occupational clusters (transportation, fine arts and humanities, communications and media, personal service occupations, construction, hospitality and recreation, health occupations, marine science occupations, consumer and homemaking-related occupations, agribusiness and natural resources, environment, public service, business and office…
Donchev, Todor I.; Petrov, Ivan G.
2011-05-31
Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.
Stream Clustering of Growing Objects
NASA Astrophysics Data System (ADS)
Siddiqui, Zaigham Faraz; Spiliopoulou, Myra
We study incremental clustering of objects that grow and accumulate over time. The objects come from a multi-table stream e.g. streams of
NASA Astrophysics Data System (ADS)
Bailly-Bechet, M.; Bradde, S.; Braunstein, A.; Flaxman, A.; Foini, L.; Zecchina, R.
2009-12-01
We propose a new method for obtaining hierarchical clustering based on the optimization of a cost function over trees of limited depth, and we derive a message-passing method that allows one to use it efficiently. The method and the associated algorithm can be interpreted as a natural interpolation between two well-known approaches, namely that of single linkage and the recently presented affinity propagation. We analyse using this general scheme three biological/medical structured data sets (human population based on genetic information, proteins based on sequences and verbal autopsies) and show that the interpolation technique provides new insight.
Clustering PPI data by combining FA and SHC method
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorit