An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.
Zhang, Jian; Shen, Ling
2014-01-01
To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect. PMID:25477953
NASA Astrophysics Data System (ADS)
Wang, Deguang; Han, Baochang; Huang, Ming
Computer forensics is the technology of applying computer technology to access, investigate and analysis the evidence of computer crime. It mainly include the process of determine and obtain digital evidence, analyze and take data, file and submit result. And the data analysis is the key link of computer forensics. As the complexity of real data and the characteristics of fuzzy, evidence analysis has been difficult to obtain the desired results. This paper applies fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) in computer forensics, and it can be more satisfactory results.
Comments on "A robust fuzzy local information C-means clustering algorithm".
Celik, Turgay; Lee, Hwee Kuan
2013-03-01
In a recent paper, Krinidis and Chatzis proposed a variation of fuzzy c-means algorithm for image clustering. The local spatial and gray-level information are incorporated in a fuzzy way through an energy function. The local minimizers of the designed energy function to obtain the fuzzy membership of each pixel and cluster centers are proposed. In this paper, it is shown that the local minimizers of Krinidis and Chatzis to obtain the fuzzy membership and the cluster centers in an iterative manner are not exclusively solutions for true local minimizers of their designed energy function. Thus, the local minimizers of Krinidis and Chatzis do not converge to the correct local minima of the designed energy function not because of tackling to the local minima, but because of the design of energy function. PMID:23144036
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
NASA Astrophysics Data System (ADS)
Zhao, Jianghong; Li, Deren; Wang, Yanmin
2008-12-01
Segmentation of Point cloud data is a key but difficult problem for architecture 3D reconstruction. Because compared to reverse engineering, there are more noise in ancient architecture point cloud data of edge because of mirror reflection and the traditional methods are hard that is not fuzzy in the preceding part of this paper, these methods can't embody the case of the points of borderline belonging two regions and it is difficult to satisfy demands of segmentation of ancient architecture point cloud data. Ancient architecture is mostly composed of columniation, plinth, arch, girder and tile on specifically order. Each of the component's surfaces is regular and smooth and belongingness of borderline points is very blurry. According to the character the author proposed a modified Fuzzy C-means clustering (MFCM) algorithm, which is used to add geometrical information during clustering. In addition this method improves belongingness constraints to avoid influence of noise on the result of segmentation. The algorithm is used in the project "Digital surveying of ancient architecture--- Forbidden City". Experiments show that the method is a good anti-noise, accuracy and adaptability and greater degree of human intervention is reduced. After segmentation internal point and point edge can be districted according membership of every point, so as to facilitate the follow-up to the surface feature extraction and model identification, and effective support for the three-dimensional model of the reconstruction of ancient buildings is provided.
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM). PMID:26649068
Ji, Ze-Xuan; Sun, Quan-Sen; Xia, De-Shen
2011-07-01
A modified possibilistic fuzzy c-means clustering algorithm is presented for fuzzy segmentation of magnetic resonance (MR) images that have been corrupted by intensity inhomogeneities and noise. By introducing a novel adaptive method to compute the weights of local spatial in the objective function, the new adaptive fuzzy clustering algorithm is capable of utilizing local contextual information to impose local spatial continuity, thus allowing the suppression of noise and helping to resolve classification ambiguity. To estimate the intensity inhomogeneity, the global intensity is introduced into the coherent local intensity clustering algorithm and takes the local and global intensity information into account. The segmentation target therefore is driven by two forces to smooth the derived optimal bias field and improve the accuracy of the segmentation task. The proposed method has been successfully applied to 3 T, 7 T, synthetic and real MR images with desirable results. Comparisons with other approaches demonstrate the superior performance of the proposed algorithm. Moreover, the proposed algorithm is robust to initialization, thereby allowing fully automatic applications. PMID:21256710
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM). PMID:26649068
NASA Astrophysics Data System (ADS)
Ayvaz, M. Tamer; Karahan, Halil; Aral, Mustafa M.
2007-09-01
SummaryIn this study, we propose an inverse solution algorithm through which both the aquifer parameters and the zone structure of these parameters can be determined based on a given set of observations on piezometric heads. In the zone structure identification problem, kernel-based fuzzy c-means (KFCM) clustering method is used. The association of the zone structure with the transmissivity distribution is accomplished through a coupled simulation-optimization model. In the optimization model, genetic algorithm (GA) is used due to its efficiency in finding global or near global optimum solutions. Since the solution is based on the GA procedures, the optimization process starts with a randomly generated initial solution. Thus, there is no need to define an initial estimate of the solution. This is an advantage when compared to other studies reported in the literature. Further, the objective function used in the optimization model does not include a reference to field transmissivity data, which is another advantage of the proposed methodology. Numerical examples are provided to demonstrate the performance of the proposed algorithm. In the first example, transmissivity values and zone structures are determined for a known number of zones in the solution domain. In the second example, optimum number of zones as well as the transmissivity values and the zone structures are determined iteratively. A sensitivity analysis is also performed to test the performance of the proposed solution algorithm based on the number of observation data necessary to solve the problem accurately. Numerical results indicate that the proposed algorithm is effective and efficient and may be used in the inverse parameter estimation problems when both parameter values and zone structure are unknown.
Hard versus fuzzy c-means clustering for color quantization
NASA Astrophysics Data System (ADS)
Wen, Quan; Celebi, M. Emre
2011-12-01
Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. Recent studies have demonstrated the effectiveness of hard c-means (k-means) clustering algorithm in this domain. Other studies reported similar findings pertaining to the fuzzy c-means algorithm. Interestingly, none of these studies directly compared the two types of c-means algorithms. In this study, we implement fast and exact variants of the hard and fuzzy c-means algorithms with several initialization schemes and then compare the resulting quantizers on a diverse set of images. The results demonstrate that fuzzy c-means is significantly slower than hard c-means, and that with respect to output quality, the former algorithm is neither objectively nor subjectively superior to the latter.
NASA Astrophysics Data System (ADS)
Yu, Xuelian; Chen, Qian; Gu, Guohua; Qian, Weixian; Xu, Mengxi
2014-11-01
The integration between polarization and intensity images possessing complementary and discriminative information has emerged as a new and important research area. On the basis of the consideration that the resulting image has different clarity and layering requirement for the target and background, we propose a novel fusion method based on non-subsampled Contourlet transform (NSCT) and fuzzy C-means (FCM) segmentation for IR polarization and light intensity images. First, the polarization characteristic image is derived from fusion of the degree of polarization (DOP) and the angle of polarization (AOP) images using local standard variation and abrupt change degree (ACD) combined criteria. Then, the polarization characteristic image is segmented with FCM algorithm. Meanwhile, the two source images are respectively decomposed by NSCT. The regional energy-weighted and similarity measure are adopted to combine the low-frequency sub-band coefficients of the object. The high-frequency sub-band coefficients of the object boundaries are integrated through the maximum selection rule. In addition, the high-frequency sub-band coefficients of internal objects are integrated by utilizing local variation, matching measure and region feature weighting. The weighted average and maximum rules are employed independently in fusing the low-frequency and high-frequency components of the background. Finally, an inverse NSCT operation is accomplished and the final fused image is obtained. The experimental results illustrate that the proposed IR polarization image fusion algorithm can yield an improved performance in terms of the contrast between artificial target and cluttered background and a more detailed representation of the depicted scene.
NASA Astrophysics Data System (ADS)
Ayvaz, M. Tamer
2007-11-01
This study proposes an inverse solution algorithm through which both the aquifer parameters and the zone structure of these parameters can be determined based on a given set of observations on piezometric heads. In the zone structure identification problem fuzzy c-means ( FCM) clustering method is used. The association of the zone structure with the transmissivity distribution is accomplished through an optimization model. The meta-heuristic harmony search ( HS) algorithm, which is conceptualized using the musical process of searching for a perfect state of harmony, is used as an optimization technique. The optimum parameter zone structure is identified based on three criteria which are the residual error, parameter uncertainty, and structure discrimination. A numerical example given in the literature is solved to demonstrate the performance of the proposed algorithm. Also, a sensitivity analysis is performed to test the performance of the HS algorithm for different sets of solution parameters. Results indicate that the proposed solution algorithm is an effective way in the simultaneous identification of aquifer parameters and their corresponding zone structures.
Efficient inhomogeneity compensation using fuzzy c-means clustering models.
Szilágyi, László; Szilágyi, Sándor M; Benyó, Balázs
2012-10-01
Intensity inhomogeneity or intensity non-uniformity (INU) is an undesired phenomenon that represents the main obstacle for magnetic resonance (MR) image segmentation and registration methods. Various techniques have been proposed to eliminate or compensate the INU, most of which are embedded into classification or clustering algorithms, they generally have difficulties when INU reaches high amplitudes and usually suffer from high computational load. This study reformulates the design of c-means clustering based INU compensation techniques by identifying and separating those globally working computationally costly operations that can be applied to gray intensity levels instead of individual pixels. The theoretical assumptions are demonstrated using the fuzzy c-means algorithm, but the proposed modification is compatible with a various range of c-means clustering based INU compensation and MR image segmentation algorithms. Experiments carried out using synthetic phantoms and real MR images indicate that the proposed approach produces practically the same segmentation accuracy as the conventional formulation, but 20-30 times faster. PMID:22405524
NASA Astrophysics Data System (ADS)
Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute
2014-05-01
Soil moisture is a key variable of the hydrological cycle. For example, it controls partitioning of rainfall into a runoff and an infiltration component and modulating physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale with survey areas up to a few square kilometres, there are numerous new and innovative ground-based and remote sensing technologies available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns optimal measurement locations were identified to conduct in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based, hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.
Generalized rough fuzzy c-means algorithm for brain MR image segmentation.
Ji, Zexuan; Sun, Quansen; Xia, Yong; Chen, Qiang; Xia, Deshen; Feng, Dagan
2012-11-01
Fuzzy sets and rough sets have been widely used in many clustering algorithms for medical image segmentation, and have recently been combined together to better deal with the uncertainty implied in observed image data. Despite of their wide spread applications, traditional hybrid approaches are sensitive to the empirical weighting parameters and random initialization, and hence may produce less accurate results. In this paper, a novel hybrid clustering approach, namely the generalized rough fuzzy c-means (GRFCM) algorithm is proposed for brain MR image segmentation. In this algorithm, each cluster is characterized by three automatically determined rough-fuzzy regions, and accordingly the membership of each pixel is estimated with respect to the region it locates. The importance of each region is balanced by a weighting parameter, and the bias field in MR images is modeled by a linear combination of orthogonal polynomials. The weighting parameter estimation and bias field correction have been incorporated into the iterative clustering process. Our algorithm has been compared to the existing rough c-means and hybrid clustering algorithms in both synthetic and clinical brain MR images. Experimental results demonstrate that the proposed algorithm is more robust to the initialization, noise, and bias field, and can produce more accurate and reliable segmentations. PMID:22088865
Zhang, Ji-fu; Li, Xin; Yang, Hai-feng
2012-05-01
Discretization of continuous numerical attribute is one of the important research works in the preprocessing of celestial spectrum data. For characteristic line of celestial spectrum, a soft discretization algorithm is presented by using improved fuzzy C-means clustering. Firstly, candidate fuzzy clustering centers of characteristic line are chosen by using density values of sample data, so that its anti-noise ability is improved. Secondly, parameters in the fuzzy clustering are dynamically adjusted by taking compatibility of decision table as criteria, so that optimal discretization effect of the characteristic line is achieved. In the end, experimental results effectively validate that the algorithm has higher correct recognition rate of the algorithm by using three SDSS celestial spectrum data sets of high-redshift quasars, late-type star and quasars. PMID:22827108
NASA Astrophysics Data System (ADS)
Akinin, M. V.; Akinina, N. V.; Klochkov, A. Y.; Nikiforov, M. B.; Sokolova, A. V.
2015-05-01
The report reviewed the algorithm fuzzy c-means, performs image segmentation, give an estimate of the quality of his work on the criterion of Xie-Beni, contain the results of experimental studies of the algorithm in the context of solving the problem of drawing up detailed two-dimensional maps with the use of unmanned aerial vehicles. According to the results of the experiment concluded that the possibility of applying the algorithm in problems of decoding images obtained as a result of aerial photography. The considered algorithm can significantly break the original image into a plurality of segments (clusters) in a relatively short period of time, which is achieved by modification of the original k-means algorithm to work in a fuzzy task.
Application of fuzzy c-means clustering in data analysis of metabolomics.
Li, Xiang; Lu, Xin; Tian, Jing; Gao, Peng; Kong, Hongwei; Xu, Guowang
2009-06-01
Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to cluster metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix which represents the degree of association the samples have with each cluster. The method is parametrized with the number of clusters (C) and the fuzziness coefficient (m), which denotes the degree of fuzziness in the algorithm. Both have been optimized by combining FCM with partial least-squares (PLS) using the membership matrix as the Y matrix in the PLS model. The quality parameters R(2)Y and Q(2) of the PLS model have been used to monitor and optimize C and m. Data of metabolic profiles from three gene types of Escherichia coli were used to demonstrate the method above. Different multivariable analysis methods have been compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded results with overfitting. On the basis of the optimized parameters, the FCM was able to reveal main phenotype changes and individual characters of three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation. PMID:19408956
Fuzzy c-means clustering with spatial information for image segmentation.
Chuang, Keh-Shih; Tzeng, Hong-Long; Chen, Sharon; Wu, Jay; Chen, Tzong-Jer
2006-01-01
A conventional FCM algorithm does not fully utilize the spatial information in the image. In this paper, we present a fuzzy c-means (FCM) algorithm that incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership function in the neighborhood of each pixel under consideration. The advantages of the new method are the following: (1) it yields regions more homogeneous than those of other methods, (2) it reduces the spurious blobs, (3) it removes noisy spots, and (4) it is less sensitive to noise than other techniques. This technique is a powerful method for noisy image segmentation and works for both single and multiple-feature data with spatial information. PMID:16361080
A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation.
Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A B
2013-01-01
One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation. PMID:24174990
A Wavelet Relational Fuzzy C-Means Algorithm for 2D Gel Image Segmentation
Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A. B.
2013-01-01
One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation. PMID:24174990
Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
Elazab, Ahmed; Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin; Hu, Qingmao
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
Parastar, Hadi; Bazrafshan, Alisina
2016-03-18
Fuzzy C-means clustering (FCM) is proposed as a promising method for the clustering of chromatographic fingerprints of complex samples, such as essential oils. As an example, secondary metabolites of 14 citrus leaves samples are extracted and analyzed by gas chromatography-mass spectrometry (GC-MS). The obtained chromatographic fingerprints are divided to desired number of chromatographic regions. Owing to the fact that chromatographic problems, such as elution time shift and peak overlap can significantly affect the clustering results, therefore, each chromatographic region is analyzed using multivariate curve resolution-alternating least squares (MCR-ALS) to address these problems. Then, the resolved elution profiles are used to make a new data matrix based on peak areas of pure components to cluster by FCM. The FCM clustering parameters (i.e., fuzziness coefficient and number of cluster) are optimized by two different methods of partial least squares (PLS) as a conventional method and minimization of FCM objective function as our new idea. The results showed that minimization of FCM objective function is an easier and better way to optimize FCM clustering parameters. Then, the optimized FCM clustering algorithm is used to cluster samples and variables to figure out the similarities and dissimilarities among samples and to find discriminant secondary metabolites in each cluster (chemotype). Finally, the FCM clustering results are compared with those of principal component analysis (PCA), hierarchical cluster analysis (HCA) and Kohonon maps. The results confirmed the outperformance of FCM over the frequently used clustering algorithms. PMID:26916594
Fuzzy C-means method with empirical mode decomposition for clustering microarray data.
Wang, Yan-Fei; Yu, Zu-Guo; Anh, Vo
2013-01-01
Microarray techniques have revolutionised genomic research by making it possible to monitor the expression of thousands of genes in parallel. The Fuzzy C-Means (FCM) method is an efficient clustering approach devised for microarray data analysis. However, microarray data contains noise, which would affect clustering results. In this paper, we propose to combine the FCM method with the Empirical Mode Decomposition (EMD) for clustering microarray data to reduce the effect of the noise. The results suggest the clustering structures of denoised microarray data are more reasonable and genes have tighter association with their clusters than those using FCM only. PMID:23777170
Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering
2012-01-01
Background Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of spiking activity of individual neurons during a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms to automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting still remains a challenging issue. Results Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize: (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or null human intervention. The method is based on a combination of Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification the number of clusters, and quantitative quality assessment of resulting clusters independent on their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from premotor cortex of Macaque monkeys. The results of these tests showed an excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. The performance of our method is competitive with respect to other robust spike sorting algorithms. Conclusions This new software provides neuroscience laboratories with a new tool for fast and robust online classification of single neuron activity. This feature could become crucial in situations when online spike detection from multiple electrodes is paramount, such as in human clinical recordings or in brain-computer interfaces. PMID:22871125
Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C. E-mail: George.Kagadis@med.upatras.gr
2014-07-15
Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform acquiring the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via Speckle Suppression Index (SSI) with results of 0.61, 0.71, and 0.73 for EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SIR, and Pizurica's method, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation the proposed method significantly improved image characteristics over standard baseline B mode images, and those processed with the Pizurica's method. Furthermore, it yielded results similar to those for SRI for breast and thyroid images significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A new wavelet-based EFCM clustering model was introduced toward noise reduction and detail preservation. The proposed method improves the overall US image quality, which in turn could affect the decision-making on whether additional imaging and/or intervention is needed.
Image watermarking using a dynamically weighted fuzzy c-means algorithm
NASA Astrophysics Data System (ADS)
Kang, Myeongsu; Ho, Linh Tran; Kim, Yongmin; Kim, Cheol Hong; Kim, Jong-Myon
2011-10-01
Digital watermarking has received extensive attention as a new method of protecting multimedia content from unauthorized copying. In this paper, we present a nonblind watermarking system using a proposed dynamically weighted fuzzy c-means (DWFCM) technique combined with discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) techniques for copyright protection. The proposed scheme efficiently selects blocks in which the watermark is embedded using new membership values of DWFCM as the embedding strength. We evaluated the proposed algorithm in terms of robustness against various watermarking attacks and imperceptibility compared to other algorithms [DWT-DCT-based and DCT- fuzzy c-means (FCM)-based algorithms]. Experimental results indicate that the proposed algorithm outperforms other algorithms in terms of robustness against several types of attacks, such as noise addition (Gaussian noise, salt and pepper noise), rotation, Gaussian low-pass filtering, mean filtering, median filtering, Gaussian blur, image sharpening, histogram equalization, and JPEG compression. In addition, the proposed algorithm achieves higher values of peak signal-to-noise ratio (approximately 49 dB) and lower values of measure-singular value decomposition (5.8 to 6.6) than other algorithms.
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
NASA Astrophysics Data System (ADS)
Liu, Shuang; Hou, Biao; Jiao, Licheng; Zhang, Guofeng
2007-11-01
Synthetic Aperture Radar (SAR) images are inherently affected by multiplicative speckle noise, which is due to the coherent nature of the scattering phenomenon. Speckle noise of SAR affects image quality and image interpretation seriously. To alleviate deleterious effects of speckle, various ways have been devised to suppress it. An ideal algorithm should smooth the speckle without blurring edges and fine details. But most classical algorithms cannot satisfy these two demands very well. Due to the property of SAR images speckles is multiplicative noise, it difficult to estimate the variance of the high-frequency subband coefficients. Most classical approaches such as wavelet thresholding or shrinkage scheme of Donoho and Johnstone are not suitable for SAR images speckle noise removal. In this paper, a novel approach to SAR image speckle reduction is presented, which is based on second generation bandelets and a kernel-based possibilistic C-means clustering algorithm (BKPCM).
A multiple-kernel fuzzy C-means algorithm for image segmentation.
Chen, Long; Chen, C L Philip; Lu, Mingzhu
2011-10-01
In this paper, a generalized multiple-kernel fuzzy C-means (FCM) (MKFCM) methodology is introduced as a framework for image-segmentation problems. In the framework, aside from the fact that the composite kernels are used in the kernel FCM (KFCM), a linear combination of multiple kernels is proposed and the updating rules for the linear coefficients of the composite kernel are derived as well. The proposed MKFCM algorithm provides us a new flexible vehicle to fuse different pixel information in image-segmentation problems. That is, different pixel information represented by different kernels is combined in the kernel space to produce a new kernel. It is shown that two successful enhanced KFCM-based image-segmentation algorithms are special cases of MKFCM. Several new segmentation algorithms are also derived from the proposed MKFCM framework. Simulations on the segmentation of synthetic and medical images demonstrate the flexibility and advantages of MKFCM-based approaches. PMID:21803693
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
NASA Astrophysics Data System (ADS)
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one of the most important applications of image processing in MRI segmentation of pomegranate is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. Having a high quality product in hand would be critical factor in its marketing. The internal quality of the product is comprehensively important in the sorting process. The determination of qualitative features cannot be manually made. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in presence of noise. Fuzzy c-means (FCM) algorithm is noise-sensitive and pixels with noise are classified inversely. As a solution, in this paper, the spatial FCM algorithm in pomegranate MR images' segmentation is proposed. The algorithm is performed with setting the spatial neighborhood information in FCM and modification of fuzzy membership function for each class. The segmentation algorithm results on the original and the corrupted Pomegranate MR images by Gaussian, Salt Pepper and Speckle noises show that the SFCM algorithm operates much more significantly than FCM algorithm. Also, after diverse steps of qualitative and quantitative analysis, we have concluded that the SFCM algorithm with 5×5 window size is better than the other windows.
Remote sensing ocean data analyses using fuzzy C-Means clustering
NASA Astrophysics Data System (ADS)
Xu, Suqin; Chen, Jie; Gao, Guoxing
2009-10-01
With the deep understanding and exploitation of the wide Ocean, There are more and more fine instrument installed or loaded on measuring ships or other marines. The high costs and complexity of corrosion place ever-increasing demands on the analyses of surrounding ocean environment. In this paper, the fuzzy C-Means clustering is used to analyze the surrounding ocean environment with remote sensing data. The studied ocean area is considered as a two dimensional gird or an image, and the fuzzy C-Means clustering technique is used to reveal the underlying relationship of the elements and segment the interrelated ocean in regions with similar spectral properties in the influence of instrument corrosion. The influence of the environment elements in instrument corrosion is studied and a priori spatial information is added to improving the segmentation result. The fitness function containing neighbor information was set up based on the gray information and the neighbor relations between the pixels. By making use of the global searching ability of the predator-prey particle swarm optimization, the optimal cluster center could be obtained by iterative optimization and the segmentation could be accomplished. The calculation results show that the segmentation is accurate and reasonable. This ocean environment analysis fruit has used in real application and has proved to be valuable in ship instrument corrosion monitoring and the guide of other ocean activity.
2010-01-01
Background Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer variations in human interpretation. In this study, we compare two approaches for automatic border detection in dermoscopy images: density based clustering (DBSCAN) and Fuzzy C-Means (FCM) clustering algorithms. In the first approach, if there exists enough density –greater than certain number of points- around a point, then either a new cluster is formed around the point or an existing cluster grows by including the point and its neighbors. In the second approach FCM clustering is used. This approach has the ability to assign one data point into more than one cluster. Results Each approach is examined on a set of 100 dermoscopy images whose manually drawn borders by a dermatologist are used as the ground truth. Error rates; false positives and false negatives along with true positives and true negatives are quantified by comparing results with manually determined borders from a dermatologist. The assessments obtained from both methods are quantitatively analyzed over three accuracy measures: border error, precision, and recall. Conclusion As well as low border error, high precision and recall, visual outcome showed that the DBSCAN effectively delineated targeted lesion, and has bright future; however, the FCM had poor performance especially in border error metric. PMID:20946610
Sliding PCA Fuzzy Clustering Algorithm
NASA Astrophysics Data System (ADS)
Salgado, Paulo; Gonçalves, Lio; Igrejas, Getúlio
2011-09-01
This paper proposes a new robust approach to nonlinear clustering based on the Principal Component Analysis (PCA) approach. A robust c-means partition is derived by using the natural PCA noise-rejection mechanism and the nonlinearity captured by a sliding process of the clusters prototype. A non-linear extension of PCA has been developed for detecting the lower-dimensional representation of real world data sets. For these cases local linear approaches are used widely because of their computational simplicity and understandability. We will present a new method that joins (merges) the fuzzy clustering algorithm with a local sliding PCA analysis. With this strategy it is possible to identify the non-linear relations and obtain morphological information of the data. The Sliding PCA-Fuzzy cluster algorithm (SPCA-FCA) is a fuzzy clustering algorithm that estimates local principal component vectors as the vectors spanning prototypes of clusters, performed on the neighborhood of the center of cluster and normal approximations in order to estimate a tangent surface that characterizes the trend and curvature of the data points or contours region. Numerical experiments demonstrate that the proposed method is useful for capturing cluster cores by rejecting noise samples, and we can easily assess cluster validity by using cluster-crossing curves.
NASA Astrophysics Data System (ADS)
Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analyze means that according to observations made in different times, the process of defining the change detection occurring in nature or in the state of any objects or the ability of defining the quantity of temporal effects by using multitemporal data sets. There are lots of change detection techniques met in literature. It is possible to group these techniques under two main topics as supervised and unsupervised change detection. In this study, the aim is to define the land cover changes occurring in specific area of Kayseri with unsupervised change detection techniques by using Landsat satellite images belonging to different years which are obtained by the technique of remote sensing. While that process is being made, image differencing method is going to be applied to the images by following the procedure of image enhancement. After that, the method of Principal Component Analysis is going to be applied to the difference image obtained. To determine the areas that have and don't have changes, the image is grouped as two parts by Fuzzy C-Means Clustering method. For achieving these processes, firstly the process of image to image registration is completed. As a result of this, the images are being referred to each other. After that, gray scale difference image obtained is partitioned into 3 × 3 nonoverlapping blocks. With the method of principal component analysis, eigenvector space is gained and from here, principal components are reached. Finally, feature vector space consisting principal component is partitioned into two clusters using Fuzzy C-Means Clustering and after that change detection process has been done.
Thermogram breast cancer prediction approach based on Neutrosophic sets and fuzzy c-means algorithm.
Gaber, Tarek; Ismail, Gehad; Anter, Ahmed; Soliman, Mona; Ali, Mona; Semary, Noura; Hassanien, Aboul Ella; Snasel, Vaclav
2015-08-01
The early detection of breast cancer makes many women survive. In this paper, a CAD system classifying breast cancer thermograms to normal and abnormal is proposed. This approach consists of two main phases: automatic segmentation and classification. For the former phase, an improved segmentation approach based on both Neutrosophic sets (NS) and optimized Fast Fuzzy c-mean (F-FCM) algorithm was proposed. Also, post-segmentation process was suggested to segment breast parenchyma (i.e. ROI) from thermogram images. For the classification, different kernel functions of the Support Vector Machine (SVM) were used to classify breast parenchyma into normal or abnormal cases. Using benchmark database, the proposed CAD system was evaluated based on precision, recall, and accuracy as well as a comparison with related work. The experimental results showed that our system would be a very promising step toward automatic diagnosis of breast cancer using thermograms as the accuracy reached 100%. PMID:26737234
Wen, Ying; He, Lianghua; von Deneen, Karen M; Lu, Yue
2013-11-01
We present an effective method for brain tissue classification based on diffusion tensor imaging (DTI) data. The method accounts for two main DTI segmentation obstacles: random noise and magnetic field inhomogeneities. In the proposed method, DTI parametric maps were used to resolve intensity inhomogeneities of brain tissue segmentation because they could provide complementary information for tissues and define accurate tissue maps. An improved fuzzy c-means with spatial constraints proposal was used to enhance the noise and artifact robustness of DTI segmentation. Fuzzy c-means clustering with spatial constraints (FCM_S) could effectively segment images corrupted by noise, outliers, and other imaging artifacts. Its effectiveness contributes not only to the introduction of fuzziness for belongingness of each pixel but also to the exploitation of spatial contextual information. We proposed an improved FCM_S applied on DTI parametric maps, which explores the mean and covariance of the feature spatial information for automated segmentation of DTI. The experiments on synthetic images and real-world datasets showed that our proposed algorithms, especially with new spatial constraints, were more effective. PMID:23891435
T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried
2010-03-01
The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.
Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering.
Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Kim, Jin Young
2012-12-01
Disease diagnosis based on ultrasound imaging is popular because of its non-invasive nature. However, ultrasound imaging system produces low quality images due to the presence of spackle noise and wave interferences. This shortcoming requires a considerable effort from experts to diagnose a disease from the carotid artery ultrasound images. Image segmentation is one of the techniques, which can help efficiently in diagnosing a disease from the carotid artery ultrasound images. Most of the pixels in an image are highly correlated. Considering the spatial information of surrounding pixels in the process of image segmentation may further improve the results. When data is highly correlated, one pixel may belong to more than one clusters with different degree of membership. In this paper, we present an image segmentation technique namely improved spatial fuzzy c-means and an ensemble clustering approach for carotid artery ultrasound images to identify the presence of plaque. Spatial, wavelets and gray level co-occurrence matrix (GLCM) features are extracted from carotid artery ultrasound images. Redundant and less important features are removed from the features set using genetic search process. Finally, segmentation process is performed on optimal or reduced features. Ensemble clustering with reduced feature set outperforms with respect to segmentation time as well as clustering accuracy. Intima-media thickness (IMT) is measured from the images segmented by the proposed approach. Based on IMT measured values, Multi-Layer Back-Propagation Neural Networks (MLBPNN) is used to classify the images into normal or abnormal. Experimental results show the learning capability of MLBPNN classifier and validate the effectiveness of our proposed technique. The proposed approach of segmentation and classification of carotid artery ultrasound images seems to be very useful for detection of plaque in carotid artery. PMID:22981822
NASA Astrophysics Data System (ADS)
Rapstine, Thomas D.
Gravity gradiometry has been used as a geophysical tool to image salt structure in hydrocarbon exploration. The knowledge of the location, orientation, and spatial extent of salt bodies helps characterize possible petroleum prospects. Imaging around and underneath salt bodies can be challenging given the petrophysical properties and complicated geometry of salt. Methods for imaging beneath salt using seismic data exist but are often iterative and expensive, requiring a refinement of a velocity model at each iteration. Fortunately, the relatively strong density contrast between salt and background density structure pro- vides the opportunity for gravity gradiometry to be useful in exploration, especially when integrated with other geophysical data such as seismic. Quantitatively integrating multiple geophysical data is not trivial, but can improve the recovery of salt body geometry and petrophysical composition using inversion. This thesis provides two options for quantitatively integrating seismic, AGG, and petrophysical data that may aid the imaging of salt bodies. Both methods leverage and expand upon previously developed deterministic inversion methods. The inversion methods leverage seismically derived information, such as horizon slope and salt body interpretation, to constrain the inversion of airborne gravity gradiometry data (AGG) to arrive at a density contrast model. The first method involves constraining a top of salt inversion using slope in a seismic image. The second method expands fuzzy c-means (FCM) clustering inversion to include spatial control on clustering based on a seismically derived salt body interpretation. The effective- ness of the methods are illustrated on a 2D synthetic earth model derived from the SEAM Phase 1 salt model. Both methods show that constraining the inversion of AGG data using information derived from seismic images can improve the recovery of salt.
NASA Astrophysics Data System (ADS)
Güler, Cüneyt; Thyne, Geoffrey D.
2004-12-01
In this paper, classification of a large hydrochemical data set (more than 600 water samples and 11 hydrochemical variables) from southeastern California by fuzzy c-means (FCM) and hierarchical cluster analysis (HCA) clustering techniques is performed and its application to hydrochemical facies delineation is discussed. Results from both FCM and HCA clustering produced cluster centers (prototypes) that can be used to identify the physical and chemical processes creating the variations in the water chemistries. There are several advantages to FCM, and it is concluded that FCM, as an exploratory data analysis technique, is potentially useful in establishing hydrochemical facies distribution and may provide a better tool than HCA for clustering large data sets when overlapping or continuous clusters exist.
NASA Astrophysics Data System (ADS)
Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.
2015-04-01
This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from Dezful Embayment in the southwest region of Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information from the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. Furthermore, the fracture enhancement procedure was conducted by three steps on seismic data of the area. In the first step several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the enhancement of spatial discontinuities was performed and finally elimination of the noise and remains of non-faulting events was carried out by simulating the behavior of ant colonies. After considering stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW-SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. In order to assess the performance of the AC in feature detection, and cross validate the reliability of the method used, fuzzy c-means clustering (FCMC) was employed for the same dataset. Comparing the maps illustrates the effectiveness and preference of the AC approach due to its high resolution contrast for structural feature detection compared to the FCMC method. Accordingly, 3D planes of discontinuity determined spatial distribution of fractures in the field in order to assist well planning. Results revealed that the high impedance location probability related to an area in the vicinity of the faults, whilst low impedance location probably could indicate zones of high permeability which indicate flow conduits. Analysis under the present study suggests that the orientation and magnitude of fractures exhibiting the main trend of NW-SE in Dezful Embayment is more susceptible to stimulation and is more likely to open for fluid flow.
Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means.
Chuang, K H; Chiu, M J; Lin, C C; Chen, J H
1999-12-01
Conventional model-based or statistical analysis methods for functional MRI (fMRI) suffer from the limitation of the assumed paradigm and biased results. Temporal clustering methods, such as fuzzy clustering, can eliminate these problems but are difficult to find activation occupying a small area, sensitive to noise and initial values, and computationally demanding. To overcome these adversities, a cascade clustering method combining a Kohonen clustering network and fuzzy, means is developed. Receiver operating characteristic (ROC) analysis is used to compare this method with correlation coefficient analysis and t test on a series of testing phantoms. Results shown that this method can efficiently and stably identify the actual functional response with typical signal change to noise ratio, from a small activation area occupying only 0.2% of head size, with phase delay, and from other noise sources such as head motion. With the ability of finding activities of small sizes stably this method can not only identify the functional responses and the active regions more precisely, but also discriminate responses from different signal sources, such as large venous vessels or different types of activation patterns in human studies involving motor cortex activation. Even when the experimental paradigm is unknown in a blind test such that model-based methods are inapplicable, this method can identify the activation patterns and regions correctly. PMID:10695525
Effect of co-operative fuzzy c-means clustering on estimates of three parameters AVA inversion
NASA Astrophysics Data System (ADS)
Nair, Rajesh R.; Kandpal, Suresh Ch
2010-04-01
We determine the degree of variation of model fitness, to a true model based on amplitude variation with angle (AVA) methodology for a synthetic gas hydrate model, using co-operative fuzzy c-means clustering, constrained to a rock physics model. When a homogeneous starting model is used, with only traditional least squares optimization scheme for inversion, the variance of the parameters is found to be comparatively high. In this co-operative methodology, the output from the least squares inversion is fed as an input to the fuzzy scheme. Tests with co-operative inversion using fuzzy c-means with damped least squares technique and constraints derived from empirical relationship based on rock properties model show improved stability, model fitness and variance for all the three parameters in comparison with the standard inversion alone.
Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Iftikhar, M Aksam
2014-02-01
In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus inevitable for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state of the art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques both on noisy and noise free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed for making decision about the presence of plaque in carotid artery using probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, effect of the proposed segmentation technique has also been investigated on classification of carotid artery ultrasound images. PMID:24239296
Basic cluster compression algorithm
NASA Technical Reports Server (NTRS)
Hilbert, E. E.; Lee, J.
1980-01-01
Feature extraction and data compression of LANDSAT data is accomplished by BCCA program which reduces costs associated with transmitting, storing, distributing, and interpreting multispectral image data. Algorithm uses spatially local clustering to extract features from image data to describe spectral characteristics of data set. Approach requires only simple repetitive computations, and parallel processing can be used for very high data rates. Program is written in FORTRAN IV for batch execution and has been implemented on SEL 32/55.
Hierarchical modularization of biochemical pathways using fuzzy-c means clustering.
de Luis Balaguer, Maria A; Williams, Cranos M
2014-08-01
Biological systems that are representative of regulatory, metabolic, or signaling pathways can be highly complex. Mathematical models that describe such systems inherit this complexity. As a result, these models can often fail to provide a path toward the intuitive comprehension of these systems. More coarse information that allows a perceptive insight of the system is sometimes needed in combination with the model to understand control hierarchies or lower level functional relationships. In this paper, we present a method to identify relationships between components of dynamic models of biochemical pathways that reside in different functional groups. We find primary relationships and secondary relationships. The secondary relationships reveal connections that are present in the system, which current techniques that only identify primary relationships are unable to show. We also identify how relationships between components dynamically change over time. This results in a method that provides the hierarchy of the relationships among components, which can help us to understand the low level functional structure of the system and to elucidate potential hierarchical control. As a proof of concept, we apply the algorithm to the epidermal growth factor signal transduction pathway, and to the C3 photosynthesis pathway. We identify primary relationships among components that are in agreement with previous computational decomposition studies, and identify secondary relationships that uncover connections among components that current computational approaches were unable to reveal. PMID:24196983
A relational Fuzzy C-Means algorithm for detecting protein spots in two-dimensional gel images.
Rashwan, Shaheera; Faheem, Talaat; Sarhan, Amany; Youssef, Bayumy A B
2010-01-01
Two-dimensional polyacrylamide gel electrophoresis of proteins is a robust and reproducible technique. It is the most widely used separation tool in proteomics. Current efforts in the field are directed at the development of tools for expanding the range of proteins accessible with two-dimensional gels. Proteomics was built around the two-dimensional gel. The idea that multiple proteins can be analyzed in parallel grew from two-dimensional gel maps. Proteomics researchers needed to identify interested protein spots by examining the gel. This is time consuming, labor extensive and error prone. It is desired that the computer can analyze the proteins automatically by first detecting, then quantifying the protein spots in the 2D gel images. This paper focuses on the protein spot detection and segmentation of 2D gel electrophoresis images. We present a new technique for segmentation of 2D gel images using the Fuzzy C-Means (FCM) algorithm and matching spots using the notion of fuzzy relations. Through the experimental results, the new algorithm was found out to detect protein spots more accurately, then the current known algorithms. PMID:20865504
A meteor cluster detection algorithm
NASA Astrophysics Data System (ADS)
Burt, Joshua B.; Moorhead, Althea V.; Cooke, William J.
2014-02-01
We present an algorithm to identify groups of meteors within all-sky meteor network observations that are clustered in radiant, velocity, and time. These meteor clusters may reveal new minor meteor showers or uncover false negatives for known shower association. Sporadic meteoroid sources and established meteor showers exhibiting spatiotemporal proximity to identified clusters are reported by the algorithm for end-user reference, as well as the orbital similarity of cluster members quantified using the Drummond D-criterion. This algorithm will be integrated into the existing data-processing pipeline at the NASA Meteoroid Environments Office to alert staff in near-real time of clustered meteor events.
NASA Astrophysics Data System (ADS)
Ghaffarian, Saman; Gökaşar, Ilgın
2016-01-01
This study presents an approach for the automatic detection of vehicles using very high-resolution images and road vector data. Initially, road vector data and aerial images are integrated to extract road regions. Then, the extracted road/street region is clustered using an automatic histogram-based fuzzy C-means algorithm, and edge pixels are detected using the Canny edge detector. In order to automatically detect vehicles, we developed a local perceptual grouping approach based on fusion of edge detection and clustering outputs. To provide the locality, an ellipse is generated using characteristics of the candidate clusters individually. Then, ratio of edge pixels to nonedge pixels in the corresponding ellipse is computed to distinguish the vehicles. Finally, a point-merging rule is conducted to merge the points that satisfy a predefined threshold and are supposed to denote the same vehicles. The experimental validation of the proposed method was carried out on six very high-resolution aerial images that illustrate two highways, two shadowed roads, a crowded narrow street, and a street in a dense urban area with crowded parked vehicles. The evaluation of the results shows that our proposed method performed 86% and 83% in overall correctness and completeness, respectively.
Tang, Jing Rui; Mat Isa, Nor Ashidi; Ch’ng, Ewe Seng
2015-01-01
Despite the effectiveness of Pap-smear test in reducing the mortality rate due to cervical cancer, the criteria of the reporting standard of the Pap-smear test are mostly qualitative in nature. This study addresses the issue on how to define the criteria in a more quantitative and definite term. A negative Pap-smear test result, i.e. negative for intraepithelial lesion or malignancy (NILM), is qualitatively defined to have evenly distributed, finely granular chromatin in the nuclei of cervical squamous cells. To quantify this chromatin pattern, this study employed Fuzzy C-Means clustering as the segmentation technique, enabling different degrees of chromatin segmentation to be performed on sample images of non-neoplastic squamous cells. From the simulation results, a model representing the chromatin distribution of non-neoplastic cervical squamous cell is constructed with the following quantitative characteristics: at the best representative sensitivity level 4 based on statistical analysis and human experts’ feedbacks, a nucleus of non-neoplastic squamous cell has an average of 67 chromatins with a total area of 10.827μm2; the average distance between the nearest chromatin pair is 0.508μm and the average eccentricity of the chromatin is 0.47. PMID:26560331
NASA Astrophysics Data System (ADS)
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with fluorophore, FMI can noninvasively obtain the distribution of fluorophore in-vivo. However, due to the fact that the spectrum of fluorescence is in the section of the visible light range, there are mass of autofluorescence on the surface of the bio-tissues, which is a major disturbing factor in FMI. Meanwhile, the high-level of dark current for charge-coupled device (CCD) camera and other influencing factor can also produce a lot of background noise. In this paper, a novel method for image denoising of FMI based on fuzzy C-Means clustering (FCM) is proposed, because the fluorescent signal is the major component of the fluorescence images, and the intensity of autofluorescence and other background signals is relatively lower than the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to initially eliminate the noise. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopt to separate the major component and background of the fluorescence images. Finally, the proposed method was validated using the original data obtained by in vivo implanted fluorophore experiment, and the results show that our proposed method can effectively obtain the fluorescence signal while eliminate the background noise, which could increase the quality of fluorescence images.
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r= 0.82, p < 0.001) and processed (r= 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r= 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's {kappa}{>=} 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based dataset can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices and the notation of a joint-feature-clustering matrix to find redundancies of image-features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times and improve the classification accuracy.
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space to derive new objective function and thus clustering the non-Euclidean structures in data to enhance the robustness of the original clustering algorithms to reduce noise and outliers. The new objective functions of proposed algorithms are realized by incorporating the noise clustering concept into the entropy based fuzzy C-means algorithm with suitable noise distance which is employed to take the information about noisy data in the clustering process. This paper presents initial cluster prototypes using prototype initialization method, so that this work tries to obtain the final result with less number of iterations. To evaluate the performance of the proposed methods in reducing the noise level, experimental work has been carried out with a synthetic image which is corrupted by Gaussian noise. The superiority of the proposed methods has been examined through the experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The accurate classification percentage of the proposed fuzzy C-means segmentation method is obtained using silhouette validity index. PMID:23219569
NASA Astrophysics Data System (ADS)
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-01
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-24
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
Chang, Yeun-Chung; Huang, Yan-Hao; Huang, Chiun-Sheng; Chang, Pei-Kang; Chen, Jeon-Hor; Chang, Ruey-Feng
2012-04-01
The purpose of this study is to evaluate the diagnostic efficacy of the representative characteristic kinetic curve of dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) extracted by fuzzy c-means (FCM) clustering for the discrimination of benign and malignant breast tumors using a novel computer-aided diagnosis (CAD) system. About the research data set, DCE-MRIs of 132 solid breast masses with definite histopathologic diagnosis (63 benign and 69 malignant) were used in this study. At first, the tumor region was automatically segmented using the region growing method based on the integrated color map formed by the combination of kinetic and area under curve color map. Then, the FCM clustering was used to identify the time-signal curve with the larger initial enhancement inside the segmented region as the representative kinetic curve, and then the parameters of the Tofts pharmacokinetic model for the representative kinetic curve were compared with conventional curve analysis (maximal enhancement, time to peak, uptake rate and washout rate) for each mass. The results were analyzed with a receiver operating characteristic curve and Student's t test to evaluate the classification performance. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value of the combined model-based parameters of the extracted kinetic curve from FCM clustering were 86.36% (114/132), 85.51% (59/69), 87.30% (55/63), 88.06% (59/67) and 84.62% (55/65), better than those from a conventional curve analysis. The A(Z) value was 0.9154 for Tofts model-based parametric features, better than that for conventional curve analysis (0.8673), for discriminating malignant and benign lesions. In conclusion, model-based analysis of the characteristic kinetic curve of breast mass derived from FCM clustering provides effective lesion classification. This approach has potential in the development of a CAD system for DCE breast MRI. PMID:22245697
Kumar, Surendra; Ghosh, Subhojit; Tetarway, Suhash; Sinha, Rakesh Kumar
2015-07-01
In this study, the magnitude and spatial distribution of frequency spectrum in the resting electroencephalogram (EEG) were examined to address the problem of detecting alcoholism in the cerebral motor cortex. The EEG signals were recorded from chronic alcoholic conditions (n = 20) and the control group (n = 20). Data were taken from motor cortex region and divided into five sub-bands (delta, theta, alpha, beta-1 and beta-2). Three methodologies were adopted for feature extraction: (1) absolute power, (2) relative power and (3) peak power frequency. The dimension of the extracted features is reduced by linear discrimination analysis and classified by support vector machine (SVM) and fuzzy C-mean clustering. The maximum classification accuracy (88 %) with SVM clustering was achieved with the EEG spectral features with absolute power frequency on F4 channel. Among the bands, relatively higher classification accuracy was found over theta band and beta-2 band in most of the channels when computed with the EEG features of relative power. Electrodes wise CZ, C3 and P4 were having more alteration. Considering the good classification accuracy obtained by SVM with relative band power features in most of the EEG channels of motor cortex, it can be suggested that the noninvasive automated online diagnostic system for the chronic alcoholic condition can be developed with the help of EEG signals. PMID:25773367
Y-Means: An Autonomous Clustering Algorithm
NASA Astrophysics Data System (ADS)
Ghorbani, Ali A.; Onut, Iosif-Viorel
This paper proposes an unsupervised clustering technique for data classification based on the K-means algorithm. The K-means algorithm is well known for its simplicity and low time complexity. However, the algorithm has three main drawbacks: dependency on the initial centroids, dependency on the number of clusters, and degeneracy. Our solution accommodates these three issues, by proposing an approach to automatically detect a semi-optimal number of clusters according to the statistical nature of the data. As a side effect, the method also makes choices of the initial centroid-seeds not critical to the clustering results. The experimental results show the robustness of the Y-means algorithm as well as its good performance against a set of other well known unsupervised clustering techniques. Furthermore, we study the performance of our proposed solution against different distance and outlier-detection functions and recommend the best combinations.
Fuzzy-Kohonen-clustering neural network trained by genetic algorithm and fuzzy competition learning
NASA Astrophysics Data System (ADS)
Xie, Weixing; Li, Wenhua; Gao, Xinbo
1995-08-01
Kohonen networks are well known for clustering analysis. Classical Kohonen networks for hard c-means clustering (trained by winner-take-all learning) have some severe drawbacks. Fuzzy Kohonen networks (FKCNN) for fuzzy c-means clustering are trained by fuzzy competition learning, and can get better clustering results than the classical Kohonen networks. However, both winner-take-all and fuzzy competition learning algorithms are in essence local search techniques that search for the optimum by using a hill-climbing technique. Thus, they often fail in the search for the global optimum. In this paper we combine genetic algorithms (GAs) with fuzzy competition learning to train the FKCNN. Our experimental results show that the proposed GA/FC learning algorithm has much higher probabilities of finding the global optimal solutions than either the winner-take-all or the fuzzy competition learning.
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
The Document clustering plays significant role in Information Retrieval (IR) where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed and this includes the K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to its simplicity, such an approach tends to be trapped in a local minimum during its search for an optimal solution. To address the shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates a more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
NASA Astrophysics Data System (ADS)
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used hard C-means algorithm in conjunction with landmark based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
A local distribution based spatial clustering algorithm
NASA Astrophysics Data System (ADS)
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover the potential spatial association rules and outliers among the spatial data. Most existing spatial clustering algorithms only utilize the spatial distance or local density to find the spatial clusters in a spatial database, without taking the spatial local distribution characters into account, so that the clustered results are unreasonable in many cases. To overcome such limitations, this paper develops a new indicator (i.e. local median angle) to measure the local distribution at first, and further proposes a new algorithm, called local distribution based spatial clustering algorithm (LDBSC in abbreviation). In the process of spatial clustering, a series of recursive search are implemented for all the entities so that those entities with its local median angle being very close or equal are clustered. In this way, all the spatial entities in the spatial database can be automatically divided into some clusters. Finally, two tests are implemented to demonstrate that the method proposed in this paper is more prominent than DBSCAN, as well as that it is very robust and feasible, and can be used to find the clusters with different shapes.
Optimal Hops-Based Adaptive Clustering Algorithm
NASA Astrophysics Data System (ADS)
Xuan, Xin; Chen, Jian; Zhen, Shanshan; Kuo, Yonghong
This paper proposes an optimal hops-based adaptive clustering algorithm (OHACA). The algorithm sets an energy selection threshold before the cluster forms so that the nodes with less energy are more likely to go to sleep immediately. In setup phase, OHACA introduces an adaptive mechanism to adjust cluster head and load balance. And the optimal distance theory is applied to discover the practical optimal routing path to minimize the total energy for transmission. Simulation results show that OHACA prolongs the life of network, improves utilizing rate and transmits more data because of energy balance.
Hierarchical link clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Bodlaj, Jernej; Batagelj, Vladimir
2015-06-01
Hierarchical network clustering is an approach to find tightly and internally connected clusters (groups or communities) of nodes in a network based on its structure. Instead of nodes, it is possible to cluster links of the network. The sets of nodes belonging to clusters of links can overlap. While overlapping clusters of nodes are not always expected, they are natural in many applications. Using appropriate dissimilarity measures, we can complement the clustering strategy to consider, for example, the semantic meaning of links or nodes based on their properties. We propose a new hierarchical link clustering algorithm which in comparison to existing algorithms considers node and/or link properties (descriptions, attributes) of the input network alongside its structure using monotonic dissimilarity measures. The algorithm determines communities that form connected subnetworks (relational constraint) containing locally similar nodes with respect to their description. It is only implicitly based on the corresponding line graph of the input network, thus reducing its space and time complexities. We investigate both complexities analytically and statistically. Using provided dissimilarity measures, our algorithm can, in addition to the general overlapping community structure of input networks, uncover also related subregions inside these communities in a form of hierarchy. We demonstrate this ability on real-world and artificial network examples.
An algorithm for spatial heirarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
V-cluster algorithm: a new algorithm for clustering molecules based upon numeric data.
Xu, Jun; Zhang, Qiang; Shih, Chen-Kon
2006-08-01
Clustering molecules based on numeric data such as, gene-expression data, physiochemical properties, or theoretical data is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for examples, K-mean and K-nearest neighbor), and other similar methods (for examples, the Self-Organization Mapping (SOM) and the Support Vector Machine (SVM)). These approaches are non-robust (results are not consistent) and, computationally expensive. This paper will report a new, non-hierarchical algorithm called the V-Cluster (V stands for vector) Algorithm. This algorithm produces rational, robust results while reducing computing complexity. Similarity measurement and data normalization rules are also discussed along with case studies. When molecules are represented in a set of numeric vectors, the V-Cluster Algorithm clusters the molecules in three steps: (1) ranking the vectors based upon their overall intensity levels, (2) computing cluster centers based upon neighboring density, and (3) assigning molecules to their nearest cluster center. The program is written in C/C++ language, and runs on Window95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and, easily examine the results by use of thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented. PMID:16896541
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue; Department of Radiology, University of Michigan, 1500 East Medical Center Drive, Med Inn Building C478, Ann Arbor, Michigan 48109-5842; Department of Biomedical Engineering, University of Michigan, 2200 Bonisteel Boulevard, Ann Arbor, Michigan 48109-2099
2014-01-15
Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 0.05 and 0.86 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue; Department of Radiology, University of Michigan, 1500 East Medical Center Drive, Med Inn Building C478, Ann Arbor, Michigan 48109-5842; Department of Biomedical Engineering, University of Michigan, 2200 Bonisteel Boulevard, Ann Arbor, Michigan 48109-2099
2014-01-15
Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools are able to process real valued numerical sets in order to extract suboptimal solution of designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite of wide usage of evolutionary algorithms on data clustering, their clustering performances have been scarcely studied by using clustering validation indexes. In this paper, the recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, fcm, som networks) have been used to cluster images and their performances have been compared by using four clustering validation indexes. Experimental test results exposed that evolutionary algorithms give more reliable cluster-centers than classical clustering techniques, but their convergence time is quite long.
Genetic algorithm optimization of atomic clusters
Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E. |
1996-12-31
The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. This is a problem where local minima are easy to locate, but barriers between the many minima are large, and the number of minima prohibit a systematic search. They use a novel mating algorithm that preserves some of the geometrical relationship between atoms, in order to ensure that the resultant structures are likely to inherit the best features of the parent clusters. Using this approach, they have been able to find lower energy structures than had been previously obtained. Most recently, they have been able to turn around the building block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and (conversely) that such information can be introduced back into the algorithm to assist in the search process.
Ergen, Burhan
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590
Kasim, Shahreen; Deris, Safaai; Othman, Razib M
2013-09-01
A drastic improvement in the analysis of gene expression has lead to new discoveries in bioinformatics research. In order to analyse the gene expression data, fuzzy clustering algorithms are widely used. However, the resulting analyses from these specific types of algorithms may lead to confusion in hypotheses with regard to the suggestion of dominant function for genes of interest. Besides that, the current fuzzy clustering algorithms do not conduct a thorough analysis of genes with low membership values. Therefore, we present a novel computational framework called the "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA) for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), achieving dominant cluster (msf-CluFA-1), improving confidence level (msf-CluFA-2) and combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and apriori algorithms in msf-CluFA-2, our new framework is capable of determining the dominant clusters and improving the confidence level of genes with lower membership values by means of which the unknown genes can be predicted. PMID:23930805
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) were found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
Sparse subspace clustering: algorithm, theory, and applications.
Elhamifar, Ehsan; Vidal, René
2013-11-01
Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering. PMID:24051734
CLASSY: An adaptive maximum likelihood clustering algorithm
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Rassbach, M. E. (Principal Investigator)
1979-01-01
The CLASSY clustering method alternates maximum likelihood iterative techniques for estimating the parameters of a mixture distribution with an adaptive procedure for splitting, combining, and eliminating the resultant components of the mixture. The adaptive procedure is based on maximizing the fit of a mixture of multivariate normal distributions to the observed data using its first through fourth central moments. It generates estimates of the number of multivariate normal components in the mixture as well as the proportion, mean vector, and covariance matrix for each component. The basic mathematical model for CLASSY and the actual operation of the algorithm as currently implemented are described. Results of applying CLASSY to real and simulated LANDSAT data are presented and compared with those generated by the iterative self-organizing clustering system algorithm on the same data sets.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
Filamentary galaxy clustering - A mapping algorithm
NASA Technical Reports Server (NTRS)
Gott, J. R., III; Moody, J. E.; Turner, E. L.
1983-01-01
A simple and objective algorithm is presented which not only accurately identifies the filamentary structures in the Shane-Wirtanen galaxy count catalog, but also finds a set of visually less impressive filaments in a static hierarchical model of the clustering conducted by Soneira and Peebles (1978). The statistical properties of the elements in the model, while very similar to those in the data, show a significant excess of long and bright filaments in the data relative to the model. Two possible interpretations of these results are presented and discussed.
Improved Ant Colony Clustering Algorithm and Its Performance Study.
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
A clustering routing algorithm based on improved ant colony clustering for wireless sensor networks
NASA Astrophysics Data System (ADS)
Xiao, Xiaoli; Li, Yang
Because of real wireless sensor network node distribution uniformity, this paper presents a clustering strategy based on the ant colony clustering algorithm (ACC-C). To reduce the energy consumption of the head near the base station and the whole network, The algorithm uses ant colony clustering on non-uniform clustering. The improve route optimal degree is presented to evaluate the performance of the chosen route. Simulation results show that, compared with other algorithms, like the LEACH algorithm and the improve particle cluster kind of clustering algorithm (PSC - C), the proposed approach is able to keep away from the node with less residual energy, which can improve the life of networks.
Tellaroli, Paola; Bazzi, Marco; Donato, Michele; Brazzale, Alessandra R; Drăghici, Sorin
2016-01-01
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward's minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward's and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository. PMID:27015427
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
Tellaroli, Paola; Bazzi, Marco; Donato, Michele; Brazzale, Alessandra R.; Drăghici, Sorin
2016-01-01
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward’s minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward’s and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository. PMID:27015427
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
A novel clustering algorithm inspired by membrane computing.
Peng, Hong; Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng
2015-01-01
P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to k-means algorithm and several evolutionary clustering algorithms recently reported in the literature. PMID:25874264
Accelerating Fuzzy-C Means Using an Estimated Subsample Size
Parker, Jonathon K.; Hall, Lawrence O.
2015-01-01
Many algorithms designed to accelerate the Fuzzy c-Means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, despite the impact subsample sizes have on speed and quality. This paper introduces two new accelerated algorithms, GOFCM and MSERFCM, that use a statistical method to estimate the subsample size. GOFCM, a variant of SPFCM, also leverages progressive sampling. MSERFCM, a variant of rseFCM, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared to FCM and four accelerated variants of FCM. GOFCM's speedup was 4-47 times that of FCM and faster than SPFCM on each of the six datasets used in experiments. For five of the datasets, partitions were within 1% of those of FCM. MSERFCM's speedup was 5-26 times that of FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset, consisting of plankton images, exposed the strengths and weaknesses of many of the algorithms tested. It is shown that the new stopping criterion is effective in speeding up algorithms such as SPFCM and the final partitions are very close to those of FCM. PMID:26617455
Algorithms of maximum likelihood data clustering with applications
NASA Astrophysics Data System (ADS)
Giada, Lorenzo; Marsili, Matteo
2002-12-01
We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjoisness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6.44.1' N, 91.56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
Clustering algorithms do not learn, but they can be learned
NASA Astrophysics Data System (ADS)
Brun, Marcel; Dougherty, Edward R.
2005-08-01
Pattern classification theory involves an error criterion, optimal classifiers, and a theory of learning. For clustering, there has historically been little theory; in particular, there has generally (but not always) been no learning. The key point is that clustering has not been grounded on a probabilistic theory. Recently, a clustering theory has been developed in the context of random sets. This paper discusses learning within that context, in particular, k- nearest-neighbor learning of clustering algorithms.
Clustering algorithms for Stokes space modulation format recognition.
Boada, Ricard; Borkowski, Robert; Monroy, Idelfonso Tafur
2015-06-15
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely influences the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six different clustering algorithms: k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering and maximum likelihood clustering, used for discriminating between dual polarization: BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. We determine essential performance metrics for each clustering algorithm and modulation format under test: minimum required signal-to-noise ratio, detection accuracy and algorithm complexity. PMID:26193532
A spectral image clustering algorithm based on ant colony optimization
NASA Astrophysics Data System (ADS)
Ashok, Luca; Messinger, David W.
2012-06-01
Ant Colony Optimization (ACO) is a computational method used for optimization problems. The ACO algorithm uses virtual ants to create candidate solutions that are represented by paths on a mathematical graph. We develop an algorithm using ACO that takes a multispectral image as input and outputs a cluster map denoting a cluster label for each pixel. The algorithm does this through identication of a series of one dimensional manifolds on the spectral data cloud via the ACO approach, and then associates pixels to these paths based on their spectral similarity to the paths. We apply the algorithm to multispectral imagery to divide the pixels into clusters based on their representation by a low dimensional manifold estimated by the best t ant path" through the data cloud. We present results from application of the algorithm to a multispectral Worldview-2 image and show that it produces useful cluster maps.
A systematic comparison of genome-scale clustering algorithms
2012-01-01
Background A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard score of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods. Conclusions Validation of clusters against known gene classifications demonstrate that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted. PMID:22759431
APPROXIMATION ALGORITHMS FOR CLUSTERING TO MINIMIZE THE SUM OF DIAMETERS
Kopp, S.; Mortveit, H.S.; Reidys, S.M.
2000-02-01
We consider the problem of partitioning the nodes of a complete edge weighted graph into {kappa} clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a solution that has no more than 10k clusters such the total diameter of these clusters is within a factor O(log (n/{kappa})) of the optimal value fork clusters, where n is the number of nodes in the complete graph. For any fixed {kappa}, we present an approximation algorithm that produces {kappa} clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any {rho} {ge} 1, there is no polynomial time approximation algorithm that can provide a performance guarantee of {rho} even when the number of clusters is fixed at 3. Other results obtained include a polynomial time algorithm for the problem when the underlying graph is a tree with edge weights.
The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis
NASA Astrophysics Data System (ADS)
Hoshen, Joseph
1997-08-01
In 1976 Hoshen and Kopelman(J. Hoshen and R. Kopelman, Phys. Rev. B, 14, 3438 (1976).) introduced a breakthrough algorithm, known today as the Hoshen-Kopelman algorithm, for cluster analysis. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory as it enables analysis of very large lattices containing 10^11 or more sites. Initially the HK algorithm primary use was in the domain of pure and basic sciences. Later it began finding applications in diverse fields of technology and applied sciences. Example of such applications are two and three dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling and food processing. While the original HK algorithm provides only cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm, presented in this paper, enables calculations of cluster spatial moments -- characteristics of cluster shapes -- for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, such that very large lattices could be still analyzed simultaneously in a single pass through the lattice for cluster sizes, classes and shapes.
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
The ordered clustered travelling salesman problem: a hybrid genetic algorithm.
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances. PMID:24701148
A novel spatial clustering algorithm based on Delaunay triangulation
NASA Astrophysics Data System (ADS)
Yang, Xiankun; Cui, Weihong
2008-12-01
Exploratory data analysis is increasingly more necessary as larger spatial data is managed in electro-magnetic media. Spatial clustering is one of the very important spatial data mining techniques. So far, a lot of spatial clustering algorithms have been proposed. In this paper we propose a robust spatial clustering algorithm named SCABDT (Spatial Clustering Algorithm Based on Delaunay Triangulation). SCABDT demonstrates important advantages over the previous works. First, it discovers even arbitrary shape of cluster distribution. Second, in order to execute SCABDT, we do not need to know any priori nature of distribution. Third, like DBSCAN, Experiments show that SCABDT does not require so much CPU processing time. Finally it handles efficiently outliers.
Ultrafast clustering algorithms for metagenomic sequence analysis
Fu, Limin; Niu, Beifang; Wu, Sitao; Wooley, John
2012-01-01
The rapid advances of high-throughput sequencing technologies dramatically prompted metagenomic studies of microbial communities that exist at various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters. PMID:22772836
A space-time cluster algorithm for stochastic processes.
Gulbahce, N.
2003-01-01
We introduce a space-time cluster algorithm that will generate histories of stochastic processes. Michael Zimmer introduced a spacetime MC algorithm for stochastic classical dynamics and he applied it to simulate Ising model with Glauber dynamics. Following his steps, we extended Brower and Tamayo's embedded {phi}{sup 4} dynamics to space and time. We believe our algorithm can be applied to more general stochastic systems. Why space-time? To be able to study nonequilibrium systems, we need to know the probability of the 'history' of a nonequilibrium state. Histories are the entire space-time configurations. Cluster algorithms first introduced by SW, are useful to overcome critical slowing down. Brower and Tamayo have mapped continous field variables to Ising spins, and have grown and flipped SW clusters to gain speed. Our algorithm is an extended version of theirs to space and time.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval.
ERIC Educational Resources Information Center
Voorhees, Ellen M.
1986-01-01
Describes a computerized information retrieval system that uses three agglomerative hierarchic clustering algorithms--single link, complete link, and group average link--and explains their implementations. It is noted that these implementations have been used to cluster a collection of 12,000 documents. (LRW)
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
Shao, Jianyin; Tanner, Stephen W; Thompson, Nephi; Cheatham, Thomas E
2007-11-01
Molecular dynamics simulation methods produce trajectories of atomic positions (and optionally velocities and energies) as a function of time and provide a representation of the sampling of a given molecule's energetically accessible conformational ensemble. As simulations on the 10-100 ns time scale become routine, with sampled configurations stored on the picosecond time scale, such trajectories contain large amounts of data. Data-mining techniques, like clustering, provide one means to group and make sense of the information in the trajectory. In this work, several clustering algorithms were implemented, compared, and utilized to understand MD trajectory data. The development of the algorithms into a freely available C code library, and their application to a simple test example of random (or systematically placed) points in a 2D plane (where the pairwise metric is the distance between points) provide a means to understand the relative performance. Eleven different clustering algorithms were developed, ranging from top-down splitting (hierarchical) and bottom-up aggregating (including single-linkage edge joining, centroid-linkage, average-linkage, complete-linkage, centripetal, and centripetal-complete) to various refinement (means, Bayesian, and self-organizing maps) and tree (COBWEB) algorithms. Systematic testing in the context of MD simulation of various DNA systems (including DNA single strands and the interaction of a minor groove binding drug DB226 with a DNA hairpin) allows a more direct assessment of the relative merits of the distinct clustering algorithms. Additionally, means to assess the relative performance and differences between the algorithms, to dynamically select the initial cluster count, and to achieve faster data mining by "sieved clustering" were evaluated. Overall, it was found that there is no one perfect "one size fits all" algorithm for clustering MD trajectories and that the results strongly depend on the choice of atoms for the pairwise comparison. Some algorithms tend to produce homogeneously sized clusters, whereas others have a tendency to produce singleton clusters. Issues related to the choice of a pairwise metric, clustering metrics, which atom selection is used for the comparison, and about the relative performance are discussed. Overall, the best performance was observed with the average-linkage, means, and SOM algorithms. If the cluster count is not known in advance, the hierarchical or average-linkage clustering algorithms are recommended. Although these algorithms perform well, it is important to be aware of the limitations or weaknesses of each algorithm, specifically the high sensitivity to outliers with hierarchical, the tendency to generate homogenously sized clusters with means, and the tendency to produce small or singleton clusters with average-linkage. PMID:26636222
A knowledge-based clustering algorithm driven by Gene Ontology.
Cheng, Jill; Cline, Melissa; Martin, John; Finkelstein, David; Awad, Tarif; Kulp, David; Siani-Rose, Michael A
2004-08-01
We have developed an algorithm for inferring the degree of similarity between genes by using the graph-based structure of Gene Ontology (GO). We applied this knowledge-based similarity metric to a clique-finding algorithm for detecting sets of related genes with biological classifications. We also combined it with an expression-based distance metric to produce a co-cluster analysis, which accentuates genes with both similar expression profiles and similar biological characteristics and identifies gene clusters that are more stable and biologically meaningful. These algorithms are demonstrated in the analysis of MPRO cell differentiation time series experiments. PMID:15468759
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm based on the paper of clustering by fast search and find of density peaks. A distance threshold is introduced for the purpose of economizing memory. In order to reduce the probability that two points share the same density value, similarity is utilized to define proximity measure. We have tested the modified algorithm on a large data set, several small data sets and shape data sets. It turns out that the proposed algorithm can obtain acceptable results and can be applied more wildly.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
A geometric clustering algorithm with applications to structural data.
Xu, Shutan; Zou, Shuxue; Wang, Lincong
2015-05-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination. PMID:25517067
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression. PMID:22042156
[Multispectral image compression algorithm based on clustering and wavelet transform].
Liang, Wei; Zeng, Ping; Zhang, Hua; Luo, Xue-Mei
2013-10-01
Aiming at the problem of high time-space complexity and inadequate usage of spectral characteristics of existing multispectral image compression algorithms, an inter-spectrum sparse equivalent representation of multispectral image and its clustering realization ways were studied. Meanwhile, a new multispectral image compression algorithm based on spectral adaptive clustering and wavelet transform was designed. The affinity propagation clustering was utilized to generate inter-spectrum sparse equivalent representation which can remove inter-spectrum redundancy under low complexity, two-dimensional wavelet transform was used to remove spatial redundancy, and set partitioning in hierarchical trees (SPIHT) was used to encode. The quality of reconstruction images was improved by error compensation mechanism. Experimental results show that the proposed approach achieves good performance in time-space complexity, the peak signal-to-noise ratio(PSNR) is significantly higher than that of similar compression algorithms under the same compression ratio, and it is a generic and effective algorithm. PMID:24409728
K-Distributions: A New Algorithm for Clustering Categorical Data
NASA Astrophysics Data System (ADS)
Cai, Zhihua; Wang, Dianhong; Jiang, Liangxiao
Clustering is one of the most important tasks in data mining. The K-means algorithm is the most popular one for achieving this task because of its efficiency. However, it works only on numeric values although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm is presented to extend the K-means algorithm to categorical domains. Unfortunately, it suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the well known 36 UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in term of clustering accuracy and log likelihood.
Diluvian Clustering: A Fast, Effective Algorithm for Clustering Compositional and Other Data.
Ritchie, Nicholas W M
2015-10-01
Diluvian Clustering is an unsupervised grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm is notable for its ability to identify clusters that are either compact or diffuse and clusters that have either a large number or a small number of members. Diluvian Clustering is fundamentally different from most algorithms previously applied to cluster compositional data in that its implementation does not depend upon a metric. The algorithm reduces in two-dimensions to a case for which there is an intuitive, real-world parallel. Furthermore, the algorithm has few tunable parameters and these parameters have intuitive interpretations. By eliminating the dependence on an explicit metric, it is possible to derive reasonable clusters with disparate variances like those in real-world compositional data sets. The algorithm is computationally efficient. While the worst case scales as O(N²) most cases are closer to O(N) where N is the number of discrete data points. On a mid-range 2014 vintage computer, a typical 20,000 particle, 30 element data set can be clustered in a fraction of a second. PMID:26299780
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these datasets should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimension, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
Note on Ultrametric Hierarchical Clustering Algorithms.
ERIC Educational Resources Information Center
Batagelj, Vladimir
1981-01-01
Milligan presented the conditions that are required for a hierarchical clustering strategy to be monotonic, based on a formula by Lance and Williams. The statement of the conditions is improved and shown to provide necessary and sufficient conditions. (Author/GK)
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.
Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET
Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
Cluster algorithms with empahsis on quantum spin systems
Gubernatis, J.E.; Kawashima, Naoki
1995-10-06
The purpose of this lecture is to discuss in detail the generalized approach of Kawashima and Gubernatis for the construction of cluster algorithms. We first present a brief refresher on the Monte Carlo method, describe the Swendsen-Wang algorithm, show how this algorithm follows from the Fortuin-Kastelyn transformation, and re=interpret this transformation in a form which is the basis of the generalized approach. We then derive the essential equations of the generalized approach. This derivation is remarkably simple if done from the viewpoint of probability theory, and the essential assumptions will be clearly stated. These assumptions are implicit in all useful cluster algorithms of which we are aware. They lead to a quite different perspective on cluster algorithms than found in the seminal works and in Ising model applications. Next, we illustrate how the generalized approach leads to a cluster algorithm for world-line quantum Monte Carlo simulations of Heisenberg models with S = 1/2. More succinctly, we also discuss the generalization of the Fortuin- Kasetelyn transformation to higher spin models and illustrate the essential steps for a S = 1 Heisenberg model. Finally, we summarize how to go beyond S = 1 to a general spin, XYZ model.
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Personalized PageRank Clustering: A graph clustering algorithm based on random walks
NASA Astrophysics Data System (ADS)
A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali
2013-11-01
Graph clustering has been an essential part in many methods and thus its accuracy has a significant effect on many applications. In addition, exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC) that employs the inherent cluster exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm so it can reveal inherent clusters of a graph more accurately than other nearly-linear approaches that are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has a linear time and space complexity and has been superior to most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.
NCUBE - A clustering algorithm based on a discretized data space
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
SPICi: a fast clustering algorithm for large biological networks
Jiang, Peng; Singh, Mona
2010-01-01
Motivation: Clustering algorithms play an important role in the analysis of biological networks, and can be used to uncover functional modules and obtain hints about cellular organization. While most available clustering algorithms work well on biological networks of moderate size, such as the yeast protein physical interaction network, they either fail or are too slow in practice for larger networks, such as functional networks for higher eukaryotes. Since an increasing number of larger biological networks are being determined, the limitations of current clustering approaches curtail the types of biological network analyses that can be performed. Results: We present a fast local network clustering algorithm SPICi. SPICi runs in time O(V log V+E) and space O(E), where V and E are the number of vertices and edges in the network, respectively. We evaluate SPICi's performance on several existing protein interaction networks of varying size, and compare SPICi to nine previous approaches for clustering biological networks. We show that SPICi is typically several orders of magnitude faster than previous approaches and is the only one that can successfully cluster all test networks within very short time. We demonstrate that SPICi has state-of-the-art performance with respect to the quality of the clusters it uncovers, as judged by its ability to recapitulate protein complexes and functional modules. Finally, we demonstrate the power of our fast network clustering algorithm by applying SPICi across hundreds of large context-specific human networks, and identifying modules specific for single conditions. Availability: Source code is available under the GNU Public License at http://compbio.cs.princeton.edu/spici Contact: mona@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20185405
Particle flow reconstruction based on the directed tree clustering algorithm
Chakraborty, D.; Lima, J. G. R.; McIntosh, R.; Zutshi, V.
2006-10-27
We present the status of particle flow algorithm development at Northern Illinois University. A key element in our approach is the calorimeter-based directed tree clustering algorithm. We have attempted to identify and tackle the essential challenges and analyze the effect of several different approaches to the reconstruction of jet energies and the Z-boson mass. A number of possibilities have been studied, such as analog vs. digital energy measurement, hit density-based clustering and the use of single or multiple energy thresholds. We plan to use this PFA-based reconstruction to compare some of the proposed detector technologies and geometries.
Membership determination of open cluster NGC 188 based on the DBSCAN clustering algorithm
NASA Astrophysics Data System (ADS)
Gao, Xin-Hua
2014-02-01
High-precision proper motions and radial velocities of 1046 stars are used to determine member stars using three-dimensional (3D) kinematics for open cluster NGC 188 based on the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm. By implementing this algorithm, 472 member stars in the cluster are obtained with 3D kinematics. The color-magnitude diagram (CMD) of the 472 member stars using 3D kinematics shows a well-defined main sequence and a red giant branch, which indicate that the DBSCAN clustering algorithm is very effective for membership determination. The DBSCAN clustering algorithm can effectively select probable member stars in 3D kinematic space without any assumption about the distribution of the cluster or field stars. Analysis results show that the CMD of member stars is significantly clearer than the one based on 2D kinematics, which allows us to better constrain the cluster members and estimate their physical parameters. Using the 472 member stars, the average absolute proper motion and radial velocity are determined to be (PMα, PMδ) = (-2.58 ± 0.22, +0.17 ± 0.18) mas yr-1 and Vr = -42.35 ± 0.05 km s-1, respectively. Our values are in good agreement with values derived by other authors.
Application of K- and fuzzy c-means for color segmentation of thermal infrared breast images.
EtehadTavakol, M; Sadri, S; Ng, E Y K
2010-02-01
Color segmentation of infrared thermal images is an important factor in detecting the tumor region. The cancerous tissue with angiogenesis and inflammation emits temperature pattern different from the healthy one. In this paper, two color segmentation techniques, K-means and fuzzy c-means for color segmentation of infrared (IR) breast images are modeled and compared. Using the K-means algorithm in Matlab, some empty clusters may appear in the results. Fuzzy c-means is preferred because the fuzzy nature of IR breast images helps the fuzzy c-means segmentation to provide more accurate results with no empty cluster. Since breasts with malignant tumors have higher temperature than healthy breasts and even breasts with benign tumors, in this study, we look for detecting the hottest regions of abnormal breasts which are the suspected regions. The effect of IR camera sensitivity on the number of clusters in segmentation is also investigated. When the camera is ultra sensitive the number of clusters being considered may be increased. PMID:20192053
ORCA: The Overdense Red-sequence Cluster Algorithm
NASA Astrophysics Data System (ADS)
Murphy, D. N. A.; Geach, J. E.; Bower, R. G.
2012-03-01
We present a new cluster-detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (i) the similarity in colour of cluster galaxies (the 'red sequence'); and (ii) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space; (ii) a Voronoi diagram identifies regions of high surface density; and (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper, we present the algorithm and demonstrate it on two data sets. The first is a 7-deg2 sample of the deep Sloan Digital Sky Survey (SDSS) equatorial stripe (Stripe 82), from which we detect 97 clusters with z≤ 0.6. Benefitting from deeper data, we are 100 per cent complete in the maxBCG optically selected cluster catalogue (based on shallower single-epoch SDSS data) and find an additional 78 previously unidentified clusters. The second data set is a mock Medium Deep Survey Pan-STARRS catalogue, based on the Λ cold dark matter (ΛCDM) model and a semi-analytic galaxy formation recipe. Knowledge of galaxy-halo memberships in the mock catalogue allows for the quantification of algorithm performance. We detect 305 mock clusters in haloes with mass >1013 h-1 M⊙ at z≲ 0.6 and determine a spurious detection rate of <1 per cent, consistent with tests on the Stripe 82 catalogue. The detector performs well in the recovery of model ΛCDM clusters. At the median redshift of the catalogue, the algorithm achieves >75 per cent completeness down to halo masses of 1013.4 h-1 M⊙ and recovers >75 per cent of the total stellar mass of clusters in haloes down to 1013.8 h-1 M⊙. A companion paper presents the complete cluster catalogue over the full 270-deg2 Stripe 82 catalogue.
Fuzzy ellipsoidal shell clustering algorithm and detection of elliptical shapes
NASA Astrophysics Data System (ADS)
Dave, Rajesh N.; Patel, Kalpesh J.
1991-02-01
Fuzzyc-Efflpsoidal Shell (FCES) algorithm that utilizes hyper-ellipsoidal-shells as cluster prototypes is proposed. FCES is a generalization of the Fuzzy Shell Clustering (FSC) algorithm. The generalization is achieved by allowing the distances measured through a norm inducing matrix that is symmetric positive definite. In case offixed known norms the extension of FcS to FCS is straightforward. Two different strategies are recommended when the norm is unknown. The first strategy considers use of non-linear least-squared fit approach with fuzzy memberships as weights. The second approach considers norm inducing matrix as a variable of optimization thus making FCES an adaptive norm type algorithm. An adaptive norm theorem is presented. The results of first approach is used to detect ellipses having unequal sizes and orientations in two-dimensional data-sets. Non-linear equations of the FCES algorithm are more complex than those of the FSC algorithm. Numerical issues related to both the FCES algorithm and the FSC algorithm are discussed.
Adaptive clustering algorithm for community detection in complex networks.
Ye, Zhenqing; Hu, Songnian; Yu, Jun
2008-10-01
Community structure is common in various real-world networks; methods or algorithms for detecting such communities in complex networks have attracted great attention in recent years. We introduced a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior where vertices always travel toward their preferable neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process where vertices constantly regroup. In addition, we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm) through intensive evaluation. The applications in three real-world networks demonstrate the superiority of our algorithm to find communities that are parallel with the appropriate organization in reality. PMID:18999501
Coupled cluster algorithms for networks of shared memory parallel processors
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.
2007-05-01
As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363]. (General Atomic and Molecular Electronic Structure System) program suite and the Distributed Data Interface [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]. (DDI), however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm is presented on several large-scale clusters of SMPs.
A novel hierarchical clustering algorithm for gene sequences
2012-01-01
Background Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into the feature vectors which contain the occurrence, location and order relation of k-tuples in DNA sequence. Afterwards, a hierarchical procedure is applied to clustering DNA sequences based on the feature vectors. Results The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. This method is also compared with BlastClust, CD-HIT-EST and some others. The experimental results show our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationship among the sequences. Conclusions We introduced a novel clustering algorithm which is based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationship among the sequences. PMID:22823405
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the ''C4 Cluster Catalog'', a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers {approx}2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L{sub r}), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the {Lambda}CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is {approx_equal}90% complete and 95% pure above M{sub 200} = 1 x 10{sup 14} h{sup -1}M{sub {circle_dot}} and within 0.03 {le} z {le} 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 {le} z {le} 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L{sub r} of a cluster is a more robust estimator of the halo mass (M{sub 200}) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of M{sub 200} as L{sub r}. The final C4 catalog will contain {approx_equal} 2500 clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
Fast source optimization by clustering algorithm based on lithography properties
NASA Astrophysics Data System (ADS)
Tawada, Masashi; Hashimoto, Takaki; Sakanushi, Keishi; Nojima, Shigeki; Kotani, Toshiya; Yanagisawa, Masao; Togawa, Nozomu
2015-03-01
Lithography is a technology to make circuit patterns on a wafer. UV light diffracted by a photomask forms optical images on a photoresist. Then, a photoresist is melt by an amount of exposed UV light exceeding the threshold. The UV light diffracted by a photomask through lens exposes the photoresist on the wafer. Its lightness and darkness generate patterns on the photoresist. As the technology node advances, the feature sizes on photoresist becomes much smaller. Diffracted UV light is dispersed on the wafer, and then exposing photoresists has become more difficult. Exposure source optimization, SO in short, techniques for optimizing illumination shape have been studied. Although exposure source has hundreds of grid-points, all of previous works deal with them one by one. Then they consume too much running time and that increases design time extremely. How to reduce the parameters to be optimized in SO is the key to decrease source optimization time. In this paper, we propose a variation-resilient and high-speed cluster-based exposure source optimization algorithm. We focus on image log slope (ILS) and use it for generating clusters. When an optical image formed by a source shape has a small ILS value at an EPE (Edge placement error) evaluation point, dose/focus variation much affects the EPE values. When an optical image formed by a source shape has a large ILS value at an evaluation point, dose/focus variation less affects the EPE value. In our algorithm, we cluster several grid-points with similar ILS values and reduce the number of parameters to be simultaneously optimized in SO. Our clustering algorithm is composed of two STEPs: In STEP 1, we cluster grid-points into four groups based on ILS values of grid-points at each evaluation point. In STEP 2, we generate super clusters from the clusters generated in STEP 1. We consider a set of grid-points in each cluster to be a single light source element. As a result, we can optimize the SO problem very fast. Experimental results demonstrate that our algorithm runs speed-up compared to a conventional algorithm with keeping the EPE values.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering
Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Background Gravitation field algorithm (GFA) is a new optimization algorithm which is based on an imitation of natural phenomena. GFA can do well both for searching global minimum and multi-minima in computational biology. But GFA needs to be improved for increasing efficiency, and modified for applying to some discrete data problems in system biology. Method An improved GFA called IGFA was proposed in this paper. Two parts were improved in IGFA. The first one is the rule of random division, which is a reasonable strategy and makes running time shorter. The other one is rotation factor, which can improve the accuracy of IGFA. And to apply IGFA to the hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA. And IGFA was applied to hierarchical clustering. The global minimum experiment was used with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). Multi-minima experiment was used with IGFA and GFA. The two experiments results were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For the hierarchical clustering, IGFA is used to optimize the smallest distance of genes pairs, and the results were compared with GA and SA, singular-linkage clustering, UPGMA. The efficiency of IGFA is proved. PMID:23173043
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters. PMID:26327507
Areibi, Shawki; Yang, Zhen
2004-01-01
Combining global and local search is a strategy used by many successful hybrid optimization approaches. Memetic Algorithms (MAs) are Evolutionary Algorithms (EAs) that apply some sort of local search to further improve the fitness of individuals in the population. Memetic Algorithms have been shown to be very effective in solving many hard combinatorial optimization problems. This paper provides a forum for identifying and exploring the key issues that affect the design and application of Memetic Algorithms. The approach combines a hierarchical design technique, Genetic Algorithms, constructive techniques and advanced local search to solve VLSI circuit layout in the form of circuit partitioning and placement. Results obtained indicate that Memetic Algorithms based on local search, clustering and good initial solutions improve solution quality on average by 35% for the VLSI circuit partitioning problem and 54% for the VLSI standard cell placement problem. PMID:15355604
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011-01-01
Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager. PMID:22070249
Synchronous Firefly Algorithm for Cluster Head Selection in WSN
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
Advanced defect detection algorithm using clustering in ultrasonic NDE
NASA Astrophysics Data System (ADS)
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limits ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as `legitimate reflector' or `artefacts' based on this observation and the expected probability of defection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP and the subsequent application of the proposed clustering algorithm has provided an additional reduction to PFA while maintaining PoD for both samples compared with SSP results alone.
An improved distance matrix computation algorithm for multicore clusters.
Al-Neama, Mohammed W; Reda, Naglaa M; Ghaleb, Fayed F M
2014-01-01
Distance matrix has diverse usage in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) to present a recent approach for vectorizing distance matrix computing. It showed an efficient performance in both sequential and parallel computing. However, the multicore cluster systems, which are available now, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as highly efficient parallel vectorized algorithm with high performance for computing distance matrix, addressed to multicore clusters. It reformulates DistVect1 vectorized algorithm in terms of clusters primitives. It deduces an efficient approach of partitioning and scheduling computations, convenient to this type of architecture. Implementations employ potential of both MPI and OpenMP libraries. Experimental results show that the proposed method performs improvement of around 3-fold speedup upon SSE2. Further it also achieves speedups more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779
Comparison of cluster expansion fitting algorithms for interactions at surfaces
NASA Astrophysics Data System (ADS)
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms-the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm-against synthetic data. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
An Improved Distance Matrix Computation Algorithm for Multicore Clusters
Al-Neama, Mohammed W.; Reda, Naglaa M.; Ghaleb, Fayed F. M.
2014-01-01
Distance matrix has diverse usage in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) to present a recent approach for vectorizing distance matrix computing. It showed an efficient performance in both sequential and parallel computing. However, the multicore cluster systems, which are available now, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as highly efficient parallel vectorized algorithm with high performance for computing distance matrix, addressed to multicore clusters. It reformulates DistVect1 vectorized algorithm in terms of clusters primitives. It deduces an efficient approach of partitioning and scheduling computations, convenient to this type of architecture. Implementations employ potential of both MPI and OpenMP libraries. Experimental results show that the proposed method performs improvement of around 3-fold speedup upon SSE2. Further it also achieves speedups more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779
ICANP2: Isoenergetic cluster algorithm for NP-complete Problems
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Fang, Chao; Katzgraber, Helmut G.
NP-complete optimization problems with Boolean variables are of fundamental importance in computer science, mathematics and physics. Most notably, the minimization of general spin-glass-like Hamiltonians remains a difficult numerical task. There has been a great interest in designing efficient heuristics to solve these computationally difficult problems. Inspired by the rejection-free isoenergetic cluster algorithm developed for Ising spin glasses, we present a generalized cluster update that can be applied to different NP-complete optimization problems with Boolean variables. The cluster updates allow for a wide-spread sampling of phase space, thus speeding up optimization. By carefully tuning the pseudo-temperature (needed to randomize the configurations) of the problem, we show that the method can efficiently tackle problems on topologies with a large site-percolation threshold. We illustrate the ICANP2 heuristic on paradigmatic optimization problems, such as the satisfiability problem and the vertex cover problem.
An Energy Clustering Algorithm for the PIBETA Calorimeter
NASA Astrophysics Data System (ADS)
Slocum, P. L.; Frlež, E.; Koglin, J. E.; Minehart, R. C.; Počanić, D.; Ritt, S.; Brönnimann, C.; Flügel, T.; Krause, B.; Renker, D.; Lawrence, D. W.; Ritchie, B. G.
1997-10-01
The PIBETA calorimeter is a segmented pure CsI electromagnetic shower detector presently being built at the Paul Scherrer Institute (PSI) for a precise determination of the π^+arrowπ^0e^+ν decay rate. An array comprising 44 of the calorimeter modules, or 0.6π steradians, has been studied extensively in beam. A clustering algorithm for summing ADC values of individual CsI detectors has been developed. The algorithm is designed to include only designated CsI crystals in the energy summing according to their proximity to the reconstructed origin of the shower. Cluster size has been varied to optimize calorimeter energy resolution and event reconstruction efficiency. The algorithm has been tested with both 70 MeV beam positrons and a subset of 70 MeV photons from the reaction π^-parrowπ^0n arrowγγ n, and the results compared. The algorithm has also been applied to the analysis of decay products from the reactions μ^+arrow e^+νν and μ^+arrow e^+ννγ. Results have been compared with a realistic GEANT Monte Carlo simulation.
Multilayer cellular neural network and fuzzy C-mean classifiers: comparison and performance analysis
NASA Astrophysics Data System (ADS)
Trujillo San-Martin, Maite; Hlebarov, Vejen; Sadki, Mustapha
2004-11-01
Neural Networks and Fuzzy systems are considered two of the most important artificial intelligent algorithms which provide classification capabilities obtained through different learning schemas which capture knowledge and process it according to particular rule-based algorithms. These methods are especially suited to exploit the tolerance for uncertainty and vagueness in cognitive reasoning. By applying these methods with some relevant knowledge-based rules extracted using different data analysis tools, it is possible to obtain a robust classification performance for a wide range of applications. This paper will focus on non-destructive testing quality control systems, in particular, the study of metallic structures classification according to the corrosion time using a novel cellular neural network architecture, which will be explained in detail. Additionally, we will compare these results with the ones obtained using the Fuzzy C-means clustering algorithm and analyse both classifiers according to its classification capabilities.
Mammographic images segmentation based on chaotic map clustering algorithm
2014-01-01
Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels resulting in a medically meaningful partition of the mammography. Methods The image is divided into pixels subsets characterized by a set of conveniently chosen features and each of the corresponding points in the feature space is associated to a map. A mutual coupling strength between the maps depending on the associated distance between feature space points is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and the reproduction of the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective on identification of larger mass lesions. Conclusions We can summarize our analysis by asserting that due to the particularities of the mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. It is rather the joint use of this method along with other segmentation techniques that could be successfully used for increasing the segmentation performance and for providing extra information for the subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
Dynamically Incremental K-means++ Clustering Algorithm Based on Fuzzy Rough Set Theory
NASA Astrophysics Data System (ADS)
Li, Wei; Wang, Rujing; Jia, Xiufang; Jiang, Qing
Being classic K-means++ clustering algorithm only for static data, dynamically incremental K-means++ clustering algorithm (DK-Means++) is presented based on fuzzy rough set theory in this paper. Firstly, in DK-Means++ clustering algorithm, the formula of similar degree is improved by weights computed by using of the important degree of attributes which are reduced on the basis of rough fuzzy set theory. Secondly, new data only need match granular which was clustered by K-means++ algorithm or seldom new data is clustered by classic K-means++ algorithm in global data. In this way, that all data is re-clustered each time in dynamic data set is avoided, so the efficiency of clustering is improved. Throughout our experiments showing, DK-Means++ algorithm can objectively and efficiently deal with clustering problem of dynamically incremental data.
Gravitation field algorithm and its application in gene cluster
2010-01-01
Background Searching optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA) which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem. And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene Expression Omnibus. Conclusions The mathematical proof demonstrates that GFA could be convergent in the global optimum by probability 1 in three conditions for one independent variable mass functions. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA. PMID:20854683
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for microcalcification clusters detection is proposed. At the first time, we make a short analysis of mammographic images with microcalcification lesions to confirm these lesions have much greater gray values than normal regions. After summarizing the specific feature of microcalcification clusters in mammographic screening, we make more focus on preprocessing step including eliminating the background, image enhancement and eliminating the pectoral muscle. In detail, Chan-Vese Model is used for eliminating background. Then, we do the application of combining morphology method and edge detection method. After the AND operation and Sobel filter, we use Hough Transform, it can be seen that the result have outperformed for eliminating the pectoral muscle which is approximately the gray of microcalcification. Additionally, the enhancement step is achieved by morphology. We make effort on mammographic image preprocessing to achieve lower computational complexity. As well known, it is difficult to robustly achieve mammograms analysis due to low contrast between normal and lesion tissues, there are also much noise in such images. After a serious preprocessing algorithm, a method based on blob detection is performed to microcalcification clusters according their specific features. The proposed algorithm has employed Laplace operator to improve Difference of Gaussians (DoG) function in terms of low contrast images. A preliminary evaluation of the proposed method performs on a known public database namely MIAS, rather than synthetic images. The comparison experiments and Cohen's kappa coefficients all demonstrate that our proposed approach can potentially obtain better microcalcification clusters detection results in terms of accuracy, sensitivity and specificity.
A decentralized fuzzy C-means-based energy-efficient routing protocol for wireless sensor networks.
Alia, Osama Moh'd
2014-01-01
Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-meams (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
A fast clustering algorithm for data with a few labeled instances.
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. PMID:25861252
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that also for the film geometry a substantial reduction of the statistical error can achieved. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L(0) of the film and study the scaling of this temperature with L(0). In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model. PMID:25768461
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
Textural defect detect using a revised ant colony clustering algorithm
NASA Astrophysics Data System (ADS)
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a totally novel method based on a revised ant colony clustering algorithm (ACCA) to explore the topic of textural defect detection. In this algorithm, our efforts are mainly made on the definition of local irregularity measurement and the implementation of the revised ACCA. The local irregular measurement defined evaluates the local textural inconsistency of each pixel against their mini-environment. In our revised ACCA, the behaviors of each ant are divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the actions of the ants to act next are chosen independently of each other in a stochastic way according to some evaluated heuristic knowledge. The independency of ants implies the inherent parallel computation architecture of this algorithm. We apply the proposed method in some typical textural images with defects. From the series of pheromone distribution map (PDM), it can be clearly seen that the pheromone distribution approaches the textual defects gradually. By some post-processing, the final distribution of pheromone can demonstrate the shape and area of the defects well.
GX-Means: A model-based divide and merge algorithm for geospatial image clustering
Vatsavai, Raju; Symons, Christopher T; Chandola, Varun; Jun, Goo
2011-01-01
One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.
Global optimization of clusters of rigid molecules using the artificial bee colony algorithm.
Zhang, Jun; Dolg, Michael
2016-01-28
The global optimization of molecular clusters is an important topic encountered in many fields of chemistry. In our previous work (Phys. Chem. Chem. Phys., 2015, 17, 24173), we successfully applied the recently introduced artificial bee colony (ABC) algorithm to the global optimization of atomic clusters and introduced the corresponding software "ABCluster". In the present work, ABCluster was extended to the optimization of clusters of rigid molecules. Here "rigid" means that all internal degrees of freedom of the constituent molecules are frozen. The algorithm was benchmarked by TIP4P water clusters (H2O)N (N ≤ 20), for which all global minima were successfully located. It was further applied to various clusters of different chemical nature: 10 microhydration clusters, 4 methanol microsolvation clusters, 4 nonpolar clusters and 2 ion-aromatic clusters. In all the cases we obtained results consistent with previous experimental or theoretical studies. PMID:26738568
NASA Technical Reports Server (NTRS)
Lambeck, P. F.; Rice, D. P.
1976-01-01
Signature extension is intended to increase the space-time range over which a set of training statistics can be used to classify data without significant loss of recognition accuracy. A first cluster matching algorithm MASC (Multiplicative and Additive Signature Correction) was developed at the Environmental Research Institute of Michigan to test the concept of using associations between training and recognition area cluster statistics to define an average signature transformation. A more recent signature extension module CROP-A (Cluster Regression Ordered on Principal Axis) has shown evidence of making significant associations between training and recognition area cluster statistics, with the clusters to be matched being selected automatically by the algorithm.
A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking
Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng
2014-01-01
The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823
Multiresolution mean shift clustering algorithm for shape interpolation.
Chu, Hung-Kuo; Lee, Tong-Yee
2009-01-01
In this paper, we solve the problem of 3D shape interpolation with significant pose variation. For an ideal 3D shape interpolation, especially the articulated model, the shape should follow the movement of the underlying articulated structure and be transformed in a way that is as rigid as possible. Given input shapes with compatible connectivity, we propose a novel multiresolution mean shift (MMS) clustering algorithm to automatically extract their near-rigid components. Then, by building the hierarchical relationship among extracted components, we compute a common articulated structure for these input shapes. With the aid of this articulated structure, we solve the shape interpolation by combining 1) a global pose interpolation of near-rigid components from the source shape to the target shape with 2) a local gradient field interpolation for each pair of components, followed by solving a Poisson equation in order to reconstruct an interpolated shape. As a result, an aesthetically pleasing shape interpolation can be generated, with even the poses of shapes varying significantly. In contrast to a recent state-of-the-art work, the proposed approach can achieve comparable or even better results and have better computational efficiency as well. PMID:19590110
A heart disease recognition embedded system with fuzzy cluster algorithm.
de Carvalho, Helton Hugo; Moreno, Robson Luiz; Pimenta, Tales Cleber; Crepaldi, Paulo C; Cintra, Evaldo
2013-06-01
This article presents the viability analysis and the development of heart disease identification embedded system. It offers a time reduction on electrocardiogram - ECG signal processing by reducing the amount of data samples, without any significant loss. The goal of the developed system is the analysis of heart signals. The ECG signals are applied into the system that performs an initial filtering, and then uses a Gustafson-Kessel fuzzy clustering algorithm for the signal classification and correlation. The classification indicated common heart diseases such as angina, myocardial infarction and coronary artery diseases. The system uses the European electrocardiogram ST-T Database (EDB) as a reference for tests and evaluation. The results prove the system can perform the heart disease detection on a data set reduced from 213 to just 20 samples, thus providing a reduction to just 9.4% of the original set, while maintaining the same effectiveness. This system is validated in a Xilinx Spartan(®)-3A FPGA. The field programmable gate array (FPGA) implemented a Xilinx Microblaze(®) Soft-Core Processor running at a 50MHz clock rate. PMID:23394802
Using Clustering Algorithms to Identify Brown Dwarf Characteristics
NASA Astrophysics Data System (ADS)
Choban, Caleb
2016-06-01
Brown dwarfs are stars that are not massive enough to sustain core hydrogen fusion, and thus fade and cool over time. The molecular composition of brown dwarf atmospheres can be determined by observing absorption features in their infrared spectrum, which can be quantified using spectral indices. Comparing these indices to one another, we can determine what kind of brown dwarf it is, and if it is young or metal-poor. We explored a new method for identifying these subgroups through the expectation-maximization machine learning clustering algorithm, which provides a quantitative and statistical way of identifying index pairs which separate rare populations. We specifically quantified two statistics, completeness and concentration, to identify the best index pairs. Starting with a training set, we defined selection regions for young, metal-poor and binary brown dwarfs, and tested these on a large sample of L dwarfs. We present the results of this analysis, and demonstrate that new objects in these classes can be found through these methods.
NASA Astrophysics Data System (ADS)
Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf
2014-12-01
In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adoptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k = 2 , 3 , … clusters. Therefore, it can also be used for the estimation of the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by a moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry fits the faults better. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.
Clustering performance comparison using K-means and expectation maximization algorithms
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-01-01
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results. PMID:26019610
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
C-element: A New Clustering Algorithm to Find High Quality Functional Modules in PPI Networks
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms whose focuses are on finding functional modules try either to find a clique like sub networks or to grow clusters starting from vertices with high degrees as seeds. These algorithms do not make any difference between a biological network and any other networks. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high throughput PPI networks from Sacchromyces Cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue specific networks are used. PMID:24039752
C-element: a new clustering algorithm to find high quality functional modules in PPI networks.
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms whose focuses are on finding functional modules try either to find a clique like sub networks or to grow clusters starting from vertices with high degrees as seeds. These algorithms do not make any difference between a biological network and any other networks. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high throughput PPI networks from Sacchromyces Cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue specific networks are used. PMID:24039752
A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease
Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.
2010-01-01
Clustering is the grouping of similar objects into a class. Local clustering feature refers to the phenomenon whereby one group of data is separated from another, and the data from these different groups are clustered locally. A compact class is defined as one cluster in which all similar elements cluster tightly within the cluster. Herein, the essence of the local clustering feature, revealed by mathematical manipulation, results in a novel clustering algorithm termed as the special local clustering (SLC) algorithm that was used to process gene microarray data related to Alzheimer’s disease (AD). SLC algorithm was able to group together genes with similar expression patterns and identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate and/or severe AD gene microarray data, this gene is possibly associated with AD. Application of a clustering algorithm in disease-associated gene identification such as in AD is rarely reported. PMID:20089478
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
An adaptive spatial clustering algorithm based on the minimum spanning tree-like
NASA Astrophysics Data System (ADS)
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover the potential rules and outliers among the spatial data. Most existing spatial clustering methods cannot deal with the uneven density of the data and usually require predefined parameters which are hard to justify. In order to overcome such limitations, we firstly propose the concept of edge variation factor based upon the definition of distance variation among the entities in the spatial neighborhood. Then, an approach is presented to construct the minimum spanning tree-like (MST-L). Further, an adaptive MST-L based spatial clustering algorithm (AMSTLSC) is developed in this paper. The spatial clustering algorithm only involves the setting of the threshold of edge variation factor as an input parameter, which is easily made with the support of little priori information. Through this parameter, a series of MST-L can be automatically generated from the high-density region to the low-density one, where each MST-L represents a cluster. As a result, the algorithm proposed in this paper can adapt to the change of local density among spatial points. This property is also called the adaptiveness. Finally, two tests are implemented to demonstrate that the AMSTLSC algorithm is very robust and suitable to find the clusters with different shapes. Especially the algorithm has good adaptiveness. A comparative test is made to further prove the AMSTLSC algorithm better than classic DBSCAN algorithm.
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate the incremental updates with the arrival of new data, making them unsuitable for the dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
Chinese Text Clustering Algorithm Based k-means
NASA Astrophysics Data System (ADS)
Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
Text clustering is an important means and method in text mining. The process of Chinese text clustering based on k-means was emphasized, we found that new center of a cluster was easily effected by isolated text after some experiments. Average similarity of one cluster was used as a parameter, and multiplied it with a modulus between 0.75 and 1.25 to get the similarity threshold value, the texts whose similarity with original cluster center was greater than or equal to the threshold value ware collected as a candidate collection, then updated the cluster center with center of candidate collection. The experiments show that improved method averagely increased purity and F value about 10 percent over the original method.
Doostparast Torshizi, Abolfazl; Fazel Zarandi, Mohammad Hossein
2015-09-01
This paper considers microarray gene expression data clustering using a novel two stage meta-heuristic algorithm based on the concept of α-planes in general type-2 fuzzy sets. The main aim of this research is to present a powerful data clustering approach capable of dealing with highly uncertain environments. In this regard, first, a new objective function using α-planes for general type-2 fuzzy c-means clustering algorithm is represented. Then, based on the philosophy of the meta-heuristic optimization framework 'Simulated Annealing', a two stage optimization algorithm is proposed. The first stage of the proposed approach is devoted to the annealing process accompanied by its proposed perturbation mechanisms. After termination of the first stage, its output is inserted to the second stage where it is checked with other possible local optima through a heuristic algorithm. The output of this stage is then re-entered to the first stage until no better solution is obtained. The proposed approach has been evaluated using several synthesized datasets and three microarray gene expression datasets. Extensive experiments demonstrate the capabilities of the proposed approach compared with some of the state-of-the-art techniques in the literature. PMID:25035233
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors create flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
A novel artificial bee colony based clustering algorithm for categorical data.
Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
NASA Astrophysics Data System (ADS)
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including the over-simplicity of computing models and underutilized spatial information. In recent years, some studies have been conducted trying to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted in which cluster centers and iteration conditional model algorithm's results are considered as feasible solutions and objective functions separately, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the application of ABC-cluster and ABC-MRF-cluster methods could help to obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when used for a higher power of spectral discrimination, whereas the ABC-MRF-cluster method can provide better results when used for an adjusted random index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
NASA Astrophysics Data System (ADS)
Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen
2014-08-01
The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm, which can predict the information of unlabeled nodes by a few of labeled nodes. It is a community detection method in the field of complex networks. This algorithm is easy to implement with low complexity and the effect is remarkable. It is widely applied in various fields. However, the randomness of the label propagation leads to the poor robustness of the algorithm, and the classification result is unstable. This paper proposes a LPA based on edge clustering coefficient. The node in the network selects a neighbor node whose edge clustering coefficient is the highest to update the label of node rather than a random neighbor node, so that we can effectively restrain the random spread of the label. The experimental results show that the LPA based on edge clustering coefficient has made improvement in the stability and accuracy of the algorithm.
A Speed-Up Hierarchical Compact Clustering Algorithm for Dynamic Document Collections
NASA Astrophysics Data System (ADS)
Gil-Garca, Reynaldo; Pons-Porrata, Aurora
In this paper, a speed-up version of the Dynamic Hierarchical Compact (DHC) algorithm is presented. Our approach profits from the cluster hierarchy already built to reduce the number of calculated similarities. The experimental results on several benchmark text collections show that the proposed method is significantly faster than DHC while achieving approximately the same clustering quality.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Rank-based spatial clustering: an algorithm for rapid outbreak detection
Tsui, Fu-Chiang
2011-01-01
Objective Public health surveillance requires outbreak detection algorithms with computational efficiency sufficient to handle the increasing volume of disease surveillance data. In response to this need, the authors propose a spatial clustering algorithm, rank-based spatial clustering (RSC), that detects rapidly infectious but non-contagious disease outbreaks. Design The authors compared the outbreak-detection performance of RSC with that of three well established algorithms—the wavelet anomaly detector (WAD), the spatial scan statistic (KSS), and the Bayesian spatial scan statistic (BSS)—using real disease surveillance data on to which they superimposed simulated disease outbreaks. Measurements The following outbreak-detection performance metrics were measured: receiver operating characteristic curve, activity monitoring operating curve curve, cluster positive predictive value, cluster sensitivity, and algorithm run time. Results RSC was computationally efficient. It outperformed the other two spatial algorithms in terms of detection timeliness, and outbreak localization. RSC also had overall better timeliness than the time-series algorithm WAD at low false alarm rates. Conclusion RSC is an ideal algorithm for analyzing large datasets when the application of other spatial algorithms is not practical. It also allows timely investigation for public health practitioners by providing early detection and well-localized outbreak clusters. PMID:21486881
NASA Astrophysics Data System (ADS)
Huang, Mingqiang; Hao, Liqing; Guo, Xiaoyong; Hu, Changjin; Gu, Xuejun; Zhao, Weixiong; Wang, Zhenya; Fang, Li; Zhang, Weijun
2013-01-01
Experiments for formation of secondary organic aerosol (SOA) from photooxidation of 1,3,5-trimethylbenzene in the CH3ONO/NO/air mixture were carried out in the laboratory chamber. The size and chemical composition of the resultant individual particles were measured in real-time by an aerosol laser time of flight mass spectrometer (ALTOFMS) recently designed in our group. We also developed Fuzzy C-Means (FCM) algorithm to classify the mass spectra of large numbers of SOA particles. The study first started with mixed particles generated from the standards benzaldehyde, phenol, benzoic acid, and nitrobenzene solutions to test the feasibility of application of the FCM. The FCM was then used to extract out potential aerosol classes in the chamber experiments. The results demonstrate that FCM allowed a clear identification of ten distinct chemical particle classes in this study, namely, 3,5-dimethylbenzoic acid, 3,5-dimethylbenzaldehyde, 2,4,6-trimethyl-5-nitrophenol, 2-methyl-4-oxo-2-pentenal, 2,4,6-trimethylphenol, 3,5-dimethyl-2-furanone, glyoxal, and high-molecular-weight (HMW) components. Compared to offline method such as gas chromatography-mass spectrometry (GC-MS) measurement, the real-time ALTOFMS detection approach coupled with the FCM data processing algorithm can make cluster analysis of SOA successfully and provide more information of products. Thus ALTOFMS is a useful tool to reveal the formation and transformation processes of SOA particles in smog chambers.
Apeltsin, Leonard; Morris, John H.; Babbitt, Patricia C.; Ferrin, Thomas E.
2011-01-01
Motivation: Clustering protein sequence data into functionally specific families is a difficult but important problem in biological research. One useful approach for tackling this problem involves representing the sequence dataset as a protein similarity network, and afterwards clustering the network using advanced graph analysis techniques. Although a multitude of such network clustering algorithms have been developed over the past few years, comparing algorithms is often difficult because performance is affected by the specifics of network construction. We investigate an important aspect of network construction used in analyzing protein superfamilies and present a heuristic approach for improving the performance of several algorithms. Results: We analyzed how the performance of network clustering algorithms relates to thresholding the network prior to clustering. Our results, over four different datasets, show how for each input dataset there exists an optimal threshold range over which an algorithm generates its most accurate clustering output. Our results further show how the optimal threshold range correlates with the shape of the edge weight distribution for the input similarity network. We used this correlation to develop an automated threshold selection heuristic in order to most optimally filter a similarity network prior to clustering. This heuristic allows researchers to process their protein datasets with runtime efficient network clustering algorithms without sacrificing the clustering accuracy of the final results. Availability: Python code for implementing the automated threshold selection heuristic, together with the datasets used in our analysis, are available at http://www.rbvi.ucsf.edu/Research/cytoscape/threshold_scripts.zip. Contact: tef@cgl.ucsf.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21118823
A vector reconstruction based clustering algorithm particularly for large-scale text collection.
Liu, Ming; Wu, Chong; Chen, Lei
2015-03-01
Along with the fast evolvement of internet technology, internet users have to face the large amount of textual data every day. Apparently, organizing texts into categories can help users dig the useful information from large-scale text collection. Clustering is one of the most promising tools for categorizing texts due to its unsupervised characteristic. Unfortunately, most of traditional clustering algorithms lose their high qualities on large-scale text collection, which mainly attributes to the high-dimensional vector space and semantic similarity among texts. To effectively and efficiently cluster large-scale text collection, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in cluster's representative vector. This algorithm alternately repeats two sub-processes until it converges. One process is partial tuning sub-process, where feature's weight is fine-tuned by iterative process similar to self-organizing-mapping (SOM) algorithm. To accelerate clustering velocity, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other process is overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features useless to represent the cluster are removed from cluster's representative vector. Experimental results on the three text collections (including two small-scale and one large-scale text collections) demonstrate that our algorithm obtains high-quality performances on both small-scale and large-scale text collections. PMID:25539500
CHRONICLE: A Two-Stage Density-Based Clustering Algorithm for Dynamic Networks
NASA Astrophysics Data System (ADS)
Kim, Min-Soo; Han, Jiawei
Information networks, such as social networks and that extracted from bibliographic data, are changing dynamically over time. It is crucial to discover time-evolving communities in dynamic networks. In this paper, we study the problem of finding time-evolving communities such that each community freely forms, evolves, and dissolves for any time period. Although the previous t-partite graph based methods are quite effective for discovering such communities from large-scale dynamic networks, they have some weak points such as finding only stable clusters of single path type and not being scalable w.r.t. the time period. We propose CHRONICLE, an efficient clustering algorithm that discovers not only clusters of single path type but also clusters of path group type. In order to find clusters of both types and also control the dynamicity of clusters, CHRONICLE performs the two-stage density-based clustering, which performs the 2nd-stage density-based clustering for the t-partite graph constructed from the 1st-stage density-based clustering result for each timestamp network. For a given data set, CHRONICLE finds all clusters in a fixed time by using a fixed amount of memory, regardless of the number of clusters and the length of clusters. Experimental results using real data sets show that CHRONICLE finds a wider range of clusters in a shorter time with a much smaller amount of memory than the previous method.
NASA Astrophysics Data System (ADS)
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and maximum participation cluster mean value employment in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 histological or Fibroscan verified Chronic Liver Disease). An experienced Radiologist performed SWE measurement placing a region of interest (ROI) on each subject's right liver lobe providing a SWE image for each patient. Subsequently Fuzzy C-Means clustering was performed on every SWE image utilizing 5 clusters. Mean Stiffness value and pixels number of each cluster were calculated. The mean stiffness value feature of the cluster with maximum pixels number was then fed as input for ROC analysis. The selected Mean Stiffness value feature an Area Under the Curve (AUC) of 0.8633 with Optimum Cut-off value of 7.5 kPa with sensitivity and specificity values of 0.8438 and 0.875 and balanced accuracy of 0.8594. Examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy value of 0.8125 with 7.1 kPa cutoff value. A new promising automatic algorithm was implemented with more objective criteria of defining optimum elasticity cut-off values for discriminating fibrosis stages for SWE. More subjects are needed in order to define if this algorithm is an objective tool to outperform manual ROI selection.
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
An improved clustering algorithm of tunnel monitoring data for cloud computing.
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network
Vimalarani, C.; Subramanian, R.; Sivanandam, S. N.
2016-01-01
Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption. PMID:26881273
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
A hybrid algorithm for clustering of time series data based on affinity search technique.
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption. PMID:26881273
NASA Astrophysics Data System (ADS)
Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.
2005-05-01
Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
An approximation polynomial-time algorithm for a sequence bi-clustering problem
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or {open_quotes}blobs{close_quotes} based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuszzyfier, a binary smoothing filter, and a median filter. The processed image data is then inputed to the image clusters detection and identification program. The program employed the concept of {open_quotes}elastic rectangle{close_quotes}that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C-program is develop to test the algorithm. The algorithm is tested only on image data of 8x8 sizes with different number of blobs in them. The algorithm works very in detecting and identifying image clusters.
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Ying Wah, Teh
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
γ-ray DBSCAN: A clustering algorithm applied to Fermi-LAT γ-ray data
NASA Astrophysics Data System (ADS)
Tramacere, A.; Vecchio, C.
2012-12-01
The Density Based Spatial Clustering of Applications with Noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose the use of this method for the detection of sources in γ-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon. We investigate the detection performance of the γ-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters. We find that the γ-ray DBSCAN can be successfully used in the detection of clusters in γ-ray data. The significance returned by our algorithm is strongly correlated with that provided by the Maximum Likelihood analysis with standard Fermi-LAT software, and can be used to safely remove spurious clusters.
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. This kind of methods has two inadequacies: one is that the input matrixes they used cannot provide sufficient structural information for community detection and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experiments results showed that the new algorithm gave excellent performance on artificial networks and real world networks and outperforms other community detection methods. PMID:25147846
An Effective Intrusion Detection Algorithm Based on Improved Semi-supervised Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Li, Xueyong; Zhang, Baojian; Sun, Jiaxia; Yan, Shitao
An algorithm for intrusion detection based on improved evolutionary semi- supervised fuzzy clustering is proposed which is suited for situation that gaining labeled data is more difficulty than unlabeled data in intrusion detection systems. The algorithm requires a small number of labeled data only and a large number of unlabeled data and class labels information provided by labeled data is used to guide the evolution process of each fuzzy partition on unlabeled data, which plays the role of chromosome. This algorithm can deal with fuzzy label, uneasily plunges locally optima and is suited to implement on parallel architecture. Experiments show that the algorithm can improve classification accuracy and has high detection efficiency.
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
2010-01-01
The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm that was designed to increase the overall efficiency of the original k-means clustering technique—the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines. PMID:20689710
Statistical physics based heuristic clustering algorithms with an application to econophysics
NASA Astrophysics Data System (ADS)
Baldwin, Lucia Liliana
Three new approaches to the clustering of data sets are presented. They are heuristic methods and represent forms of unsupervised (non-parametric) clustering. Applied to an unknown set of data these methods automatically determine the number of clusters and their location using no a priori assumptions. All are based on analogies with different physical phenomena. The first technique, named the Percolation Clustering Algorithm, embodies a novel variation on the nearest-neighbor algorithm focusing on the connectivity between sample points. Exploiting the equivalence with a percolation process, this algorithm considers data points to be surrounded by expanding hyperspheres, which bond when they touch each other. Once a sequence of joined spheres spans an entire cluster, percolation occurs and the cluster size remains constant until it merges with a neighboring cluster. The second procedure, named Nucleation and Growth Clustering, exploits the analogy with nucleation and growth which occurs in island formation during epitaxial growth of solids. The original data points are nucleation centers, around which aggregation will occur. Additional "ad-data" that are introduced into the sample space, interact with the data points and stick if located within a threshold distance. These "ad-data" are used as a tool to facilitate the detection of clusters. The third method, named Discrete Deposition Clustering Algorithm, constrains deposition to occur on a grid, which has the advantage of computational efficiency as opposed to the continuous deposition used in the previous method. The original data form the vertexes of a sparse graph and the deposition sites are defined to be the middle points of this graphs edges. Ad-data are introduced on the deposition site and the system is allowed to evolve in a self-organizing regime. This allows the simulation of a phase transition and by monitoring the specific heat capacity of the system one can mark out a "natural" criterion for validating the partition. All of these techniques are competitive with existing algorithms and offer possible advantages for certain types of data distributions. A practical application is presented using the Percolation Clustering Algorithm to determine the taxonomy of the Dow Jones Industrial Average portfolio. The statistical properties of the correlation coefficients between DJIA components are studied along with the eigenvalues of the correlation matrix between the DJIA components.
[The multi-spectra classification algorithm based on K-means clustering and spectral angle cosine].
Wei, Jun-xia; Xiangli, Bin; Gao, Xiao-hui; Duan, Xiao-feng
2011-05-01
The classification and de-aliasing methods with respect to multi-spectra and hyper-spectra have been widely studied in recent years. And both K-mean clustering algorithm and spectral similarity algorithm are familiar classification methods. The present paper improved the K-mean clustering algorithm by using spectral similarity match algorithm to perform a new spectral classification algorithm. Two spectra with the farthest distance first were chosen as reference spectra. The Euclidean distance method or spectral angle cosine method then were used to classify data cube on the basis of the two reference spectra, and delete the spectra which belongs to the two reference spectra. The rest data cube was used to perform new classification according to a third spectrum, which is the farthest distance or the biggest angle one corresponding to the two reference spectra. Multi-spectral data cube was applied in the experimental test. The results of K-mean clustering classification by ENVI, compared with simulation results of the improved K-mean algorithm and the spectral angle cosine method, demonstrated that the latter two classify two air bubbles explicitly and effectively, and the improved K-mean algorithm classifies backgrounds better, especially the Euclidean distance method can classify the backgrounds integrally. PMID:21800600
Simulation of DNA damage clustering after proton irradiation using an adapted DBSCAN algorithm.
Francis, Ziad; Villagrasa, Carmen; Clairand, Isabelle
2011-03-01
In this work the "Density Based Spatial Clustering of Applications with Noise" (DBSCAN) algorithm was adapted to early stage DNA damage clustering calculations. The resulting algorithm takes into account the distribution of energy deposit induced by ionising particles and a damage probability function that depends on the total energy deposit amount. Proton track simulations were carried out in small micrometric volumes representing small DNA containments. The algorithm was used to determine the damage concentration clusters and thus to deduce the DSB/SSB ratios created by protons between 500keV and 50MeV. The obtained results are compared to other calculations and to available experimental data of fibroblast and plasmid cells irradiations, both extracted from literature. PMID:21232812
Clustering WHO-ART terms using semantic distance and machine learning algorithms.
Iavindrasana, Jimison; Bousquet, Cedric; Degoulet, Patrice; Jaulent, Marie-Christine
2006-01-01
WHO-ART was developed by the WHO collaborating centre for international drug monitoring in order to code adverse drug reactions. We assume that computation of semantic distance between WHO-ART terms may be an efficient way to group related medical conditions in the WHO database in order to improve signal detection. Our objective was to develop a method for clustering WHO-ART terms according to some proximity of their meanings. Our material comprises 758 WHO-ART terms. A formal definition was acquired for each term as a list of elementary concepts belonging to SNOMED international axes and characterized by modifier terms in some cases. Clustering was implemented as a terminology service on a J2EE server. Two different unsupervised machine learning algorithms (KMeans, Pvclust) clustered WHO-ART terms according to a semantic distance operator previously described. Pvclust grouped 51% of WHO-ART terms. K-Means grouped 100% of WHO-ART terms but 25% clusters were heterogeneous with k = 180 clusters and 6% clusters were heterogeneous with k = 32 clusters. Clustering algorithms associated to semantic distance could suggest potential groupings of WHO-ART terms that need validation according to the user's requirements. PMID:17238365
Plot enchaining algorithm: a novel approach for clustering flocks of birds
NASA Astrophysics Data System (ADS)
Büyükaksoy Kaplan, Gülay; Lana, Adnan
2014-06-01
In this study, an intuitive way for tracking flocks of birds is proposed and compared to simple cluster-seeking algorithm for real radar observations. For group of targets such as flock of birds, there is no need to track each target individually. Instead a cluster can be used to represent closely spaced tracks of a possible group. Considering a group of targets as a single target for tracking provides significant performance improvement with almost no loss of information.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks.
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
Discovery of rules in urban public facility distribution based on DBSCAN clustering algorithm
NASA Astrophysics Data System (ADS)
Li, Xinyan; Li, Deren
2007-11-01
Recently Spatial Data Mining (SDM) has been recognized as a powerful technology that can complement traditional GIS to facilitate urban planning and management since it can be used to discover interesting, implicit knowledge from spatial database. DBSCAN spatial clustering algorithm as a SDM method is able to effectively discover clusters of arbitrary shape in large database with noise points. In this paper we applied this algorithm to detect distribution patterns of urban public facilities in a developed city, including primary school, high school and commercial facilities. Both qualitative and quantitative analysis were carried out to investigate how to determine optimal values of input parameters for DBSCAN algorithm, and the distribution patterns of public facilities were assessed against urban planning design standard using the algorithm.
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images
NASA Astrophysics Data System (ADS)
Nascimento, Susana; Casca, Sérgio; Mirkin, Boris
2015-12-01
In this paper a novel clustering algorithm is proposed as a version of the seeded region growing (SRG) approach for the automatic recognition of coastal upwelling from sea surface temperature (SST) images. The new algorithm, one seed expanding cluster (SEC), takes advantage of the concept of approximate clustering due to Mirkin (1996, 2013) to derive a homogeneity criterion in the format of a product rather than the conventional difference between a pixel value and the mean of values over the region of interest. It involves a boundary-oriented pixel labeling so that the cluster growing is performed by expanding its boundary iteratively. The starting point is a cluster consisting of just one seed, the pixel with the coldest temperature. The baseline version of the SEC algorithm uses Otsu's thresholding method to fine-tune the homogeneity threshold. Unfortunately, this method does not always lead to a satisfactory solution. Therefore, we introduce a self-tuning version of the algorithm in which the homogeneity threshold is locally derived from the approximation criterion over a window around the pixel under consideration. The window serves as a boundary regularizer. These two unsupervised versions of the algorithm have been applied to a set of 28 SST images of the western coast of mainland Portugal, and compared against a supervised version fine-tuned by maximizing the F-measure with respect to manually labeled ground-truth maps. The areas built by the unsupervised versions of the SEC algorithm are significantly coincident over the ground-truth regions in the cases at which the upwelling areas consist of a single continuous fragment of the SST map.
An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information
Li, Ao; Tuck, David
2009-01-01
Motivation Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analyzing. Methods By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV) as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named Automatic Boundary Searching algorithm (ABS) is introduced to automatically determine the boundary threshold. Results Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations. PMID:19838334
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at {approx}1 kHz success rate, demonstrates the correct implementation of the algorithm.
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms. PMID:22163905
Solving the depth of the repeated texture areas based on the clustering algorithm
NASA Astrophysics Data System (ADS)
Xiong, Zhang; Zhang, Jun; Tian, Jinwen
2015-12-01
The reconstruction of the 3D scene in the monocular stereo vision needs to get the depth of the field scenic points in the picture scene. But there will inevitably be error matching in the process of image matching, especially when there are a large number of repeat texture areas in the images, there will be lots of error matches. At present, multiple baseline stereo imaging algorithm is commonly used to eliminate matching error for repeated texture areas. This algorithm can eliminate the ambiguity correspond to common repetition texture. But this algorithm has restrictions on the baseline, and has low speed. In this paper, we put forward an algorithm of calculating the depth of the matching points in the repeat texture areas based on the clustering algorithm. Firstly, we adopt Gauss Filter to preprocess the images. Secondly, we segment the repeated texture regions in the images into image blocks by using spectral clustering segmentation algorithm based on super pixel and tag the image blocks. Then, match the two images and solve the depth of the image. Finally, the depth of the image blocks takes the median in all depth values of calculating point in the bock. So the depth of repeated texture areas is got. The results of a lot of image experiments show that the effect of our algorithm for calculating the depth of repeated texture areas is very good.
A multilevel gamma-clustering layout algorithm for visualization of biological networks.
Hruz, Tomas; Wyss, Markus; Lucas, Christoph; Laule, Oliver; von Rohr, Peter; Zimmermann, Philip; Bleuler, Stefan
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ -clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks
Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
Borodovsky, M; Peresetsky, A
1994-09-01
Non-homogeneous Markov chain models can represent biologically important regions of DNA sequences. The statistical pattern that is described by these models is usually weak and was found primarily because of strong biological indications. The general method for extracting similar patterns is presented in the current paper. The algorithm incorporates cluster analysis, multiple alignment and entropy minimization. The method was first tested using the set of DNA sequences produced by Markov chain generators. It was shown that artificial gene sequences, which initially have been randomly set up along the multiple alignment panels, are aligned according to the hidden triplet phase. Then the method was applied to real protein-coding sequences and the resulting alignment clearly indicated the triplet phase and produced the parameters of the optimal 3-periodic non-homogeneous Markov chain model. These Markov models were already employed in the GeneMark gene prediction algorithm, which is used in genome sequencing projects. The algorithm can also handle the case in which the sequences to be aligned reveal different statistical patterns, such as Escherichia coli protein-coding sequences belonging to Class II and Class III. The algorithm accepts a random mix of sequences from different classes, and is able to separate them into two groups (clusters), align each cluster separately, and define a non-homogeneous Markov chain model for each sequence cluster. PMID:7952897
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms
Melnykov, Volodymyr; Chen, Wei-Chen; Maitra, Ranjan
2012-01-01
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
Kim, Chang Sik; Bae, Cheol Soo; Tcha, Hong Joon
2008-01-01
Background The previous studies of genome-wide expression patterns show that a certain percentage of genes are cell cycle regulated. The expression data has been analyzed in a number of different ways to identify cell cycle dependent genes. In this study, we pose the hypothesis that cell cycle dependent genes are considered as oscillating systems with a rhythm, i.e. systems producing response signals with period and frequency. Therefore, we are motivated to apply the theory of multivariate phase synchronization for clustering cell cycle specific genome-wide expression data. Results We propose the strategy to find groups of genes according to the specific biological process by analyzing cell cycle specific gene expression data. To evaluate the propose method, we use the modified Kuramoto model, which is a phase governing equation that provides the long-term dynamics of globally coupled oscillators. With this equation, we simulate two groups of expression signals, and the simulated signals from each group shares their own common rhythm. Then, the simulated expression data are mixed with randomly generated expression data to be used as input data set to the algorithm. Using these simulated expression data, it is shown that the algorithm is able to identify expression signals that are involved in the same oscillating process. We also evaluate the method with yeast cell cycle expression data. It is shown that the output clusters by the proposed algorithm include genes, which are closely associated with each other by sharing significant Gene Ontology terms of biological process and/or having relatively many known biological interactions. Therefore, the evaluation analysis indicates that the method is able to identify expression signals according to the specific biological process. Our evaluation analysis also indicates that some portion of output by the proposed algorithm is not obtainable by the traditional clustering algorithm with Euclidean distance or linear correlation. Conclusion Based on the evaluation experiments, we draw the conclusion as follows: 1) Based on the theory of multivariate phase synchronization, it is feasible to find groups of genes, which have relevant biological interactions and/or significantly shared GO slim terms of biological process, using cell cycle specific gene expression signals. 2) Among all the output clusters by the proposed algorithm, the cluster with relatively large size has a tendency to include more known interactions than the one with relatively small size. 3) It is feasible to understand the cell cycle specific gene expression patterns as the phenomenon of collective synchronization. 4) The proposed algorithm is able to find prominent groups of genes, which are not obtainable by traditional clustering algorithm. PMID:18221564
An X-Ray Spectral Classification Algorithm with Application to Young Stellar Clusters
NASA Astrophysics Data System (ADS)
Hojnacki, S. M.; Kastner, J. H.; Micela, G.; Feigelson, E. D.; LaLonde, S. M.
2007-04-01
A large volume of low signal-to-noise, multidimensional data is available from the CCD imaging spectrometers aboard the Chandra X-Ray Observatory and the X-Ray Multimirror Mission (XMM-Newton). To make progress analyzing this data, it is essential to develop methods to sort, classify, and characterize the vast library of X-ray spectra in a nonparametric fashion (complementary to current parametric model fits). We have developed a spectral classification algorithm that handles large volumes of data and operates independently of the requirement of spectral model fits. We use proven multivariate statistical techniques including principal component analysis and an ensemble classifier consisting of agglomerative hierarchical clustering and K-means clustering applied for the first time for spectral classification. The algorithm positions the sources in a multidimensional spectral sequence and then groups the ordered sources into clusters based on their spectra. These clusters appear more distinct for sources with harder observed spectra. The apparent diversity of source spectra is reduced to a three-dimensional locus in principal component space, with spectral outliers falling outside this locus. The algorithm was applied to a sample of 444 strong sources selected from the 1616 X-ray emitting sources detected in deep Chandra imaging spectroscopy of the Orion Nebula Cluster. Classes form sequences in NH, AV, and accretion activity indicators, demonstrating that the algorithm efficiently sorts the X-ray sources into a physically meaningful sequence. The algorithm also isolates important classes of very deeply embedded, active young stellar objects, and yields trends between X-ray spectral parameters and stellar parameters for the lowest mass, pre-main-sequence stars.
Haplotype-based quantitative trait mapping using a clustering algorithm
Li, Jing; Zhou, Yingyao; Elston, Robert C
2006-01-01
Background With the availability of large-scale, high-density single-nucleotide polymorphism (SNP) markers, substantial effort has been made in identifying disease-causing genes using linkage disequilibrium (LD) mapping by haplotype analysis of unrelated individuals. In addition to complex diseases, many continuously distributed quantitative traits are of primary clinical and health significance. However the development of association mapping methods using unrelated individuals for quantitative traits has received relatively less attention. Results We recently developed an association mapping method for complex diseases by mining the sharing of haplotype segments (i.e., phased genotype pairs) in affected individuals that are rarely present in normal individuals. In this paper, we extend our previous work to address the problem of quantitative trait mapping from unrelated individuals. The method is non-parametric in nature, and statistical significance can be obtained by a permutation test. It can also be incorporated into the one-way ANCOVA (analysis of covariance) framework so that other factors and covariates can be easily incorporated. The effectiveness of the approach is demonstrated by extensive experimental studies using both simulated and real data sets. The results show that our haplotype-based approach is more robust than two statistical methods based on single markers: a single SNP association test (SSA) and the Mann-Whitney U-test (MWU). The algorithm has been incorporated into our existing software package called HapMiner, which is available from our website at . Conclusion For QTL (quantitative trait loci) fine mapping, to identify QTNs (quantitative trait nucleotides) with realistic effects (the contribution of each QTN less than 10% of total variance of the trait), large samples sizes (≥ 500) are needed for all the methods. The overall performance of HapMiner is better than that of the other two methods. Its effectiveness further depends on other factors such as recombination rates and the density of typed SNPs. Haplotype-based methods might provide higher power than methods based on a single SNP when using tag SNPs selected from a small number of samples or some other sources (such as HapMap data). Rank-based statistics usually have much lower power, as shown in our study. PMID:16709248
NASA Astrophysics Data System (ADS)
Rahmah, Nadia; Sukaesih Sitanggang, Imas
2016-01-01
In this work we determine the optimal epsilon value on peatland on DBSCAN Algorithm to clustering data on peatland hotspots in sumatera. DBSCAN is a base algorithm for density based data clustering which contain noise and outliers. We found using this method that the area which has the highest density of hotspots in Sumatra in 2013 peatland is contained in cluster 1 of Riau Province that is equal to 2112 hotspots.
The RedGOLD cluster detection algorithm and its cluster candidate catalogue for the CFHT-LS W1
NASA Astrophysics Data System (ADS)
Licitra, Rossella; Mei, Simona; Raichoor, Anand; Erben, Thomas; Hildebrandt, Hendrik
2016-01-01
We present RedGOLD (Red-sequence Galaxy Overdensity cLuster Detector), a new optical/NIR galaxy cluster detection algorithm, and apply it to the CFHT-LS W1 field. RedGOLD searches for red-sequence galaxy overdensities while minimizing contamination from dusty star-forming galaxies. It imposes an Navarro-Frenk-White profile and calculates cluster detection significance and richness. We optimize these latter two parameters using both simulations and X-ray-detected cluster catalogues, and obtain a catalogue ˜80 per cent pure up to z ˜ 1, and ˜100 per cent (˜70 per cent) complete at z ≤ 0.6 (z ≲ 1) for galaxy clusters with M ≳ 1014 M⊙ at the CFHT-LS Wide depth. In the CFHT-LS W1, we detect 11 cluster candidates per deg2 out to z ˜ 1.1. When we optimize both completeness and purity, RedGOLD obtains a cluster catalogue with higher completeness and purity than other public catalogues, obtained using CFHT-LS W1 observations, for M ≳ 1014 M⊙. We use X-ray-detected cluster samples to extend the study of the X-ray temperature-optical richness relation to a lower mass threshold, and find a mass scatter at fixed richness of σlnM|λ = 0.39 ± 0.07 and σlnM|λ = 0.30 ± 0.13 for the Gozaliasl et al. and Mehrtens et al. samples. When considering similar mass ranges as previous work, we recover a smaller scatter in mass at fixed richness. We recover 93 per cent of the redMaPPer detections, and find that its richness estimates is on average ˜40-50 per cent larger than ours at z > 0.3. RedGOLD recovers X-ray cluster spectroscopic redshifts at better than 5 per cent up to z ˜ 1, and the centres within a few tens of arcseconds.
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision. PMID:25875296
FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.
Yang, Jing; Chen, Limin; Zhang, Jianpei
2015-01-01
It is important to cluster heterogeneous information networks. A fast clustering algorithm based on an approximate commute time embedding for heterogeneous information networks with a star network schema is proposed in this paper by utilizing the sparsity of heterogeneous information networks. First, a heterogeneous information network is transformed into multiple compatible bipartite graphs from the compatible point of view. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. All of the indicator subsets in each embedding simultaneously determine the target dataset. Finally, a general model is formulated by these indicator subsets, and a fast algorithm is derived by simultaneously clustering all of the indicator subsets using the sum of the weighted distances for all indicators for an identical target object. The proposed fast algorithm, FctClus, is shown to be efficient and generalizable and exhibits high clustering accuracy and fast computation speed based on a theoretic analysis and experimental verification. PMID:26090857
K-Boost: a scalable algorithm for high-quality clustering of microarray gene expression data.
Geraci, Filippo; Leoncini, Mauro; Montangero, Manuela; Pellegrini, Marco; Renda, M Elena
2009-06-01
Microarray technology for profiling gene expression levels is a popular tool in modern biological research. Applications range from tissue classification to the detection of metabolic networks, from drug discovery to time-critical personalized medicine. Given the increase in size and complexity of the data sets produced, their analysis is becoming problematic in terms of time/quality trade-offs. Clustering genes with similar expression profiles is a key initial step for subsequent manipulations and the increasing volumes of data to be analyzed requires methods that are at the same time efficient (completing an analysis in minutes rather than hours) and effective (identifying significant clusters with high biological correlations). In this paper, we propose K-Boost, a clustering algorithm based on a combination of the furthest-point-first (FPF) heuristic for solving the metric k-center problem, a stability-based method for determining the number of clusters, and a k-means-like cluster refinement. K-Boost runs in O (|N| x k) time, where N is the input matrix and k is the number of proposed clusters. Experiments show that this low complexity is usually coupled with a very good quality of the computed clusterings, which we measure using both internal and external criteria. Supporting data can be found as online Supplementary Material at www.liebertonline.com. PMID:19522668
A Source Classification Algorithm for Astronomical X-ray Imagery of Stellar Clusters
NASA Astrophysics Data System (ADS)
Hojnacki, Susan M.
The Chandra X-ray Observatory (Chandra) is producing images with outstanding spatial resolution using low-noise, fast-readout CCDs. Among many other things, X-ray images and spectra help astronomers study star formation and galactic evolution. Currently, X-ray astronomers classify one X-ray source at a time by visual inspection and use of model-fitting software. This approach is useful for studying the physics of bright individual sources but is time consuming for analyzing large images of rich fields of X-ray sources, such as stellar clusters. Objective and efficient techniques from the fields of multivariate statistics, pattern recognition, and hyperspectral image processing, are needed to analyze the growing Chandra image archive. An image processing algorithm has been developed that orders the given X-ray sources based on hard versus soft X-ray emission and then groups the ordered X-ray sources into clusters based on their spectral attributes. The algorithm was applied to imaging spectroscopy of the Orion Nebula Cluster (ONC) population of more than 1000 X-ray emitting stars. As an initial test of the algorithm, images of the ONC from the Chandra archive were analyzed. The final spectral classification algorithm was applied to a sample of sources selected from among the more than 1600 X-ray sources detected in the Chandra Orion Ultradeep Project. Clustering results have been compared with known optical and infrared properties of the population of the ONC to assess the algorithm's ability to identify groups of sources that share common attributes.
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
NASA Astrophysics Data System (ADS)
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.
Uchiyama, Ikuo
2006-01-01
Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods. PMID:16436801
NASA Astrophysics Data System (ADS)
Quintanilla-Domínguez, Joel; Ojeda-Magaña, Benjamín; Marcano-Cedeño, Alexis; Cortina-Januchs, María G.; Vega-Corona, Antonio; Andina, Diego
2011-12-01
A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the mi-crocalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector in a classifier. The classifier is based on an artificial neural network to identify patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, because this stage is an important part of early breast cancer detection.
KD-tree based clustering algorithm for fast face recognition on large-scale data
NASA Astrophysics Data System (ADS)
Wang, Yuanyuan; Lin, Yaping; Yang, Junfeng
2015-07-01
This paper proposes an acceleration method for large-scale face recognition system. When dealing with a large-scale database, face recognition is time-consuming. In order to tackle this problem, we employ the k-means clustering algorithm to classify face data. Specifically, the data in each cluster are stored in the form of the kd-tree, and face feature matching is conducted with the kd-tree based nearest neighborhood search. Experiments on CAS-PEAL and self-collected database show the effectiveness of our proposed method.
Fast randomized Hough transformation track initiation algorithm based on multi-scale clustering
NASA Astrophysics Data System (ADS)
Wan, Minjie; Gu, Guohua; Chen, Qian; Qian, Weixian; Wang, Pengcheng
2015-10-01
A fast randomized Hough transformation track initiation algorithm based on multi-scale clustering is proposed to overcome existing problems in traditional infrared search and track system(IRST) which cannot provide movement information of the initial target and select the threshold value of correlation automatically by a two-dimensional track association algorithm based on bearing-only information . Movements of all the targets are presumed to be uniform rectilinear motion throughout this new algorithm. Concepts of space random sampling, parameter space dynamic linking table and convergent mapping of image to parameter space are developed on the basis of fast randomized Hough transformation. Considering the phenomenon of peak value clustering due to shortcomings of peak detection itself which is built on threshold value method, accuracy can only be ensured on condition that parameter space has an obvious peak value. A multi-scale idea is added to the above-mentioned algorithm. Firstly, a primary association is conducted to select several alternative tracks by a low-threshold .Then, alternative tracks are processed by multi-scale clustering methods , through which accurate numbers and parameters of tracks are figured out automatically by means of transforming scale parameters. The first three frames are processed by this algorithm in order to get the first three targets of the track , and then two slightly different gate radius are worked out , mean value of which is used to be the global threshold value of correlation. Moreover, a new model for curvilinear equation correction is applied to the above-mentioned track initiation algorithm for purpose of solving the problem of shape distortion when a space three-dimensional curve is mapped to a two-dimensional bearing-only space. Using sideways-flying, launch and landing as examples to build models and simulate, the application of the proposed approach in simulation proves its effectiveness , accuracy , and adaptivity of correlation threshold selection.
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets is always being a most important problem of data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology to process big datasets on Hadoop Cluster. To mine big datasets it is essential to re-design the data mining algorithm on this new paradigm. In this paper, we implement three variations of Apriori algorithm using data structures hash tree, trie and hash table trie i.e. trie with hash technique on MapReduce paradigm. We emphasize and investigate the significance of these three data structures for Apriori algorithm on Hadoop cluster, which has not been given attention yet. Experiments are carried out on both real life and synthetic datasets which shows that hash table trie data structures performs far better than trie and hash tree in terms of execution time. Moreover the performance in case of hash tree becomes worst.
A new energy-efficient hierarchical clustering algorithm for wireless sensor networks
NASA Astrophysics Data System (ADS)
Zhou, Zude; Ai, Qingsong
2005-10-01
Recent advances in wireless communications and microelectro-mechanical systems have motivated the development of extremely small, low-cost sensors that possess sensing, signal processing and wireless communication capabilities. A wireless network consisting of a large number of small sensors with low-power transceivers can be an effective tool for gathering data in a variety of environments. The data collected by each sensor is communicated through the network to a single processing center that uses all reported data to determine characteristics of the environment or detect an event. The communication or message passing process must be designed to conserve the limited energy resources of the sensors. Clustering sensors into groups, so that sensors communicate information only to clusterheads and then the clusterheads communicate the aggregated information to the processing center, may save energy. In this paper, we propose a distributed, randomized clustering algorithm to organize the sensors in a wireless sensor network into clusters. Our algorithm generates a hierarchy of clusterheads and observes that the energy savings increase with the number of levels in the hierarchy. Results in stochastic geometry are used to derive solutions for the values of parameters of our algorithm that minimize the total energy spent in the network when all sensors report data through the clusterheads to the processing center.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Evaluation of particle clustering algorithms in the prediction of brownout dust clouds
NASA Astrophysics Data System (ADS)
Govindarajan, Bharath Madapusi
2011-07-01
A study of three Lagrangian particle clustering methods has been conducted with application to the problem of predicting brownout dust clouds that develop when rotorcraft land over surfaces covered with loose sediment. A significant impediment in performing such particle modeling simulations is the extremely large number of particles needed to obtain dust clouds of acceptable fidelity. Computing the motion of each and every individual sediment particle in a dust cloud (which can reach into tens of billions per cubic meter) is computationally prohibitive. The reported work involved the development of computationally efficient clustering algorithms that can be applied to the simulation of dilute gas-particle suspensions at low Reynolds numbers of the relative particle motion. The Gaussian distribution, k-means and Osiptsov's clustering methods were studied in detail to highlight the nuances of each method for a prototypical flow field that mimics the highly unsteady, two-phase vortical particle flow obtained when rotorcraft encounter brownout conditions. It is shown that although clustering algorithms can be problem dependent and have bounds of applicability, they offer the potential to significantly reduce computational costs while retaining the overall accuracy of a brownout dust cloud solution.
NASA Astrophysics Data System (ADS)
Inclan, Eric; Geohegan, David; Yoon, Mina
2015-03-01
Nanostructured TiO2 materials have interesting properties that are highly relevant to energy and device applications. However, precise control of their morphologies and characterization are still a grand challenge in the field. Using a hybrid optimization algorithm we theoretically explored configuration spaces of energetically metastable TiO2 nanostructures. Our approach is to minimize the total energy of TiO2 clusters in order to identify the structural characteristics and energy landscape of plausible (TiO2)n (n = 1-100). The hybrid algorithm includes a modified differential evolution algorithm, a permutation operator to perform global optimization on a set of randomly generated structures, and then structure refinement using a BFGS Quasi-Newton algorithm. The results were compared against known physical structures and numerical results in the literature as well as our experimentally synthesized structures. Although the global minimum became more computationally expensive to locate with increasing number of TiO2 units, the optimizer successfully identified numerous plausible structures along a range of energies close to the global minimum energy structure for all clusters in the given range. This work is supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division.
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets.
Zhang, Yipu; Wang, Ping
2015-01-01
New high-throughput technique ChIP-seq, coupling chromatin immunoprecipitation experiment with high-throughput sequencing technologies, has extended the identification of binding locations of a transcription factor to the genome-wide regions. However, the most existing motif discovery algorithms are time-consuming and limited to identify binding motifs in ChIP-seq data which normally has the significant characteristics of large scale data. In order to improve the efficiency, we propose a fast cluster motif finding algorithm, named as FCmotif, to identify the (l, d) motifs in large scale ChIP-seq data set. It is inspired by the emerging substrings mining strategy to find the enriched substrings and then searching the neighborhood instances to construct PWM and cluster motifs in different length. FCmotif is not following the OOPS model constraint and can find long motifs. The effectiveness of proposed algorithm has been proved by experiments on the ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and processing of the full size data of several megabytes finished in a few minutes. The experimental results show that FCmotif has advantageous to deal with the (l, d) motif finding in the ChIP-seq data; meanwhile it also demonstrates better performance than other current widely-used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
Zhang, Yipu; Wang, Ping
2015-01-01
New high-throughput technique ChIP-seq, coupling chromatin immunoprecipitation experiment with high-throughput sequencing technologies, has extended the identification of binding locations of a transcription factor to the genome-wide regions. However, the most existing motif discovery algorithms are time-consuming and limited to identify binding motifs in ChIP-seq data which normally has the significant characteristics of large scale data. In order to improve the efficiency, we propose a fast cluster motif finding algorithm, named as FCmotif, to identify the (l, d) motifs in large scale ChIP-seq data set. It is inspired by the emerging substrings mining strategy to find the enriched substrings and then searching the neighborhood instances to construct PWM and cluster motifs in different length. FCmotif is not following the OOPS model constraint and can find long motifs. The effectiveness of proposed algorithm has been proved by experiments on the ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and processing of the full size data of several megabytes finished in a few minutes. The experimental results show that FCmotif has advantageous to deal with the (l, d) motif finding in the ChIP-seq data; meanwhile it also demonstrates better performance than other current widely-used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
Dynamic scheduling study on engineering machinery of clusters using multi-agent system ant algorithm
NASA Astrophysics Data System (ADS)
Gao, Qiang; Wang, Hongli; Guo, Long; Xiang, Jianping
2005-12-01
In the process of road surface construction, dispatchers' scheduling was experiential and blindfold in some degree and static scheduling restricted the continuity of the construction. Serious problems such as labor holdup, material awaiting and scheduling delay could occur when the old scheduling technique was used. This paper presents ant colony algorithm based on MAS that has the abilities of intelligentized modeling and dynamic scheduling. MAS model deals with single agent's communication and corresponding in engineering machinery of clusters firstly, next we apply ant colony algorithm to solve dynamic scheduling in the plant. Ant colony algorithm can optimize the match of agents and make the system dynamic balance. The effectiveness of the proposed method is demonstrated with MATLAB simulations.
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S A R
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low stability of clusters may lead to rapid failure of clusters, high energy consumption for reclustering, and decrease in the overall network stability in mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms only use limited features of the nodes. Thus, they decrease the weight accuracy in determining node's competency and lead to incorrect selection of cluster heads. A new weight-based algorithm presented in this paper not only determines node's weight using its own features, but also considers the direct effect of feature of adjacent nodes. It determines the weight of virtual links between nodes and the effect of the weights on determining node's final weight. By using this strategy, the highest weight is assigned to the best choices for being the cluster heads and the accuracy of nodes selection increases. The performance of new algorithm is analyzed by using computer simulation. The results show that produced clusters have longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low stability of clusters may lead to rapid failure of clusters, high energy consumption for reclustering, and decrease in the overall network stability in mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms only use limited features of the nodes. Thus, they decrease the weight accuracy in determining node's competency and lead to incorrect selection of cluster heads. A new weight-based algorithm presented in this paper not only determines node's weight using its own features, but also considers the direct effect of feature of adjacent nodes. It determines the weight of virtual links between nodes and the effect of the weights on determining node's final weight. By using this strategy, the highest weight is assigned to the best choices for being the cluster heads and the accuracy of nodes selection increases. The performance of new algorithm is analyzed by using computer simulation. The results show that produced clusters have longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
A new concept of wildland-urban interface based on city clustering algorithm
NASA Astrophysics Data System (ADS)
Kanevski, M.; Champendal, A.; Vega Orozco, C.; Tonini, M.; Conedera, M.
2012-04-01
Wildland-Urban-Interface (WUI) is a widely used term in the context of wild and forest fires to indicate areas where human infrastructures interact with wildland/forest areas. Many complex problems are associated to the WUI; but the most relevant ones are those related to forest fire hazard and management in dense populated areas where fire regime is dominated by anthropogenic-induced ignition fires. This coexistence enhances both anthropogenic-ignition sources and flammable fuels. Furthermore, the growing trend of the WUI and global change effects may even worsening the situation in the near future. Therefore, many studies are dedicated to the WUI problem, focusing on refinement of its definition, development of mapping methods, implementation of measures into specific fire management plans and the validation of the proposed approaches. The present study introduces a new concept of WUI based on city clustering algorithm (CCA) introduced in Rosenfeld et al., 2008. CCA was proposed as an automatic tool for studying the definition of cities and their distribution. The algorithm uses demographic data - either on a regular or non-regular grid in space - where a city (urban zone) is detected as a cluster of connected populated cells with maximal size. In the present study the CCA is proposed as a tool to develop a new concept of population dynamic analysis crucial to define and to localise WUI. The real case study is based on demographic/census data - organised in a regular grid with a resolution of 100 m and the forest fire ignition points database from canton Ticino, Switzerland. By changing spatial scales of demographic cells the relationships between urban zones (demographic clusters) and forest fire events were statistically analyzed. Corresponding scaling laws were used to understand the interaction between urban zones and forest fires. The first results are good and indicate that the method can be applied to define WUI in an innovative way. Keywords: forest fires, wild-land-user interface, city clustering algorithms.
A contour-line color layer separation algorithm based on fuzzy clustering and region growing
NASA Astrophysics Data System (ADS)
Liu, Tiange; Miao, Qiguang; Xu, Pengfei; Tong, Yubing; Song, Jianfeng; Xia, Ge; Yang, Yun; Zhai, Xiaojie
2016-03-01
The color layers of contour-lines separated from scanned topographic map are the basis of contour-line extraction, but it is difficult to separate them well due to the color aliasing and mixed color problems. This paper will focus us on contour-line color layer separation and presents a novel approach for it based on fuzzy clustering and Single-prototype Region Growing for Contour-line Layer (SRGCL). The purpose of this paper is to provide a solution for processing scanned topographic maps on which contour-lines are abundant and densely distributed, for example, in the condition similar to hilly areas and mountainous regions, the contour-lines always occupy the largest proportion in linear features and the contour-line separation is the most difficult task. The proposed approach includes steps as follows. First step, line features are extracted from the map to reduce the interference from area features in fuzzy clustering. Second step, fuzzy clustering algorithm is employed to obtain membership matrix of pixels in the line map. Third step, based on the membership matrix, we obtain the most-similar prototype and the second-similar prototype of each pixel as the indicators of the pixel in SRGCL. The spatial relationship and the fuzzy similarity of color features are used in SRGCL to overcome the inaccurate classification of ambiguous pixels. The procedure focusing on single contour-line layer will improve the accuracy of contour-line segmentation result of SRGCL relative to general segmentation methods. We verified the algorithm on several USGS historical maps, the experimental results show that our algorithm produces contour-line color layers with good continuity and few noises, which verifies the improvement in contour-line color layer separation of our algorithm relative to two general segmentation methods.
NASA Astrophysics Data System (ADS)
Bus, Schelte J.
2006-09-01
Dynamical asteroid families are defined as groupings of objects in proper orbital element space that are believed to result from the collisional disruptions of once larger parent bodies. Spectroscopic studies have been used to test the genetic relationships that should exist between family members, and have revealed a high degree of spectral homogeneity within each family. Given this finding, a method of searching for asteroid families was proposed by Bus (1999, MIT Ph.D. thesis) that combines both orbital and spectral parameters in the clustering process. Here I describe an improved search for spectro-dynamical asteroid families that utilizes a shared nearest neighbor (SNN) clustering algorithm. This algorithm is more robust for family identification than the Hierarchical Clustering Method (HCM) used in earlier stages of this study. Orbital elements are combined with spectral data from three major surveys, the first and second phases of the Small Main-belt Asteroid Spectroscopic Survey (SMASS, Xu et al. 1995, Icarus 115:1-16, Bus and Binzel 2002, Icarus 158:106-145) and the Small Solar System Objects Spectroscopic Survey (S3OS2, Lazzaro et al. 2004, Icarus 172:179-220), resulting in an analysis of 2074 asteroids and a search for families across the entire main belt. This new search for asteroid families has many advantages over previous studies that have relied solely on the distributions of orbital elements. By including spectral parameters in the clustering analysis, boundaries of known families can be more accurately determined, close or overlapping families in orbital space can be separated, non-family members (interlopers) can be identified, and older, more diffuse families can be more easily recognized. Family memberships will be presented, along with details of the clustering procedure.
Guo, Xiao-yong; Fang, Li; Zhao, Wen-wu; Gu, Xue-jun; Zheng, Hai-yang; Zhang, Wei-jun
2008-08-01
On-line measurement of size and composition of single particle using an aerosol time-of-flight Laser mass spectrometry (ATOFLMS) had been designed in our lab. Each particle's aerodynamic diameter is determined by measuring the delay time between two continuous-wave lasers, A Nd : YAG laser desorbs and ionizes molecules from the particle, and the time-of-flight mass spectrometer collects a mass spectrum of the generated ions. Then the composition of single particle is obtained. ATOFLMS generates large amount of data during the process period. How to process these data and extract valuable information is one of the key problems for the ATOFLMS. In this paper, the fuzzy clustering used to classify large numbers of mass spectral of air indoor by an ATOFLMS. Each revised spectrum is converted to a normalized 300-point vector, each point representing one mass unit. Then the positive ion mass spectra of a single particle are described as 300-dimensional data vectors using the ion masses as dimensions and the ion signal peak areas as values. The data vectors of all particles measured are written into a classification matrix. Each spectrum's data was stored as one row in this matrix. The Fuzzy c-means algorithm is an iterative method starting the calculation with random class centers to find a substructure in the data. The procedure works in such a way that finally similar objects (particle spectra) have a minimum distance between their corresponding data vectors, on the one hand, and to the center of a cluster, on the other hand. So the aim of the iteration is to find local minima in the N-dimensional space where N is the number of evaluated peak masses. The particle data used in this study were collected over a period one day in Hefei. During the campaign, inorganic salts, mineral particles, and carbonaceous particles, with varying degrees of secondary components, were identified. The detection results of particle size exhibit that aerosol is predominanantly in the form of fine particles, and the particles whose diameter larger than 1 microm are scare. The particles whose diameter less than 1 microm are make up of 95% of the total particles, and these particles are major distributed in 0.4-0.8 microm. PMID:18975786
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familarity with the behavior of the Tethered Satellite System.
Detecting low-frequency functional connectivity in fMRI using unsupervised clustering algorithms
NASA Astrophysics Data System (ADS)
Lange, Oliver; Meyer-Bäse, Anke; Wismüller, Axel
2006-04-01
Recent research in functional magnetic resonance imaging (fMRI) revealed slowly varying temporally correlated fluctuations between functionally related areas. These low-frequency oscillations of less than 0.08 Hz appear to be a property of symmetric cortices, and they are known to be present in the motor cortex among others. These low-frequency data are difficult to detect and quantify in fMRI. Traditionally, user-based regions of interests (ROI) or "seed clusters" have been the primary analysis method. We propose in this paper to employ unsupervised clustering algorithms employing arbitrary distance measures to detect the resting state of functional connectivity. There are two main benefits using unsupervised algorithms instead of traditional techniques: (1) the scan time is reduced by finding directly the activation data set, and (2) the whole data set is considered and not a relative correlation map. The achieved results are evaluated for different distance metrics. The Euclidian metric implemented by the standard unsupervised clustering approaches is compared with a more general topographic mapping of proximities based on the correlation and the prediction error between time courses. Thus, we are able to detect functional connectivity based on model-free analysis methods implementing arbitrary distance metrics.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores inmore » Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.« less
Detection and clustering of features in aerial images by neuron network-based algorithm
NASA Astrophysics Data System (ADS)
Vozenilek, Vit
2015-12-01
The paper presents the algorithm for detection and clustering of feature in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general features analysis and their use for clustering and backward projection of clusters to aerial image. The basis of the algorithm is a calculation of the total error of the network and a change of weights of the network to minimize the error. A classic bipolar sigmoid was used for the activation function of the neurons and the basic method of backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, the web application was compiled (ASP.NET on the Microsoft .NET platform). The main achievements include the knowledge that man-made objects in aerial images can be successfully identified by detection of shapes and anomalies. It was also found that the appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
An improved scheduling algorithm for 3D cluster rendering with platform LSF
NASA Astrophysics Data System (ADS)
Xu, Wenli; Zhu, Yi; Zhang, Liping
2013-10-01
High-quality photorealistic rendering of 3D modeling needs powerful computing systems. On this demand highly efficient management of cluster resources develops fast to exert advantages. This paper is absorbed in the aim of how to improve the efficiency of 3D rendering tasks in cluster. It focuses research on a dynamic feedback load balance (DFLB) algorithm, the work principle of load sharing facility (LSF) and optimization of external scheduler plug-in. The algorithm can be applied into match and allocation phase of a scheduling cycle. Candidate hosts is prepared in sequence in match phase. And the scheduler makes allocation decisions for each job in allocation phase. With the dynamic mechanism, new weight is assigned to each candidate host for rearrangement. The most suitable one will be dispatched for rendering. A new plugin module of this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate the ability of improved plugin module is superior to the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput and improve system utilization.
2014-01-01
Background Accurate genotype calling is a pre-requisite of a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms can achieve an accuracy rate greater than 99% for genotyping DNA samples without copy number alterations (CNAs), almost all of these algorithms are not designed for genotyping tumor samples that are known to have large regions of CNAs. Results This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed a Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that high concordance rates with HapMap calls can be achieved without using reference/training samples, when CNAs do not exist. By adding a training step, we have obtained higher genotyping concordance rates, without requiring large sample sizes. When CNAs exist in the samples, accuracy can be dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, comparing with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). Conclusions In conclusion, we have demonstrated that BCRgt can provide accurate genotyping calls for tumor samples with CNAs. PMID:24629125
NASA Astrophysics Data System (ADS)
Bansod, Babankumar S.; Pandey, O. P.
2013-05-01
Within-field variability is a well-known phenomenon and its study is at the centre of precision agriculture (PA). In this paper, site-specific spatial variability (SSSV) of apparent Electrical Conductivity (ECa) and crop yield apart from pH, moisture, temperature and di-electric constant information was analyzed to construct spatial distribution maps. Principal component analysis (PCA) and fuzzy c-means (FCM) clustering algorithm were then performed to delineate management zones (MZs). Various performance indices such as Normalized Classification Entropy (NCE) and Fuzzy Performance Index (FPI) were calculated to determine the clustering performance. The geo-referenced sensor data was analyzed for within-field classification. Results revealed that the variables could be aggregated into MZs that characterize spatial variability in soil chemical properties and crop productivity. The resulting classified MZs showed favorable agreement between ECa and crop yield variability pattern. This enables reduction in number of soil analysis needed to create application maps for certain cultivation operations.
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix
NASA Technical Reports Server (NTRS)
Rogers, James L.; Korte, John J.; Bilardo, Vincent J.
2006-01-01
Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
NASA Astrophysics Data System (ADS)
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a pro portion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertak ing for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Reduced-cost sparsity-exploiting algorithm for solving coupled-cluster equations.
Brabec, Jiri; Yang, Chao; Epifanovsky, Evgeny; Krylov, Anna I; Ng, Esmond
2016-05-01
We present an algorithm for reducing the computational work involved in coupled-cluster (CC) calculations by sparsifying the amplitude correction within a CC amplitude update procedure. We provide a theoretical justification for this approach, which is based on the convergence theory of inexact Newton iterations. We demonstrate by numerical examples that, in the simplest case of the CCD equations, we can sparsify the amplitude correction by setting, on average, roughly 90% nonzero elements to zeros without a major effect on the convergence of the inexact Newton iterations. PMID:26804120
Multispectral image classification of MRI data using an empirically-derived clustering algorithm
Horn, K.M.; Osbourn, G.C.; Bouchard, A.M.; Sanders, J.A. |
1998-08-01
Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically-derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T{sub 2} 1st and 2nd-echo, and T{sub 1} (with ad without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real-space by colored-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three dimensional data set of MRI scans of a human brain tumor.
Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm
NASA Astrophysics Data System (ADS)
Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel
2014-05-01
Project OASE is the one of 5 work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at Universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and follow then through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as Satellite imagery, weather Radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of Scale-Space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. In spite of the special application to be demonstrated here (like convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-Band Radar (2D) and Satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-Band Radar composite.
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
A computational algorithm for functional clustering of proteome dynamics during development.
Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling
2014-06-01
Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network's running and the degree of candidate nodes' effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Farah, Ihsen; Nguyen, Thi Nguyet Que; Groh, Audrey; Guenot, Dominique; Jeannesson, Pierre; Gobinet, Cyril
2016-05-23
The coupling between Fourier-transform infrared (FTIR) imaging and unsupervised classification is effective in revealing the different structures of human tissues based on their specific biomolecular IR signatures; thus the spectral histology of the studied samples is achieved. However, the most widely applied clustering methods in spectral histology are local search algorithms, which converge to a local optimum, depending on initialization. Multiple runs of the techniques estimate multiple different solutions. Here, we propose a memetic algorithm, based on a genetic algorithm and a k-means clustering refinement, to perform optimal clustering. In addition, this approach was applied to the acquired FTIR images of normal human colon tissues originating from five patients. The results show the efficiency of the proposed memetic algorithm to achieve the optimal spectral histology of these samples, contrary to k-means. PMID:27110605
KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.
Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C
2014-06-01
KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition. PMID:23912505
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm
de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP—Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
Are judgments a form of data clustering? Reexamining contrast effects with the k-means algorithm.
Boillaud, Eric; Molina, Guylaine
2015-04-01
A number of theories have been proposed to explain in precise mathematical terms how statistical parameters and sequential properties of stimulus distributions affect category ratings. Various contextual factors such as the mean, the midrange, and the median of the stimuli; the stimulus range; the percentile rank of each stimulus; and the order of appearance have been assumed to influence judgmental contrast. A data clustering reinterpretation of judgmental relativity is offered wherein the influence of the initial choice of centroids on judgmental contrast involves 2 combined frequency and consistency tendencies. Accounts of the k-means algorithm are provided, showing good agreement with effects observed on multiple distribution shapes and with a variety of interaction effects relating to the number of stimuli, the number of response categories, and the method of skewing. Experiment 1 demonstrates that centroid initialization accounts for contrast effects obtained with stretched distributions. Experiment 2 demonstrates that the iterative convergence inherent to the k-means algorithm accounts for the contrast reduction observed across repeated blocks of trials. The concept of within-cluster variance minimization is discussed, as is the applicability of a backward k-means calculation method for inferring, from empirical data, the values of the centroids that would serve as a representation of the judgmental context. PMID:25706770
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.
de Brito, Daniel M; Maracaja-Coutinho, Vinicius; de Farias, Savio T; Batista, Leonardo V; do Rêgo, Thaís G
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
Theiler, J.; Gisler, G.
1997-07-01
The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
Analysis Clustering of Electricity Usage Profile Using K-Means Algorithm
NASA Astrophysics Data System (ADS)
Amri, Yasirli; Lailatul Fadhilah, Amanda; Fatmawati; Setiani, Novi; Rani, Septia
2016-01-01
Electricity is one of the most important needs for human life in many sectors. Demand for electricity will increase in line with population and economic growth. Adjustment of the amount of electricity production in specified time is important because the cost of storing electricity is expensive. For handling this problem, we need knowledge about the electricity usage pattern of clients. This pattern can be obtained by using clustering techniques. In this paper, clustering is used to obtain the similarity of electricity usage patterns in a specified time. We use K-Means algorithm to employ clustering on the dataset of electricity consumption from 370 clients that collected in a year. Result of this study, we obtained an interesting pattern that there is a big group of clients consume the lowest electric load in spring season, but in another group, the lowest electricity consumption occurred in winter season. From this result, electricity provider can make production planning in specified season based on pattern of electricity usage profile.
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Xue, Zhong; Shen, Dinggang; Li, Hai; Wong, Stephen
2010-01-01
The traditional fuzzy clustering algorithm and its extensions have been successfully applied in medical image segmentation. However, because of the variability of tissues and anatomical structures, the clustering results might be biased by the tissue population and intensity differences. For example, clustering-based algorithms tend to over-segment white matter tissues of MR brain images. To solve this problem, we introduce a tissue probability map constrained clustering algorithm and apply it to serial MR brain image segmentation, i.e., a series of 3-D MR brain images of the same subject at different time points. Using the new serial image segmentation algorithm in the framework of the CLASSIC framework, which iteratively segments the images and estimates the longitudinal deformations, we improved both accuracy and robustness for serial image computing, and at the mean time produced longitudinally consistent segmentation and stable measures. In the algorithm, the tissue probability maps consist of both the population-based and subject-specific segmentation priors. Experimental study using both simulated longitudinal MR brain data and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data confirmed that using both priors more accurate and robust segmentation results can be obtained. The proposed algorithm can be applied in longitudinal follow up studies of MR brain imaging with subtle morphological changes for neurological disorders. PMID:26566399
Falcon: neural fuzzy control and decision systems using FKP and PFKP clustering algorithms.
Tung, W L; Quek, C
2004-02-01
Neural fuzzy networks proposed in the literature can be broadly classified into two groups. The first group is essentially fuzzy systems with self-tuning capabilities and requires an initial rule base to be specified prior to training. The second group of neural fuzzy networks, on the other hand, is able to automatically formulate the fuzzy rules from the numerical training data. Examples are the Falcon-ART, and the POPFNN family of networks. A cluster analysis is first performed on the training data and the fuzzy rules are subsequently derived through the proper connections of these computed clusters. This correspondence proposes two new networks: Falcon-FKP and Falcon-PFKP. They are extensions of the Falcon-ART network, and aimed to overcome the shortcomings faced by the Falcon-ART network itself, i.e., poor classification ability when the classes of input data are very similar to each other, termination of training cycle depends heavily on a preset error parameter, the fuzzy rule base of the Falcon-ART network may not be consistent Nauck, there is no control over the number of fuzzy rules generated, and learning efficiency may deteriorate by using complementarily coded training data. These deficiencies are essentially inherent to the fuzzy ART, clustering technique employed by the Falcon-ART network. Hence, two clustering techniques--Fuzzy Kohonen Partitioning (FKP) and its pseudo variant PFKP, are synthesized with the basic Falcon structure to compute the fuzzy sets and to automatically derive the fuzzy rules from the training data. The resultant neural fuzzy networks are Falcon-FKP and Falcon-PFKP, respectively. These two proposed networks have a lean and efficient training algorithm and consistent fuzzy rule bases. Extensive simulations are conducted using the two networks and their performances are encouraging when benchmarked against other neural and neural fuzzy systems. PMID:15369109
`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being multidisciplinary field, involves applications of various methods from allied areas of Science for data mining using computational approaches. Clustering and molecular phylogeny is one of the key areas in Bioinformatics, which help in study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance based and character based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. `Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. The distance function based on statistical parameters of IATDs is proposed and distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which are obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD based method in molecular phylogenetic analysis in particular and data mining in general.
Clustering by fuzzy neural gas and evaluation of fuzzy clusters.
Geweniger, Tina; Fischer, Lydia; Kaden, Marika; Lange, Mandy; Villmann, Thomas
2013-01-01
We consider some modifications of the neural gas algorithm. First, fuzzy assignments as known from fuzzy c-means and neighborhood cooperativeness as known from self-organizing maps and neural gas are combined to obtain a basic Fuzzy Neural Gas. Further, a kernel variant and a simulated annealing approach are derived. Finally, we introduce a fuzzy extension of the ConnIndex to obtain an evaluation measure for clusterings based on fuzzy vector quantization. PMID:24396342
Structure of neutral aluminum clusters Aln (2⩽n⩽23) : Genetic algorithm tight-binding calculations
NASA Astrophysics Data System (ADS)
Chuang, Feng-Chuan; Wang, C. Z.; Ho, K. H.
2006-03-01
We performed global structural optimizations for neutral aluminum clusters Aln ( n up to 23) using a genetic algorithm (GA) coupled with a tight-binding interatomic potential. Structural candidates obtained from our GA search were further optimized by using first-principles total energy calculations. We report the lowest energy structures of neutral Aln (n=2-23) . We found that the icosahedral structure of Al13 serves as the core for the growth of aluminum clusters from Al14 to Al18 .
OpenACC programs of the Swendsen-Wang multi-cluster spin flip algorithm
NASA Astrophysics Data System (ADS)
Komura, Yukihiro
2015-12-01
We present sample OpenACC programs of the Swendsen-Wang multi-cluster spin flip algorithm. OpenACC is a directive-based programming model for accelerators without requiring modification to the underlying CPU code itself. In this paper, we deal with the classical spin models as with the sample CUDA programs (Komura and Okabe, 2014), that is, two-dimensional (2D) Ising model, three-dimensional (3D) Ising model, 2D Potts model, 3D Potts model, 2D XY model and 3D XY model. We explain the details of sample OpenACC programs and compare the performance of the present OpenACC implementations with that of the CUDA implementations for the 2D and 3D Ising models and the 2D and 3D XY models.
Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Yang, Bing; Zhang, Bing
2014-02-01
Based on the data mining methods of association rules and clustering algorithm, the 188 prescriptions for cough that built by Yan Zhenghua were collected and analyzed to get the frequency of drug usage and the relationship between drugs. From which we could conclude the experiences of Yan Zhenghua for the treatment of cough. The results of the analysis were that 20 core combinations were dig out, such as Bambusae Caulis in Taenias-Almond-Sactmarsh Aster. And there were 10 new prescriptions were found out, such as Sactmarsh Aster-Scutellariae Radix-Album Viscum-Bambusae Caulis in Taenian-Eriobotryae Folium. The results of the analysis were proved that Yan Zhenghua was good at curing cough by using the traditional Chinese medicine that can dispel wind and heat from the body, and remove heat from the lung to relieve cough. PMID:25204134
2016-01-01
The early diagnosis of breast cancer is an important step in a fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying of these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data. PMID:27022504
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set called B-ANFIS. In order to increase accuracy of the model, the following issues are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. Crucial reason of this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually happen when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
Kanters, René P F; Donald, Kelling J
2014-12-01
A new flexible implementation of a genetic algorithm for locating unique low energy minima of isomers of clusters is described and tested. The strategy employed can be applied to molecular or atomic clusters and has a flexible input structure so that a system with several different elements can be built up from a set of individual atoms or from fragments made up of groups of atoms. This cluster program is tested on several systems, and the results are compared to computational and experimental data from previous studies. The quality of the algorithm for locating reliably the most competitive low energy structures of an assembly of atoms is examined for strongly bound Si-Li clusters, and ZnF2 clusters, and the more weakly interacting water trimers. The use of the nuclear repulsion energy as a duplication criterion, an increasing population size, and avoiding mutation steps without loss of efficacy are distinguishing features of the program. For the Si-Li clusters, a few new low energy minima are identified in the testing of the algorithm, and our results for the metal fluorides and water show very good agreement with the literature. PMID:26583254
2011-01-01
Background With the completion of the international HapMap project, many studies have been conducted to investigate the association between complex diseases and haplotype variants. Such haplotype-based association studies, however, often face two difficulties; one is the large number of haplotype configurations in the chromosome region under study, and the other is the ambiguity in haplotype phase when only genotype data are observed. The latter complexity may be handled based on an EM algorithm with family data incorporated, whereas the former can be more problematic, especially when haplotypes of rare frequencies are involved. Here based on family data we propose to cluster long haplotypes of linked SNPs in a biological sense, so that the number of haplotypes can be reduced and the power of statistical tests of association can be increased. Results In this paper we employ family genotype data and combine a clustering scheme with a likelihood ratio statistic to test the association between quantitative phenotypes and haplotype variants. Haplotypes are first grouped based on their evolutionary closeness to establish a set containing core haplotypes. Then, we construct for each family the transmission and non-transmission phase in terms of these core haplotypes, taking into account simultaneously the phase ambiguity as weights. The likelihood ratio test (LRT) is next conducted with these weighted and clustered haplotypes to test for association with disease. This combination of evolution-guided haplotype clustering and weighted assignment in LRT is able, via its core-coding system, to incorporate into analysis both haplotype phase ambiguity and transmission uncertainty. Simulation studies show that this proposed procedure is more informative and powerful than three family-based association tests, FAMHAP, FBAT, and an LRT with a group consisting exclusively of rare haplotypes. Conclusions The proposed procedure takes into account the uncertainty in phase determination and in transmission, utilizes the evolutionary information contained in haplotypes, reduces the dimension in haplotype space and the degrees of freedom in tests, and performs better in association studies. This evolution-guided clustering procedure is particularly useful for long haplotypes containing linked SNPs, and is applicable to other haplotype-based association tests. This procedure is now implemented in R and is free for download. PMID:21592403
NASA Astrophysics Data System (ADS)
Bagheripour, Parisa; Asoodeh, Mojtaba
2013-12-01
Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and has a great control on assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost, time, and labor intensive. Therefore, the mission of finding an accurate, fast and cheap way of determining porosity is unavoidable. On the other hand, conventional well log data, available in almost all wells contain invaluable implicit information about the porosity. Therefore, an intelligent system can explicate this information. Fuzzy logic is a powerful tool for handling geosciences problem which is associated with uncertainty. However, determination of the best fuzzy formulation is still an issue. This study purposes an improved strategy, called hybrid genetic algorithm-pattern search (GA-PS) technique, against the widely held subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. Hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions) which consequently results in the best fuzzy formulation. Results indicate that GA-PS technique manipulates both mean and variance of Gaussian membership functions contrary to SC that only has a control on mean of Gaussian membership functions. A comparison between hybrid GA-PS technique and SC method confirmed the superiority of GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.
Cancer Subtype Discovery and Biomarker Identification via a New Robust Network Clustering Algorithm
Wu, Meng-Yun; Dai, Dao-Qing; Zhang, Xiao-Fei; Zhu, Yuan
2013-01-01
In cancer biology, it is very important to understand the phenotypic changes of the patients and discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem based on gene expression profiles which may contain outliers due to either chemical or electrical reasons. These undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways, and are related with only a few of interdependent biomarkers. This motivates a need for the robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures and identifying cancer related biomarkers. This study proposes a penalized model-based Students t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account and having robustness against outliers. Meanwhile, biomarker identification and network reconstruction are achieved by imposing an adaptive penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm utilizing the graphical lasso. Here, a network-based gene selection criterion that identifies biomarkers not as individual genes but as subnetworks is applied. This allows us to implicate low discriminative biomarkers which play a central role in the subnetwork by interconnecting many differentially expressed genes, or have cluster-specific underlying network structures. Experiment results on simulated datasets and one available cancer dataset attest to the effectiveness, robustness of PMT-UC in cancer subtype discovering. Moveover, PMT-UC has the ability to select cancer related biomarkers which have been verified in biochemical or biomedical research and learn the biological significant correlation among genes. PMID:23799085
Collaborative fuzzy clustering from multiple weighted views.
Jiang, Yizhang; Chung, Fu-Lai; Wang, Shitong; Deng, Zhaohong; Wang, Jun; Qian, Pengjiang
2015-04-01
Clustering with multiview data is becoming a hot topic in data mining, pattern recognition, and machine learning. In order to realize an effective multiview clustering, two issues must be addressed, namely, how to combine the clustering result from each view and how to identify the importance of each view. In this paper, based on a newly proposed objective function which explicitly incorporates two penalty terms, a basic multiview fuzzy clustering algorithm, called collaborative fuzzy c-means (Co-FCM), is firstly proposed. It is then extended into its weighted view version, called weighted view collaborative fuzzy c-means (WV-Co-FCM), by identifying the importance of each view. The WV-Co-FCM algorithm indeed tackles the above two issues simultaneously. Its relationship with the latest multiview fuzzy clustering algorithm Collaborative Fuzzy K-Means (Co-FKM) is also revealed. Extensive experimental results on various multiview datasets indicate that the proposed WV-Co-FCM algorithm outperforms or is at least comparable to the existing state-of-the-art multitask and multiview clustering algorithms and the importance of different views of the datasets can be effectively identified. PMID:25069132
NASA Astrophysics Data System (ADS)
Ashton, Douglas J.; Liu, Jiwen; Luijten, Erik; Wilding, Nigel B.
2010-11-01
Highly size-asymmetrical fluid mixtures arise in a variety of physical contexts, notably in suspensions of colloidal particles to which much smaller particles have been added in the form of polymers or nanoparticles. Conventional schemes for simulating models of such systems are hamstrung by the difficulty of relaxing the large species in the presence of the small one. Here we describe how the rejection-free geometrical cluster algorithm of Liu and Luijten [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004)] can be embedded within a restricted Gibbs ensemble to facilitate efficient and accurate studies of fluid phase behavior of highly size-asymmetrical mixtures. After providing a detailed description of the algorithm, we summarize the bespoke analysis techniques of [Ashton et al., J. Chem. Phys. 132, 074111 (2010)] that permit accurate estimates of coexisting densities and critical-point parameters. We apply our methods to study the liquid-vapor phase diagram of a particular mixture of Lennard-Jones particles having a 10:1 size ratio. As the reservoir volume fraction of small particles is increased in the range of 0%-5%, the critical temperature decreases by approximately 50%, while the critical density drops by some 30%. These trends imply that in our system, adding small particles decreases the net attraction between large particles, a situation that contrasts with hard-sphere mixtures where an attractive depletion force occurs.
Ashton, Douglas J; Liu, Jiwen; Luijten, Erik; Wilding, Nigel B
2010-11-21
Highly size-asymmetrical fluid mixtures arise in a variety of physical contexts, notably in suspensions of colloidal particles to which much smaller particles have been added in the form of polymers or nanoparticles. Conventional schemes for simulating models of such systems are hamstrung by the difficulty of relaxing the large species in the presence of the small one. Here we describe how the rejection-free geometrical cluster algorithm of Liu and Luijten [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004)] can be embedded within a restricted Gibbs ensemble to facilitate efficient and accurate studies of fluid phase behavior of highly size-asymmetrical mixtures. After providing a detailed description of the algorithm, we summarize the bespoke analysis techniques of [Ashton et al., J. Chem. Phys. 132, 074111 (2010)] that permit accurate estimates of coexisting densities and critical-point parameters. We apply our methods to study the liquid-vapor phase diagram of a particular mixture of Lennard-Jones particles having a 10:1 size ratio. As the reservoir volume fraction of small particles is increased in the range of 0%-5%, the critical temperature decreases by approximately 50%, while the critical density drops by some 30%. These trends imply that in our system, adding small particles decreases the net attraction between large particles, a situation that contrasts with hard-sphere mixtures where an attractive depletion force occurs. PMID:21090849
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
NASA Astrophysics Data System (ADS)
Yue, Guo; Liang-Liang, Wang; Xu-Dong, Ju; Ling-Hui, Wu; Qing-Lei, Xiu; Hai-Xia, Wang; Ming-Yi, Dong; Jing-Ran, Hu; Wei-Dong, Li; Wei-Guo, Li; Huai-Min, Liu; Qun, Ou-Yang; Xiao-Yan, Shen; Ye, Yuan; Yao, Zhang
2016-01-01
Considering the effects of aging on the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker, the Cylindrical-GEM Inner Tracker (CGEM-IT), is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package for the CGEM-IT with a simplified digitization model, and describes the development of software for cluster reconstruction and track fitting, using a track fitting algorithm based on the Kalman filter method. Preliminary results for the reconstruction algorithms which are obtained using a Monte Carlo sample of single muon events in the CGEM-IT, show that the CGEM-IT has comparable momentum resolution and transverse vertex resolution to the IDC, and a better z-direction resolution than the IDC. Supported by National Key Basic Research Program of China (2015CB856700), National Natural Science Foundation of China (11205184, 11205182) and Joint Funds of National Natural Science Foundation of China (U1232201)
Kristensen, David M.; Kannan, Lavanya; Coleman, Michael K.; Wolf, Yuri I.; Sorokin, Alexander; Koonin, Eugene V.; Mushegian, Arcady
2010-01-01
Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined. Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g3) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g6). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g3log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs. Availability and implementation: C++ source code freely available for download at ftp.ncbi.nih.gov/pub/wolf/COGs/COGsoft/ Contact: dmk@stowers.org Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:20439257
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Wang, Xueyi
2011-01-01
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 106 records and 104 dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces. PMID:22247818
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in the recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we would present our GSAM algorithm based on a DFT exploration of the PES to find the low lying isomers of such compounds. This algorithm includes the generation of an intial set of structure from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the non physically reasonnable configurations have been implemented to reduce the computational cost of this algorithm. Structural properties of Ga{sub n}Asm clusters will be presented as an illustration of the method.
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1993-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.
NASA Astrophysics Data System (ADS)
Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin
2011-04-01
To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
Blood vessel extraction and optic disc removal using curvelet transform and kernel fuzzy c-means.
Kar, Sudeshna Sil; Maity, Santi P
2016-03-01
This paper proposes an automatic blood vessel extraction method on retinal images using matched filtering in an integrated system design platform that involves curvelet transform and kernel based fuzzy c-means. Since curvelet transform represents the lines, the edges and the curvatures very well and in compact form (by less number of coefficients) compared to other multi-resolution techniques, this paper uses curvelet transform for enhancement of the retinal vasculature. Matched filtering is then used to intensify the blood vessels' response which is further employed by kernel based fuzzy c-means algorithm that extracts the vessel silhouette from the background through non-linear mapping. For pathological images, in addition to matched filtering, Laplacian of Gaussian filter is also employed to distinguish the step and the ramp like signal from that of vessel structure. To test the efficacy of the proposed method, the algorithm has also been applied to images in presence of additive white Gaussian noise where the curvelet transform has been used for image denoising. Performance is evaluated on publicly available DRIVE, STARE and DIARETDB1 databases and is compared with the large number of existing blood vessel extraction methodologies. Simulation results demonstrate that the proposed method is very much efficient in detecting the long and the thick as well as the short and the thin vessels with an average accuracy of 96.16% for the DRIVE and 97.35% for the STARE database wherein the existing methods fail to extract the tiny and the thin vessels. PMID:26848729
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes,Sylvain V.
2006-03-08
Centrosomes are small organelles that organize the mitoticspindle during cell division and are also involved in cell shape andpolarity. Within epithelial tumors, such as breast cancer, and somehematological tumors, centrosome abnormalities (CA) are common, occurearly in disease etiology, and correlate with chromosomal instability anddisease stage. In situ quantification of CA by optical microscopy ishampered by overlap and clustering of these organelles, which appear asfocal structures. CA has been frequently associated with Tp53 status inpremalignant lesions and tumors. Here we describe an approach toaccurately quantify centrosomes in tissue sections and tumors.Considering proliferation and baseline amplification rate the resultingpopulation based ratio of centrosomes per nucleus allow the approximationof the proportion of cells with CA. Using this technique we show that20-30 percent of cells have amplified centrosomes in Tp53 null mammarytumors. Combining fluorescence detection, deconvolution microscopy and amathematical algorithm applied to a maximum intensity projection we showthat this approach is superior to traditional investigator based visualanalysis or threshold-based techniques.
Rough-fuzzy clustering for grouping functionally similar genes from microarray data.
Maji, Pradipta; Paul, Sushmita
2013-01-01
Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in gene clustering problem. In this regard, a gene clustering algorithm, termed as robust rough-fuzzy c-means, is proposed judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environment. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near optimum solutions and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets. PMID:22848138
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
An Algorithm for Testing Unidimensionality and Clustering Items in Rasch Measurement
ERIC Educational Resources Information Center
Debelak, Rudolf; Arendasy, Martin
2012-01-01
A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X.
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
NASA Technical Reports Server (NTRS)
1975-01-01
The implementation of the algorithms used in the flight program to approximate elementary functions and mathematical procedures was checked. This was done by verifying that at least one, and in most cases, more than one function computed through the use of the algorithms was calculated properly. The following algorithms were checked: sine-cosine, arctangent, natural logarithm, square root, inverse square root, as well as the vector dot and cross products.
Automatic segmentation of corpus callosum using Gaussian mixture modeling and Fuzzy C means methods.
İçer, Semra
2013-10-01
This paper presents a comparative study of the success and performance of the Gaussian mixture modeling and Fuzzy C means methods to determine the volume and cross-sectionals areas of the corpus callosum (CC) using simulated and real MR brain images. The Gaussian mixture model (GMM) utilizes weighted sum of Gaussian distributions by applying statistical decision procedures to define image classes. In the Fuzzy C means (FCM), the image classes are represented by certain membership function according to fuzziness information expressing the distance from the cluster centers. In this study, automatic segmentation for midsagittal section of the CC was achieved from simulated and real brain images. The volume of CC was obtained using sagittal sections areas. To compare the success of the methods, segmentation accuracy, Jaccard similarity and time consuming for segmentation were calculated. The results show that the GMM method resulted by a small margin in more accurate segmentation (midsagittal section segmentation accuracy 98.3% and 97.01% for GMM and FCM); however the FCM method resulted in faster segmentation than GMM. With this study, an accurate and automatic segmentation system that allows opportunity for quantitative comparison to doctors in the planning of treatment and the diagnosis of diseases affecting the size of the CC was developed. This study can be adapted to perform segmentation on other regions of the brain, thus, it can be operated as practical use in the clinic. PMID:23871683
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which was analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
Sumithra, Subramaniam; Victoire, T. Aruldoss Albert
2015-01-01
Due to large dimension of clusters and increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) seems to be highly complex and cumbersome. This paper proposes a new method to determine a reasonably better solution of the clustering and routing problem with the highest concern of efficient energy consumption of the sensor nodes for extending network life time. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The obtained route using the proposed method from all the gateways to the base station is comparatively lesser in overall distance with less number of data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing energy consumption of the WSN and the results are compared with the other algorithms reported in the literature. PMID:26516635
BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data
2013-01-01
Background The explosion of biological data has dramatically reformed today's biology research. The biggest challenge to biologists and bioinformaticians is the integration and analysis of large quantity of data to provide meaningful insights. One major problem is the combined analysis of data from different types. Bi-cluster editing, as a special case of clustering, which partitions two different types of data simultaneously, might be used for several biomedical scenarios. However, the underlying algorithmic problem is NP-hard. Results Here we contribute with BiCluE, a software package designed to solve the weighted bi-cluster editing problem. It implements (1) an exact algorithm based on fixed-parameter tractability and (2) a polynomial-time greedy heuristics based on solving the hardest part, edge deletions, first. We evaluated its performance on artificial graphs. Afterwards we exemplarily applied our implementation on real world biomedical data, GWAS data in this case. BiCluE generally works on any kind of data types that can be modeled as (weighted or unweighted) bipartite graphs. Conclusions To our knowledge, this is the first software package solving the weighted bi-cluster editing problem. BiCluE as well as the supplementary results are available online at http://biclue.mpi-inf.mpg.de. PMID:24565035
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Malek, H.
1978-01-01
A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basic for CLASSY and the actual operation of the algorithm is described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
Li, Weizhong
2011-10-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Li, Weizhong [San Diego Supercomputer Center
2013-01-22
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
El Harchaoui, Nour-Eddine; Ait Kerroum, Mounir; Hammouch, Ahmed; Ouadou, Mohamed; Aboutajdine, Driss
2013-01-01
The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome the problems of uncertain data in complex systems. We used the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters that are generated by PCM and also overcome the weakness of FCM to noise. To validate our approach, we used several validity indexes and we compared them with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were realized on different synthetics data sets and real brain MR images. PMID:24489535
El Harchaoui, Nour-Eddine; Ait Kerroum, Mounir; Hammouch, Ahmed; Ouadou, Mohamed; Aboutajdine, Driss
2013-01-01
The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome the problems of uncertain data in complex systems. We used the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters that are generated by PCM and also overcome the weakness of FCM to noise. To validate our approach, we used several validity indexes and we compared them with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were realized on different synthetics data sets and real brain MR images. PMID:24489535
Erny, Guillaume L; Acunha, Tanize; Simó, Carolina; Cifuentes, Alejandro; Alves, Arminda
2016-01-15
Various algorithms have been developed to improve the quantity and quality of information that can be extracted from complex datasets obtained using hyphenated mass spectrometric techniques. While different approaches are possible, the key step often consists in arranging the data into a large series of profiles known as extracted ion profiles. Those profiles, similar to mono-dimensional separation profiles, are then processed to detect potential chromatographic peaks. This allows extracting from the dataset a large number of peaks that are characteristics of the compounds that have been separated. However, with mass spectrometry (MS) detection, the response is usually a complex signal whose pattern depends on the analyte, the MS instrument and the ionization method. When converted to ionic profiles, a single separated analyte will have multiple images at different m/z range. In this manuscript we present a hierarchical agglomerative clustering algorithm to group profiles with very similar feature. Each group aims to contain all profiles that are due to the transport and monitoring of a single analyte. Clustering results are then used to generate a 2 dimensional representation, called clusters plot, which allows an in-depth analysis of the MS dataset including the visualization of poorly separated compounds even when their intensity differs by more than two orders of magnitude. The usefulness of this new approach has been validated with data from capillary electrophoresis time of flight mass spectrometry hyphenated via an electrospray ionization. Using a mixture of 17 low molecular endogenous compounds it was verified that ionic profiles belonging to each compounds were correctly clustered even with very low degree of separation (R below 0.03). The approach was also validated using a urine sample. While with the total ion profile 15 peaks could be distinguished, 70 clusters were obtained allowing a much thorough analysis. In this particular example, the total computing took less than 10 min. PMID:26711157
Davis, Jack B A; Shayeghi, Armin; Horswell, Sarah L; Johnston, Roy L
2015-09-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters. PMID:26239404
NASA Astrophysics Data System (ADS)
Davis, Jack B. A.; Shayeghi, Armin; Horswell, Sarah L.; Johnston, Roy L.
2015-08-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters.A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters. Electronic supplementary information (ESI) available. See DOI: 10.1039/C5NR03774C
Algorithmic Identification for Wings in Butterfly Diagrams.
NASA Astrophysics Data System (ADS)
Illarionov, E. A.; Sokolov, D. D.
2012-12-01
We investigate to what extent the wings of solar butterfly diagrams can be separated without an explicit usage of Hale's polarity law as well as the location of the solar equator. Two algorithms of cluster analysis, namely DBSCAN and C-means, have demonstrated their ability to separate the wings of contemporary butterfly diagrams based on the sunspot group density in the diagram only. Here we generalize the method for continuous tracers, give results concerning the migration velocities and presented clusters for 12 - 20 cycles.
Aslan, Mikail; Davis, Jack B A; Johnston, Roy L
2016-03-01
The global optimisation of small bimetallic PdCo binary nanoalloys are systematically investigated using the Birmingham Cluster Genetic Algorithm (BCGA). The effect of size and composition on the structures, stability, magnetic and electronic properties including the binding energies, second finite difference energies and mixing energies of Pd-Co binary nanoalloys are discussed. A detailed analysis of Pd-Co structural motifs and segregation effects is also presented. The maximal mixing energy corresponds to Pd atom compositions for which the number of mixed Pd-Co bonds is maximised. Global minimum clusters are distinguished from transition states by vibrational frequency analysis. HOMO-LUMO gap, electric dipole moment and vibrational frequency analyses are made to enable correlation with future experiments. PMID:26872088
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
NASA Astrophysics Data System (ADS)
Vrtilek, Saeqa Dil; Boroson, Bram S.; Richards, Joseph
2014-06-01
We have explored recent developments in machine learning algorithms, such as diffusion mapping (Richards et al. 2009) which allow us to identify physically similar clusters independent of prior knowledge. We have successfully used this method to separate out different classes of X-ray binaries and of different spectral states within a given system. Beyond the immediate astronomical application, a strength of our approach is to offer new and useful insight into the vast and rapidly growing multi-dimensional data collections in essentially all fields of investigation, not only the astrophysical ones which form our testbed and the immediate focus of our scientific interest.
Abedini, Mohammad; Moradi, Mohammad H; Hosseinian, S M
2016-03-01
This paper proposes a novel method to address reliability and technical problems of microgrids (MGs) based on designing a number of self-adequate autonomous sub-MGs via adopting MGs clustering thinking. In doing so, a multi-objective optimization problem is developed where power losses reduction, voltage profile improvement and reliability enhancement are considered as the objective functions. To solve the optimization problem a hybrid algorithm, named HS-GA, is provided, based on genetic and harmony search algorithms, and a load flow method is given to model different types of DGs as droop controller. The performance of the proposed method is evaluated in two case studies. The results provide support for the performance of the proposed method. PMID:26767800
A possibilistic approach to clustering
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Keller, James M.
1993-01-01
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
Shamsi, Hamed; Ozbek, I Yucel
2015-01-01
This work presents a detailed framework to detect the location of heart sound within the respiratory sound based on temporal fuzzy c-means (TFCM) algorithm. In the proposed method, respiratory sound is first divided into frames and for each frame, the logarithmic energy features are calculated. Then, these features are used to classify the respiratory sound as heart sound (HS containing lung sound) and non-HS (only lung sound) by the TFCM algorithm. The TFCM is the modified version fuzzy c-means (FCM) algorithm. While the FCM algorithm uses only the local information about the current frame, the TFCM algorithm uses the temporal information from both the current and the neighboring frames in decision making. To measure the detection performance of the proposed method, several experiments have been conducted on a database of 24 healthy subjects. The experimental results show that the average false-negative rate values are 0.8 ± 1.1 and 1.5 ± 1.4 %, and the normalized area under detection error curves are 0.0145 and 0.0269 for the TFCM method in the low and medium respiratory flow rates, respectively. These average values are significantly lower than those obtained by FCM algorithm and by the other compared methods in the literature, which demonstrates the efficiency of the proposed TFCM algorithm. On the other hand, the average elapsed time of the TFCM for a data with length of 0.2 ± 0.05 s is 0.2 ± 0.05 s, which is slightly higher than that of the FCM and lower than those of the other compared methods. PMID:25326867
NASA Astrophysics Data System (ADS)
Khu, Soon-Thiam; Madsen, Henrik; di Pierro, Francesco
2008-10-01
The use of distributed data for model calibration is becoming more popular in the advent of the availability of spatially distributed observations. Hydrological model calibration has traditionally been carried out using single objective optimisation and only recently has been extended to a multi-objective optimisation domain. By formulating the calibration problem with several objectives, each objective relating to a set of observations, the parameter sets can be constrained more effectively. However, many previous multi-objective calibration studies do not consider individual observations or catchment responses separately, but instead utilises some form of aggregation of objectives. This paper proposes a multi-objective calibration approach that can efficiently handle many objectives using both clustering and preference ordered ranking. The algorithm is applied to calibrate the MIKE SHE distributed hydrologic model and tested on the Karup catchment in Denmark. The results indicate that the preferred solutions selected using the proposed algorithm are good compromise solutions and the parameter values are well defined. Clustering with Kohonen mapping was able to reduce the number of objective functions from 18 to 5. Calibration using the standard deviation of groundwater level residuals enabled us to identify a group of wells that may not be simulated properly, thus highlighting potential problems with the model parameterisation.
Yin, Jiandong; Yang, Jiawen; Guo, Qiyong
2014-01-01
During dynamic susceptibility contrast-magnetic resonance imaging (DSC-MRI), it has been demonstrated that the arterial input function (AIF) can be obtained using fuzzy c-means (FCM) and k-means clustering methods. However, due to the dependence on the initial centers of clusters, both clustering methods have poor reproducibility between the calculation and recalculation steps. To address this problem, the present study developed an alternative clustering technique based on the agglomerative hierarchy (AH) method for AIF determination. The performance of AH method was evaluated using simulated data and clinical data based on comparisons with the two previously demonstrated clustering-based methods in terms of the detection accuracy, calculation reproducibility, and computational complexity. The statistical analysis demonstrated that, at the cost of a significantly longer execution time, AH method obtained AIFs more in line with the expected AIF, and it was perfectly reproducible at different time points. In our opinion, the disadvantage of AH method in terms of the execution time can be alleviated by introducing a professional high-performance workstation. The findings of this study support the feasibility of using AH clustering method for detecting the AIF automatically. PMID:24932638
Fuzzy clustering in Intelligent Scissors.
Wieclawek, W; Pietka, E
2012-07-01
In this study a modified Live-Wire approach is presented. A Fuzzy C-Means (FCM) clustering procedure has been implemented before the wavelet transform cost map function is defined. This shrinks the area to be searched resulting in a significant reduction of the computational complexity. The method has been employed to computed tomography (CT) and magnetic resonance (MR) studies. The 2D segmentation of lungs, abdominal structures and knee joint has been performed in order to evaluate the method. Significant numerical complexity reduction of the Live-Wire algorithm as well as improvement of the object delineation with a decreased number of user interactions have been obtained. PMID:22483373
Kandalla, Krishna; Subramoni, Hari; Vishnu, Abhinav; Panda, Dhabaleswar K.
2010-04-01
Modern high performance computing systems are being increasingly deployed in a hierarchical fashion with multi-core computing platforms forming the base of the hierarchy. These systems are usually comprised of multiple racks, with each rack consisting of a finite number of chassis, with each chassis having multiple compute nodes or blades, based on multi-core architectures. The networks are also hierarchical with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We have analyzed the performance characteristics of two collectives, MPI_Gather and MPI_Scatter on such systems and we have proposed topology-aware algorithms for these operations. Our experimental results have shown that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
Oña, Ofelia B; Ferraro, Marta B; Facelli, Julio C
2011-01-01
The characterization and prediction of the structures of metal silicon clusters is important for nanotechnology research because these clusters can be used as building blocks for nano devices, integrated circuits and solar cells. Several authors have postulated that there is a transition between exo to endo absorption of Cu in Si(n) clusters and showed that for n larger than 9 it is possible to find endohedral clusters. Unfortunately, no global searchers have confirmed this observation, which is based on local optimizations of plausible structures. Here we use parallel Genetic Algorithms (GA), as implemented in our MGAC software, directly coupled with DFT energy calculations to show that the global search of CuSi(n) cluster structures does not find endohedral clusters for n < 8 but finds them for n ≥ 10. PMID:21785526
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1992-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
A pixel-based color image segmentation using support vector machine and fuzzy C-means.
Wang, Xiang-Yang; Zhang, Xian-Jin; Yang, Hong-Ying; Bu, Juan
2012-09-01
Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. In this paper, we present a pixel-based color image segmentation using Support Vector Machine (SVM) and Fuzzy C-Means (FCM). Firstly, the pixel-level color feature and texture feature of the image, which is used as input of the SVM model (classifier), are extracted via the local spatial similarity measure model and Steerable filter. Then, the SVM model (classifier) is trained by using FCM with the extracted pixel-level features. Finally, the color image is segmented with the trained SVM model (classifier). This image segmentation can not only take full advantage of the local information of the color image but also the ability of the SVM classifier. Experimental evidence shows that the proposed method has a very effective computational behavior and effectiveness, and decreases the time and increases the quality of color image segmentation in comparison with the state-of-the-art segmentation methods recently proposed in the literature. PMID:22647833
NASA Astrophysics Data System (ADS)
Sun, Xiaoming; Wang, Guiling
2008-09-01
This study explores the applicability of data-driven clustering analysis in predicting vegetation distribution over two continents where water is an important controlling factor for vegetation growth, South America and Africa, and compares the ability of clustering analysis with that of a physically based dynamic vegetation model to predict vegetation distribution. A clustering analysis algorithm based on the genetic-algorithm-based K-means is tested, with the number of clusters determined a priori according to the primary plant functional types observed to exist in the study domain. The most important variables upon which the clustering analysis is based include available water, its seasonality, and evaporative demand. The dynamic vegetation model used is the Community Land Model version 3 coupled with a Dynamic Global Vegetation Model (CLM3-DGVM) with modifications targeted to address some known biases of the model. Results from both the clustering analysis and the modified CLM3-DGVM are compared against observations derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). Both methods reasonably reproduced the general pattern of dominant plant functional type distribution. There is no clear winner between the two methods, as the DGVM outperforms the clustering analysis approach in some aspects and is outperformed in others. It is therefore suggested that clustering analysis can be a useful tool in biogeography estimation, although it cannot be used in mechanistic studies as the process-based DGVMs are.
Colorectal Cancer Staging Using Three Clustering Methods Based on Preoperative Clinical Findings.
Pourahmad, Saeedeh; Pourhashemi, Soudabeh; Mohammadianpanah, Mohammad
2016-01-01
Determination of the colorectal cancer stage is possible only after surgery based on pathology results. However, sometimes this may prove impossible. The aim of the present study was to determine colorectal cancer stage using three clustering methods based on preoperative clinical findings. All patients referred to the Colorectal Research Center of Shiraz University of Medical Sciences for colorectal cancer surgery during 2006 to 2014 were enrolled in the study. Accordingly, 117 cases participated. Three clustering algorithms were utilized including k-means, hierarchical and fuzzy c-means clustering methods. External validity measures such as sensitivity, specificity and accuracy were used for evaluation of the methods. The results revealed maximum accuracy and sensitivity values for the hierarchical and a maximum specificity value for the fuzzy c-means clustering methods. Furthermore, according to the internal validity measures for the present data set, the optimal number of clusters was two (silhouette coefficient) and the fuzzy c-means algorithm was more appropriate than the k-means clustering approach by increasing the number of clusters. PMID:26925686
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2016-03-01
We present new versions of sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. In this update, we add the method of GPU-based cluster-labeling algorithm without the use of conventional iteration (Komura, 2015) to those programs. For high-precision calculations, we also add a random-number generator in the cuRAND library. Moreover, we fix several bugs and remove the extra usage of shared memory in the kernel functions.
Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Huang, Xiu-Qin; Yang, Bing
2014-02-01
Analyzed the prescriptions for phlegm retention syndrome that built by Ma Peizhi by the association rules and clustering algorithm, the frequency of drug usage and the relationship between drugs could be get. And from that we could conclude the experiences for phlegm retention syndrome of Ma Peizhi of menghe medical genre. The results of the analysis were that 18 core combinations were dig out, such as Citri Exocarpium Rubrum-Eriobotryae Folium-Citri Reticulatae Pericarpium. And there were 9 new prescriptions were found out such as Aurantii Fructus-Citri Exocarpium Rubium-Eriobotryae Folium-Citri Reticulatae Pericarpium. The results of the analysis were proved that Ma Peizhi of Menghe Medical Genre was good at curing phlegm retention syndrome by using the traditional Chinese medicine of mild and light, such as the medicines of mild tonification, and clearing damp and promoting diuresis. PMID:25204136
Dynamic critical behavio(u)r of a cluster algorithm for the Ashkin-Teller model
NASA Astrophysics Data System (ADS)
Salas, Jesús; Sokal, Alan D.
1996-03-01
We study the dynamic critical behavior of a Swendsen-Wang-type algorithm for the Ashkin-Teller model. We find that the Li-Sokal bound on the autocorrelation time ( τint, ɛ ≥ const × C H) holds along the self-dual curve of the symmetric Ashkin-Teller model, but this bound is apparently not sharp. The ratio τint, ɛ / C H appears to tend to infinity either as a logarithm or as a small power (0.05 ≲ p ≲ 0.12).
Khotanlou, Hassan; Afrasiabi, Mahlagha
2011-01-01
This paper introduces a novel methodology for the segmentation of brain MS lesions in MRI volumes using a new clustering algorithm named SCPFCM. SCPFCM uses membership, typicality and spatial information to cluster each voxel. The proposed method relies on an initial segmentation of MS lesions in T1-w and T2-w images by applying SCPFCM algorithm, and the T1 image is then used as a mask and is compared with T2 image. The proposed method was applied to 10 clinical MRI datasets. The results obtained on different types of lesions have been evaluated by comparison with manual segmentations. PMID:22606670
Garmire, Lana X.; Garmire, David G.; Huang, Wendy; Yao, Joyee; Glass, Christopher K.; Subramaniam, Shankar
2011-01-01
Identification of diffuse signals from the chromatin immunoprecipitation and high-throughput massively parallel sequencing (ChIP-Seq) technology poses significant computational challenges, and there are few methods currently available. We present a novel global clustering approach to enrich diffuse CHIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER that was also designed to identify diffuse CHIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 out of a total of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharides (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched with biological functions related to metabolic processes under resting conditions but with developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and expected secondary structures by prediction. Last, they are enriched with motifs of transcription factors such as PU.1 and AP.1, previously shown to be important lineage determining factors in macrophages, and 83% of them overlap with distal enhancers markers. In summary, GCLS based on RNA polymerase II and H3K4Me3 CHIP-Seq method can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in the study. PMID:21980340
Delineation and quantitation of brain lesions by fuzzy clustering in positron emission tomography.
Boudraa, A E; Champier, J; Cinotti, L; Bordet, J C; Lavenne, F; Mallet, J J
1996-01-01
In this study, we investigate the application of the fuzzy clustering to the anatomical localization and quantitation of brain lesions in Positron Emission Tomography (PET) images. The method is based on the Fuzzy C-Means (FCM) algorithm. The algorithm segments the PET image data points into a given number of clusters. Each cluster is an homogeneous region of the brain (e.g. tumor). A feature vector is assigned to a cluster which has the highest membership degree. Having the label affected by the FCM algorithm to a cluster, one may easily compute the corresponding spatial localization, area and perimeter. Studies concerning the evolution of a tumor after different treatments in two patients are presented. PMID:8891420
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches. PMID:24091399
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining
NASA Astrophysics Data System (ADS)
Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.
The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories, methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to their anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information can not be exploited to successfully attack the privacy of data subjects data refer to. Based on Kohonen Self Organizing Feature Maps (SOMs), we firstly organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extend required so that both the data disclosure probability and the information loss are possibly kept negligible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
Polat, Kemal
2012-08-01
In this paper, attribute weighting method based on the cluster centers with aim of increasing the discrimination between classes has been proposed and applied to nonlinear separable datasets including two medical datasets (mammographic mass dataset and bupa liver disorders dataset) and 2-D spiral dataset. The goals of this method are to gather the data points near to cluster center all together to transform from nonlinear separable datasets to linear separable dataset. As clustering algorithm, k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used. The proposed attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW) and used prior to classifier algorithms including C4.5 decision tree and adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, the recall, precision value, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results have shown that the best attribute weighting method was the subtractive clustering based attribute weighting with respect to classification performance in the classification of three used datasets. PMID:21611787
A new method based on Dempster-Shafer theory and fuzzy c-means for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitiveness to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation especially when only one brain image is available. The method is approached with considering spatial neighborhood information by fusing the information of pixels with their neighbors with Dempster-Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function of applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Then multiple hypotheses are generated according to the single hypothesis. Then we update the objective pixel’s BPA by fusing the BPA of the objective pixel and those of its neighbors to get the final result. Some examples in MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than other methods in motion-blurred MRI segmentation.
An Automatic Image Inpainting Algorithm Based on FCM
Liu, Jiansheng; Liu, Hui; Qiao, Shangping; Yue, Guangxue
2014-01-01
There are many existing image inpainting algorithms in which the repaired area should be manually determined by users. Aiming at this drawback of the traditional image inpainting algorithms, this paper proposes an automatic image inpainting algorithm which automatically identifies the repaired area by fuzzy C-mean (FCM) algorithm. FCM algorithm classifies the image pixels into a number of categories according to the similarity principle, making the similar pixels clustering into the same category as possible. According to the provided gray value of the pixels to be inpainted, we calculate the category whose distance is the nearest to the inpainting area and this category is to be inpainting area, and then the inpainting area is restored by the TV model to realize image automatic inpainting. PMID:24516358
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Finding pathogenic factors and how they are spread in the environment has become a global demand, recently. Cutaneous Leishmaniasis (CL) created by Leishmania is a special parasitic disease which can be passed on to human through phlebotomus of vector-born. Studies show that economic situation, cultural issues, as well as environmental and ecological conditions can affect the prevalence of this disease. In this study, Data Mining is utilized in order to predict CL prevalence rate and obtain a risk map. This case is based on effective environmental parameters on CL and a Neuro-Fuzzy system was also used. Learning capacity of Neuro-Fuzzy systems in neural network on one hand and reasoning power of fuzzy systems on the other, make it very efficient to use. In this research, in order to predict CL prevalence rate, an adaptive Neuro-fuzzy inference system with fuzzy inference structure of fuzzy C Means clustering was applied to determine the initial membership functions. Regarding to high incidence of CL in Ilam province, counties of Ilam, Mehran, and Dehloran have been examined and evaluated. The CL prevalence rate was predicted in 2012 by providing effective environmental map and topography properties including temperature, moisture, annual, rainfall, vegetation and elevation. Results indicate that the model precision with fuzzy C Means clustering structure rises acceptable RMSE values of both training and checking data and support our analyses. Using the proposed data mining technology, the pattern of disease spatial distribution and vulnerable areas become identifiable and the map can be used by experts and decision makers of public health as a useful tool in management and optimal decision-making.
NASA Astrophysics Data System (ADS)
Tramacere, A.; Vecchio, C.
2013-01-01
Context. The density based spatial clustering of applications with noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose this method to detect sources in γ-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon. Aims: We investigate the detection performance of the γ-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters. Methods: We used a parametric approach, exploring a large volume of the γ-ray DBSCAN parameter space. By means of simulated data we statistically characterized the γ-ray DBSCAN, finding signatures that distinguish purely random fields from fields with sources. We defined a significance level for the detected clusters and successfully tested this significance with our simulated data. We applied the method to real data and found an excellent agreement with the results obtained with simulated data. Results.We find that the γ-ray DBSCAN can be successfully used in detecting clusters in γ-ray data. The significance returned by our algorithm is strongly correlated with that provided by the maximum likelihood analysis with standard Fermi-LAT software, and can be used to safely remove spurious clusters. The positional accuracy of the reconstructed cluster centroid compares to that returned by standard maximum likelihood analysis, allowing one to look for astrophysical counterparts in narrow regions, which minimizes the chance probability in the counterpart association. Conclusions.We found that γ-ray DBSCAN is a powerful tool for detecting of clusters in γ-ray data. It can be used to look for both point-like sources and extended sources, and can be potentially applied to any astrophysical field related to detecting clusters in data. In a companion paper we will present the application of the γ-ray DBSCAN to the full Fermi-LAT sky, discussing the potential of the algorithm to discover new sources.
Winter, André; Rieger, Heiko; Vojta, Matthias; Bulla, Ralf
2009-01-23
A continuous time cluster algorithm for two-level systems coupled to a dissipative bosonic bath is presented and applied to the sub-Ohmic spin-boson model. When the power s of the spectral function Jomega proportional, variant omegas is smaller than 1/2, the critical exponents are found to be classical, mean-field like. Potential sources for the discrepancy with recent renormalization group predictions are traced back to the effect of a dangerously irrelevant variable. PMID:19257337
NASA Technical Reports Server (NTRS)
Werth, L. F. (Principal Investigator)
1981-01-01
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
NASA Astrophysics Data System (ADS)
Zhan, Qun; Zhao, Nanxiang
2011-08-01
In the 2D non-contacted body measurement, the transform model which converts the human body 2D girth data to the 3D girth data is required. However, the integrate model is hardly to be obtained for the different human body type categories determine the different model parameter. So, the work of human body type accuracy classification based on the measure data is very important. The canonical transformation method is used to strengthen the similar of data features of the same type and broaden the diversity of the data features of the different type. The "accumulating dead bodies" ant colony algorithm is improved in the paper in the way of employing the road information densities to help the ant to select the probable path lead to site of the accumulating dead bodies when it moves the data. By the way, the randomness and blindness of the ants' walking are eliminated, and the speed of the algorithm convergence is improved. For avoiding the unevenness of the data unit visited times in the algorithm, the access mechanism of the union data is employed, which avoid the algorithm to get into the local foul trap. The clustering validity function is selected to verify the clustering result of the human measure data. The experiment results indicate the affectivity and efficiency of the human body clustering work based on the improved ant colony algorithm. Basing the sorting result, the accuracy 3D body data transforming model can be founded, which should improve the accuracy of the non-contacted body measurement.
A graph-based watershed merging using fuzzy C-means and simulated annealing for image segmentation
NASA Astrophysics Data System (ADS)
Vadiveloo, Mogana; Abdullah, Rosni; Rajeswari, Mandava
2015-12-01
In this paper, we have addressed the issue of over-segmented regions produced in watershed by merging the regions using global feature. The global feature information is obtained from clustering the image in its feature space using Fuzzy C-Means (FCM) clustering. The over-segmented regions produced by performing watershed on the gradient of the image are then mapped to this global information in the feature space. Further to this, the global feature information is optimized using Simulated Annealing (SA). The optimal global feature information is used to derive the similarity criterion to merge the over-segmented watershed regions which are represented by the region adjacency graph (RAG). The proposed method has been tested on digital brain phantom simulated dataset to segment white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) soft tissues regions. The experiments showed that the proposed method performs statistically better, with average of 95.242% regions are merged, than the immersion watershed and average accuracy improvement of 8.850% in comparison with RAG-based immersion watershed merging using global and local features.
NASA Astrophysics Data System (ADS)
Balafar, M. A.; Ramli, Abd. Rahman; Saripan, M. Iqbal; Mahmud, Rozi; Mashohor, Syamsiah
Accurate segmentation of medical images is very essential in medical applications. We proposed a new method, based on combination of Learning Vector Quantization (LVQ), FCM and user interaction to make segmentation more robust against inequality of content with semantic, low contrast, in homogeneity and noise. In the postulated method, noise is decreased using Stationary wavelet Transform (SWT); input image is clustered using FCM to the n clusters where n is the number of target classes, afterwards, user selects some of the clusters to be partitioned again; each user selected cluster is clustered to two sub clusters using FCM. This process continues until user to be satisfied. Then, user selects clusters for each target class; user selected clusters are used to train LVQ. After training LVQ, image pixels are clustered by LVQ. Segmentation of simulated and real images is demonstrated to show effectiveness of new method.
NASA Astrophysics Data System (ADS)
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.
2010-06-01
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Arn+2, Krn+2, and Xen+2, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as function of cluster size. In Arn2+, for example, fissionlike fragmentation is predicted for n =55 while for n =43, the predicted outcome is nonfission fragmentation in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods
Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.
2010-07-01
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suited feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
Generalized fuzzy clustering for segmentation of multi-spectral magnetic resonance images.
He, Renjie; Datta, Sushmita; Sajja, Balasrinivasa Rao; Narayana, Ponnada A
2008-07-01
An integrated approach for multi-spectral segmentation of MR images is presented. This method is based on the fuzzy c-means (FCM) and includes bias field correction and contextual constraints over spatial intensity distribution and accounts for the non-spherical cluster's shape in the feature space. The bias field is modeled as a linear combination of smooth polynomial basis functions for fast computation in the clustering iterations. Regularization terms for the neighborhood continuity of intensity are added into the FCM cost functions. To reduce the computational complexity, the contextual regularizations are separated from the clustering iterations. Since the feature space is not isotropic, distance measure adopted in Gustafson-Kessel (G-K) algorithm is used instead of the Euclidean distance, to account for the non-spherical shape of the clusters in the feature space. These algorithms are quantitatively evaluated on MR brain images using the similarity measures. PMID:18387784
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, xi ∈ RD. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P -> R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X x X -> R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c1, . . . , ck} such that the clusters are nonoverlapping (c_i intersected with c_j = empty set, i != j) subsets of the data set (Union_i c_i=X). Hierarchical algorithms produce a series of partitions P = {p1, . . . , pn }. For a complete hierarchy, the number of partitions n’= n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j , such as the cluster center or a Gaussian distribution. The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carries with it a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j , even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity matrices—cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validations methods, and visualization or data reduction techniques such as principal components analysis (PCA),multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's pairedt-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers manual segmentation, the proposed FCM-Atlas method achieves a correlation ofr = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 0.1 when compared to reader 1 and 0.61 0.1 for reader 2, respectively, while the DSC between the two readers manual segmentation is 0.67 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ?5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ?55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds a great potential to support clinical applications in the future including breast cancer risk assessment.
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's pairedt-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation ofr = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 for reader 2, respectively, while the DSC between the two readers’ manual segmentation is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ∼5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ∼55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds a great potential to support clinical applications in the future including breast cancer risk assessment.
Not Available
1994-02-02
This report consists of three separate but related reports. They are (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Resource Development plan, the plan includes K--12 enrichment activities, undergraduate research opportunities for students at the state`s two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of this cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites and other carbons in order to understand the properties that provide desirable structural characteristics including resistance to oxidation, levels of anisotropy and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, research by four WVU faculty and three state liberal arts college faculty are: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.
NASA Astrophysics Data System (ADS)
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T).
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r(4)), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246
Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus; Zheng, Wenwei; Clementi, Cecilia; Prada-Gracia, Diego; Rao, Francesco
2015-01-14
The ligand migration network for O{sub 2}–diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k–means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k–means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-01-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. © 2014 The Authors. Published by Wiley Periodicals Inc. PMID:24677621
An improved FCM medical image segmentation algorithm based on MMTD.
Zhou, Ningning; Yang, Tingting; Zhang, Shaobai
2014-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) is one of the popular clustering algorithms for medical image segmentation. But FCM is highly vulnerable to noise due to not considering the spatial information in image segmentation. This paper introduces medium mathematics system which is employed to process fuzzy information for image segmentation. It establishes the medium similarity measure based on the measure of medium truth degree (MMTD) and uses the correlation of the pixel and its neighbors to define the medium membership function. An improved FCM medical image segmentation algorithm based on MMTD which takes some spatial features into account is proposed in this paper. The experimental results show that the proposed algorithm is more antinoise than the standard FCM, with more certainty and less fuzziness. This will lead to its practicable and effective applications in medical image segmentation. PMID:24648852
Baldauf, Tobias; Smith, Robert E.; Seljak, Uros; Mandelbaum, Rachel
2010-03-15
The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5h{sup -1} Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k{approx}1h Mpc{sup -1}. We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak lensing observations. We apply the method to a range of cosmological models and explicitly show the viability of our new statistic to distinguish between cosmological models.
Krejci, Adam; Hupp, Ted R.; Lexa, Matej; Vojtesek, Borivoj; Muller, Petr
2016-01-01
Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on proteins’ surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. Results: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied on a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as on a new, an order of magnitude larger dataset containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. Availability and implementation: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. Contact: muller@mou.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342231
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Muster: Massively Scalable Clustering
Energy Science and Technology Software Center (ESTSC)
2010-05-20
Muster is a framework for scalable cluster analysis. It includes implementations of classic K-Medoids partitioning algorithms, as well as infrastructure for making these algorithms run scalably on very large systems. In particular, Muster contains algorithms such as CAPEK (described in reference 1) that are capable of clustering highly distributed data sets in-place on a hundred thousand or more processes.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
Clustering recommendations to compute agent reputation
NASA Astrophysics Data System (ADS)
Bedi, Punam; Kaur, Harmeet
2005-03-01
Traditional centralized approaches to security are difficult to apply to multi-agent systems which are used nowadays in e-commerce applications. Developing a notion of trust that is based on the reputation of an agent can provide a softer notion of security that is sufficient for many multi-agent applications. Our paper proposes a mechanism for computing reputation of the trustee agent for use by the trustier agent. The trustier agent computes the reputation based on its own experience as well as the experience the peer agents have with the trustee agents. The trustier agents intentionally interact with the peer agents to get their experience information in the form of recommendations. We have also considered the case of unintentional encounters between the referee agents and the trustee agent, which can be directly between them or indirectly through a set of interacting agents. The clustering is done to filter off the noise in the recommendations in the form of outliers. The trustier agent clusters the recommendations received from referee agents on the basis of the distances between recommendations using the hierarchical agglomerative method. The dendogram hence obtained is cut at the required similarity level which restricts the maximum distance between any two recommendations within a cluster. The cluster with maximum number of elements denotes the views of the majority of recommenders. The center of this cluster represents the reputation of the trustee agent which can be computed using c-means algorithm.
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research. PMID:24640781
NASA Astrophysics Data System (ADS)
Biosca, Josep Miquel; Lerma, José Luis
Terrestrial laser scanning is becoming a common surveying technique to measure quickly and accurately dense point clouds in 3-D. It simplifies measurement tasks on site. However, the massive volume of 3-D point measurements presents a challenge not only because of acquisition time and management of huge volumes of data, but also because of processing limitations on PCs. Raw laser scanner point clouds require a great deal of processing before final products can be derived. Thus, segmentation becomes an essential step whenever grouping of points with common attributes is required, and it is necessary for applications requiring the labelling of point clouds, surface extraction and classification into homogeneous areas. Segmentation algorithms can be classified as surface growing algorithms or clustering algorithms. This paper presents an unsupervised robust clustering approach based on fuzzy methods. Fuzzy parameters are analysed to adapt the unsupervised clustering methods to segmentation of laser scanner data. Both the Fuzzy C-Means (FCM) algorithm and the Possibilistic C-Means (PCM) mode-seeking algorithm are reviewed and used in combination with a similarity-driven cluster merging method. They constitute the kernel of the unsupervised fuzzy clustering method presented herein. It is applied to three point clouds acquired with different terrestrial laser scanners and scenarios: the first is an artificial (synthetic) data set that simulates a structure with different planar blocks; the second a composition of three metric ceramic gauge blocks (Grade 0, flatness tolerance ± 0.1 μm) recorded with a Konica Minolta Vivid 9i optical triangulation digitizer; the last is an outdoor data set that comes up to a modern architectural building collected from the centre of an open square. The amplitude-modulated-continuous-wave (AMCW) terrestrial laser scanner system, the Faro 880, was used for the acquisition of the latter data set. Experimental analyses of the results from the proposed unsupervised planar segmentation process are shown to be promising.
Ghorbanzadeh, Leila; Torshabi, Ahmad Esmaili; Nabipour, Jamshid Soltani; Arbatan, Moslem Ahmadi
2016-04-01
In image guided radiotherapy, in order to reach a prescribed uniform dose in dynamic tumors at thorax region while minimizing the amount of additional dose received by the surrounding healthy tissues, tumor motion must be tracked in real-time. Several correlation models have been proposed in recent years to provide tumor position information as a function of time in radiotherapy with external surrogates. However, developing an accurate correlation model is still a challenge. In this study, we proposed an adaptive neuro-fuzzy based correlation model that employs several data clustering algorithms for antecedent parameters construction to avoid over-fitting and to achieve an appropriate performance in tumor motion tracking compared with the conventional models. To begin, a comparative assessment is done between seven nuero-fuzzy correlation models each constructed using a unique data clustering algorithm. Then, each of the constructed models are combined within an adaptive sevenfold synthetic model since our tumor motion database has high degrees of variability and that each model has its intrinsic properties at motion tracking. In the proposed sevenfold synthetic model, best model is selected adaptively at pre-treatment. The model also updates the steps for each patient using an automatic model selectivity subroutine. We tested the efficacy of the proposed synthetic model on twenty patients (divided equally into two control and worst groups) treated with CyberKnife synchrony system. Compared to Cyberknife model, the proposed synthetic model resulted in 61.2% and 49.3% reduction in tumor tracking error in worst and control group, respectively. These results suggest that the proposed model selection program in our synthetic neuro-fuzzy model can significantly reduce tumor tracking errors. Numerical assessments confirmed that the proposed synthetic model is able to track tumor motion in real time with high accuracy during treatment. PMID:25765021
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most used method to separate DNA, RNA, protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry, microbiology. The main way to separate each molecule is to find borders of each molecule fragment. This paper presents a software application that show columns edges of DNA fragments in 3 steps. In the first step the application obtains lane histograms of agarose gel electrophoresis images by doing projection based on x-axis. In the second step, it utilizes k-means clustering algorithm to classify point values of lane histogram such as left side values, right side values and undesired values. In the third step, column edges of DNA fragments is shown by using mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition to this, the application presents locations of DNA fragments and how many DNA fragments exist on images captured by a scientific camera.
Kidney segmentation in CT sequences using SKFCM and improved GrowCut algorithm
2015-01-01
Background Organ segmentation is an important step in computer-aided diagnosis and pathology detection. Accurate kidney segmentation in abdominal computed tomography (CT) sequences is an essential and crucial task for surgical planning and navigation in kidney tumor ablation. However, kidney segmentation in CT is a substantially challenging work because the intensity values of kidney parenchyma are similar to those of adjacent structures. Results In this paper, a coarse-to-fine method was applied to segment kidney from CT images, which consists two stages including rough segmentation and refined segmentation. The rough segmentation is based on a kernel fuzzy C-means algorithm with spatial information (SKFCM) algorithm and the refined segmentation is implemented with improved GrowCut (IGC) algorithm. The SKFCM algorithm introduces a kernel function and spatial constraint into fuzzy c-means clustering (FCM) algorithm. The IGC algorithm makes good use of the continuity of CT sequences in space which can automatically generate the seed labels and improve the efficiency of segmentation. The experimental results performed on the whole dataset of abdominal CT images have shown that the proposed method is accurate and efficient. The method provides a sensitivity of 95.46% with specificity of 99.82% and performs better than other related methods. Conclusions Our method achieves high accuracy in kidney segmentation and considerably reduces the time and labor required for contour delineation. In addition, the method can be expanded to 3D segmentation directly without modification. PMID:26356850
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
NASA Astrophysics Data System (ADS)
Bellugi, Dino; Milledge, David G.; Dietrich, William E.; Perron, J. Taylor; McKean, Jim
2015-12-01
Predicting shallow landslide size and location across landscapes is important for understanding landscape form and evolution and for hazard identification. We test a recently developed model that couples a search algorithm with 3-D slope stability analysis that predicts these two key attributes in an intensively studied landscape with a 10 year landslide inventory. We use process-based submodels to estimate soil depth, root strength, and pore pressure for a sequence of landslide-triggering rainstorms. We parameterize submodels with field measurements independently of the slope stability model, without calibrating predictions to observations. The model generally reproduces observed landslide size and location distributions, overlaps 65% of observed landslides, and of these predicts size to within factors of 2 and 1.5 in 55% and 28% of cases, respectively. Five percent of the landscape is predicted unstable, compared to 2% recorded landslide area. Missed landslides are not due to the search algorithm but to the formulation and parameterization of the slope stability model and inaccuracy of observed landslide maps. Our model does not improve location prediction relative to infinite-slope methods but predicts landslide size, improves process representation, and reduces reliance on effective parameters. Increasing rainfall intensity or root cohesion generally increases landslide size and shifts locations down hollow axes, while increasing cohesion restricts unstable locations to areas with deepest soils. Our findings suggest that shallow landslide abundance, location, and size are ultimately controlled by covarying topographic, material, and hydrologic properties. Estimating the spatiotemporal patterns of root strength, pore pressure, and soil depth across a landscape may be the greatest remaining challenge.
Discovering fuzzy clusters in databases using an evolutionary approach
NASA Astrophysics Data System (ADS)
Chung, Lewis L. H.; Chan, Keith C. C.; Leung, Henry
2000-04-01
In this paper, we present a fuzzy clustering technique for relational database for data mining task. Clustering task for data mining application can be performed more effective if the technique is able to handle both continuous- and discrete- valued data commonly found in real-life relational databases. However, many of fuzzy clustering techniques such as fuzzy c- means are developed only for continuous-valued data due to their distance measure defined in the Euclidean space. When attributes are also characterized by discrete-valued attribute, they are unable to perform their task. Besides, how to deal with fuzzy input data in addition to mixed continuous and discrete is not clearly discussed. Instead of using a distance measure for defining similarity between records, we propose a technique based on a genetic algorithm (GA). By representing a specific grouping of records in a chromosome and using an objective measure as a fitness measure to determine if such grouping is meaningful and interesting, our technique is able to handle continuous, discrete, and even fuzzy input data. Unlike many of the existing clustering techniques, which can only produce the result of grouping with no interpretation, our proposed algorithm is able to generate a set of rules describing the interestingness of the discovered clusters. This feature, in turn, eases the understandability of the discovered result.
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
NASA Astrophysics Data System (ADS)
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-01
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model developed in the current work. This model uses fundamental measurable microscopic quantities such as genome length rather than macroscopic radiobiological quantities such as alpha/beta ratios. This means that the model can be theoretically used under a wide range of conditions with a single set of input parameters once calibrated for a given cell line.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2014-03-01
We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678 [2] O. Kalentev, A. Rai, S. Kemnitzb, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620
Lu, Jing; Chen, Lei; Yin, Jun; Huang, Tao; Bi, Yi; Kong, Xiangyin; Zheng, Mingyue; Cai, Yu-Dong
2016-04-01
Lung cancer, characterized by uncontrolled cell growth in the lung tissue, is the leading cause of global cancer deaths. Until now, effective treatment of this disease is limited. Many synthetic compounds have emerged with the advancement of combinatorial chemistry. Identification of effective lung cancer candidate drug compounds among them is a great challenge. Thus, it is necessary to build effective computational methods that can assist us in selecting for potential lung cancer drug compounds. In this study, a computational method was proposed to tackle this problem. The chemical-chemical interactions and chemical-protein interactions were utilized to select candidate drug compounds that have close associations with approved lung cancer drugs and lung cancer-related genes. A permutation test and K-means clustering algorithm were employed to exclude candidate drugs with low possibilities to treat lung cancer. The final analysis suggests that the remaining drug compounds have potential anti-lung cancer activities and most of them have structural dissimilarity with approved drugs for lung cancer. PMID:26849843
A quantitative comparison of functional MRI cluster analysis.
Dimitriadou, Evgenia; Barth, Markus; Windischberger, Christian; Hornik, Kurt; Moser, Ewald
2004-05-01
The aim of this work is to compare the efficiency and power of several cluster analysis techniques on fully artificial (mathematical) and synthesized (hybrid) functional magnetic resonance imaging (fMRI) data sets. The clustering algorithms used are hierarchical, crisp (neural gas, self-organizing maps, hard competitive learning, k-means, maximin-distance, CLARA) and fuzzy (c-means, fuzzy competitive learning). To compare these methods we use two performance measures, namely the correlation coefficient and the weighted Jaccard coefficient (wJC). Both performance coefficients (PCs) clearly show that the neural gas and the k-means algorithm perform significantly better than all the other methods using our setup. For the hierarchical methods the ward linkage algorithm performs best under our simulation design. In conclusion, the neural gas method seems to be the best choice for fMRI cluster analysis, given its correct classification of activated pixels (true positives (TPs)) whilst minimizing the misclassification of inactivated pixels (false positives (FPs)), and in the stability of the results achieved. PMID:15182847
Fuzzy Clustering Neural Networks for Real-Time Odor Recognition System
Karlık, Bekir; Yüksek, Kemal
2007-01-01
The aim of this study is to develop a novel fuzzy clustering neural network (FCNN) algorithm as pattern classifiers for real-time odor recognition system. In this type of FCNN, the input neurons activations are derived through fuzzy c mean clustering of the input data, so that the neural system could deal with the statistics of the measurement error directly. Then the performance of FCNN network is compared with the other network which is well-known algorithm, named multilayer perceptron (MLP), for the same odor recognition system. Experimental results show that both FCNN and MLP provided high recognition probability in determining various learn categories of odors, however, the FCNN neural system has better ability to recognize odors more than the MLP network. PMID:18368140
On the analysis of BIS stage epochs via fuzzy clustering.
Nasibov, Efendi; Ozgören, Murat; Ulutagay, Gözde; Oniz, Adile; Kocaaslan, Sibel
2010-06-01
Among various types of clustering methods, partition-based methods such as k-means and FCM are widely used in the analysis of such data. However, when duration between stimuli is different, such methods are not able to provide satisfactory results because they find equal size clusters according to the fundamental running principle of these methods. In such cases, neighborhood-based clustering methods can give more satisfactory results because measurement series are separated from one another according to dramatic breaking points. In recent years, bispectral index (BIS) monitoring, which is used for monitoring the level of anesthesia, has been used in sleep studies. Sleep stages are classically scored according to the Rechtschaffen and Kales (R&K) scoring system. BIS has been shown to have a strong correlation with the R&K scoring system. In this study, fuzzy neighborhood/density-based spatial clustering of applications with noise (FN-DBSCAN) that combines speed of the DBSCAN algorithm and robustness of the NRFJP algorithm is applied to BIS measurement series. As a result of experiments, we can conclude that, by using BIS data, the FN-DBSCAN method estimates sleep stages better than the fuzzy c-means method. PMID:20156029
Wang, Jianzhong; Kong, Jun; Lu, Yinghua; Qi, Miao; Zhang, Baoxue
2008-12-01
Image segmentation is often required as a preliminary and indispensable stage in the computer aided medical image process, particularly during the clinical analysis of magnetic resonance (MR) brain images. In this paper, we present a modified fuzzy c-means (FCM) algorithm for MRI brain image segmentation. In order to reduce the noise effect during segmentation, the proposed method incorporates both the local spatial context and the non-local information into the standard FCM cluster algorithm using a novel dissimilarity index in place of the usual distance metric. The efficiency of the proposed algorithm is demonstrated by extensive segmentation experiments using both simulated and real MR images and by comparison with other state of the art algorithms. PMID:18818051
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
Adaptive Clustering of Hypermedia Documents.
ERIC Educational Resources Information Center
Johnson, Andrew; Fotouhi, Farshad
1996-01-01
Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…
Algorithms and Algorithmic Languages.
ERIC Educational Resources Information Center
Veselov, V. M.; Koprov, V. M.
This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.
Matlab Cluster Ensemble Toolbox
Sapio, Vincent De; Kegelmeyer, Philip
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
Delineation of river bed-surface patches by clustering high-resolution spatial grain size data
NASA Astrophysics Data System (ADS)
Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.
2014-01-01
The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment.All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences clustering leads to a trade-off between the quality of GSD separation between patch types and the spatial coherence of patches. Methods incorporating spatial information during the clustering process tended to produce a finite number of types of patches. As methods improve for collecting high-resolution grain size data, the approaches described here can be scaled up to field studies to better characterize the grain size heterogeneity of river beds.
NASA Astrophysics Data System (ADS)
Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza
2014-01-01
Quantitative PET image reconstruction requires an accurate map of attenuation coefficients of the tissue under investigation at 511 keV (μ-map), and in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received lots of attention in the scientific literature. One of the major difficulties facing MRAC has been observed in the areas where bone and air collide, e.g. ethmoidal sinuses in the head area. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, development of more versatile MR sequences to label the bone structure, e.g. ultra-short echo-time (UTE) sequences, certainly plays a significant role in novel methodological developments. However, long acquisition time and complexity of UTE sequences limit its clinical applications. To overcome this problem, we developed a novel combination of Short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy clustering C-means (FCM) to segment the head area into four classes of air, bone, soft tissue and adipose tissue. The imaging protocol was set on a clinical 3 T Tim Trio and also 1.5 T Avanto (Siemens Medical Solution, Erlangen, Germany) employing a triple echo time pulse sequence in the head area. The acquisition parameters were as follows: TE1/TE2/TE3=0.98/4.925/6.155 ms, TR=8 ms, FA=25 on the 3 T system, and TE1/TE2/TE3=1.1/2.38/4.76 ms, TR=16 ms, FA=18 for the 1.5 T system. The second and third echo-times belonged to the Dixon decomposition to distinguish soft and adipose tissues. To quantify accuracy, sensitivity and specificity of the bone segmentation algorithm, resulting classes of MR-based segmented bone were compared with the manual segmented one by our expert neuro-radiologist. Results for both 3 T and 1.5 T systems show that bone segmentation applied in several slices yields average accuracy, sensitivity and specificity higher than 90%. Results indicate that FCM is an appropriate technique for tissue classification in the sinusoidal area where there is air-bone interface. Furthermore, using Dixon method, fat and brain tissues were successfully separated.
Haplotyping Problem, A Clustering Approach
NASA Astrophysics Data System (ADS)
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-01
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms.
Haplotyping Problem, A Clustering Approach
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-06
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms.
Weigend, Florian
2014-10-07
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf{sub 12} and [LaPb{sub 7}Bi{sub 7}]{sup 4−}. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the “pure” genetic algorithm.
Weigend, Florian
2014-10-01
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7](4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm. PMID:25296780
NASA Astrophysics Data System (ADS)
Ergun, Bahadir; Sahin, Cumhur; Ustuntas, Taner
2014-01-01
Terrestrial Laser Scanners (TLS) are used frequently in three dimensional documentation studies and present an alternative method for three dimensional modeling without any deformation of scale. In this study, point cloud data segmentation is used for photogrammetrical image data production from laser scanner data. The segmentation studies suggest several methods for automation of curve surface determination for digital terrain modeling. In this study, fuzzy logic approach has been used for the automatic segmentation of the regular curve surfaces which differ in their depths to the instrument. This type of shapes has been usually observed in the dome surfaces for close range architectural documentation. The model of C-means integrated fuzzy logic approach has been developed with MatLAB 7.0 software. Gauss2mf membership functions algorithm has been tested with original data set. These results were used in photogrammetric 3D modeling process. As the result of the study, testing the results of point cloud data set has been discussed and interpreted with all of its advantages and disadvantages in Section 5.
Wang, Hesheng; Feyes, Denise; Mulvihill, John; Oleinick, Nancy; MacLennan, Gregory; Fei, Baowei
2013-01-01
We are investigating in vivo small animal imaging and analysis methods for the assessment of photodynamic therapy (PDT), an emerging therapeutic modality for cancer treatment. Multiple weighted MR images were acquired from tumor-bearing mice pre- and post-PDT and 24-hour after PDT. We developed an automatic image classification method to differentiate live, necrotic and intermediate tissues within the treated tumor on the MR images. We used a multiscale diffusion filter to process the MR images before classification. A multiscale fuzzy C-means (FCM) classification method was applied along the scales. The object function of the standard FCM was modified to allow multiscale classification processing where the result from a coarse scale is used to supervise the classification in the next scale. The multiscale fuzzy C-means (MFCM) method takes noise levels and partial volume effects into the classification processing. The method was validated by simulated MR images with various noise levels. For simulated data, the classification method achieved 96.0 ± 1.1% overlap ratio. For real mouse MR images, the classification results of the treated tumors were validated by histologic images. The overlap ratios were 85.6 ± 5.1%, 82.4 ± 7.8% and 80.5 ± 10.2% for the live, necrotic, and intermediate tissues, respectively. The MR imaging and the MFCM classification methods may provide a useful tool for the assessment of the tumor response to photodynamic therapy in vivo. PMID:24386526
NASA Astrophysics Data System (ADS)
Wang, Hesheng; Feyes, Denise; Mulvihill, John; Oleinick, Nancy; MacLennan, Gregory; Fei, Baowei
2007-03-01
We are investigating in vivo small animal imaging and analysis methods for the assessment of photodynamic therapy (PDT), an emerging therapeutic modality for cancer treatment. Multiple weighted MR images were acquired from tumor-bearing mice pre- and post-PDT and 24-hour after PDT. We developed an automatic image classification method to differentiate live, necrotic and intermediate tissues within the treated tumor on the MR images. We used a multiscale diffusion filter to process the MR images before classification. A multiscale fuzzy C-means (FCM) classification method was applied along the scales. The object function of the standard FCM was modified to allow multiscale classification processing where the result from a coarse scale is used to supervise the classification in the next scale. The multiscale fuzzy C-means (MFCM) method takes noise levels and partial volume effects into the classification processing. The method was validated by simulated MR images with various noise levels. For simulated data, the classification method achieved 96.0 +/- 1.1% overlap ratio. For real mouse MR images, the classification results of the treated tumors were validated by histologic images. The overlap ratios were 85.6 +/- 5.1%, 82.4 +/- 7.8% and 80.5 +/- 10.2% for the live, necrotic, and intermediate tissues, respectively. The MR imaging and the MFCM classification methods may provide a useful tool for the assessment of the tumor response to photodynamic therapy in vivo.
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perception trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
Cluster analysis for pattern recognition in solar butterfly diagrams
NASA Astrophysics Data System (ADS)
Illarionov, E.; Sokoloff, D.; Arlt, R.; Khlystova, A.
2011-07-01
We investigate to what extent the wings of solar butterfly diagrams can be separated without an explicit usage of Hale's polarity law as well as the location of the solar equator. We apply two algorithms of cluster analysis for this purpose, namely DBSCAN and C-means, and demonstrate their ability to separate the wings of contemporary butterfly diagrams based on the sunspot group density in the diagram only. Then we apply the method to historical data concerning the solar activity in the 18th century (Staudacher data). The method separates the two wings for Cycle 2, but fails to separate them for Cycle 1. In our opinion, this finding supports the interpretation of the Staudacher data as an indication of the unusual nature of the solar cycle in the 18th century.
A Linear Algebra Measure of Cluster Quality.
ERIC Educational Resources Information Center
Mather, Laura A.
2000-01-01
Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)
Relation chain based clustering analysis
NASA Astrophysics Data System (ADS)
Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo
2011-08-01
Clustering analysis is currently one of well-developed branches in data mining technology which is supposed to find the hidden structures in the multidimensional space called feature or pattern space. A datum in the space usually possesses a vector form and the elements in the vector represent several specifically selected features. These features are often of efficiency to the problem oriented. Generally, clustering analysis goes into two divisions: one is based on the agglomerative clustering method, and the other one is based on divisive clustering method. The former refers to a bottom-up process which regards each datum as a singleton cluster while the latter refers to a top-down process which regards entire data as a cluster. As the collected literatures, it is noted that the divisive clustering is currently overwhelming both in application and research. Although some famous divisive clustering methods are designed and well developed, clustering problems are still far from being solved. The k - means algorithm is the original divisive clustering method which initially assigns some important index values, such as the clustering number and the initial clustering prototype positions, and that could not be reasonable in some certain occasions. More than the initial problem, the k - means algorithm may also falls into local optimum, clusters in a rigid way and is not available for non-Gaussian distribution. One can see that seeking for a good or natural clustering result, in fact, originates from the one's understanding of the concept of clustering. Thus, the confusion or misunderstanding of the definition of clustering always derives some unsatisfied clustering results. One should consider the definition deeply and seriously. This paper demonstrates the nature of clustering, gives the way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns. In this paper, a new method called relation chain based clustering is presented. The given method demonstrates that arbitrary distribution shape and density are not the essential factors for clustering research, in another words, clusters described by some particular expressions should be considered as a uniform mathematical description which is called "relation chain" emphasized in this paper. The relation chain indicates the relation between each pair of the spatial points and gives the evaluation of the connection between the pair-wise points. This relation chain based clustering algorithm initially assigns the neighborhood evaluation radius of the points, then assesses the clustering result based on inner-cluster variance of each cluster while increasing the radius, adjusting the radius properly and finally gives the clustering result. Some experiments are conducted using the proposed method and the hidden data structure is well explored.
NASA Astrophysics Data System (ADS)
Zhou, Pu-cheng; Liu, Cun-chao
2013-08-01
Camouflaged targets detection in complex background is a challenging problem. Spectral-polarimetric imaging can offers spectral information and polarization information from the objects in the scene. Fusion of the spectral and polarization information in the images will result in better camouflaged target identification and recognition. In this paper a novel spectral-polarimetric image fusion algorithm based on Shearlet transform is proposed. Firstly, every polarimetric image in each wave band is decomposed into images of low frequency components and high frequency components by Shearlet transform. Then, the fused low frequency approximate coefficients are obtained with weighted average method, and the fused high frequency coefficients are obtained with area-based feature selection method, so features and details from different spectral-polarimetric images are fused successfully. After that, the kernel fuzzy c-means clustering algorithm is used for camouflaged object separation from its background. Experimental results have shown that better identification performance was achieved.
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Yorek, Nurettin; Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method. PMID:25248211
Brightest Cluster Galaxy Identification
NASA Astrophysics Data System (ADS)
Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team
2011-01-01
Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in the cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others.We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.
NASA Astrophysics Data System (ADS)
Li, Chaofeng; Yang, Maolong; Shi, Chengxian; Xia, Deshen
2003-09-01
Road extracted from satellite imagery have been used for many different purposes, e.g. military, map publishing, transportation, and car navigations, etc. Many method such as, neural network, Knowledge-based, Optimal search, Snake model, Semantic model, Road operator model, etc. was researched to identify road from satellite image, but because of complicated characteristics of road and image itself, and automated road network extraction still remains a challenge problem, and no existing software is able to perform the task reliably. This paper presents a hybrid method which combines Fuzzy-C-Means with back-propagation neural network and knowledge processing technique to detect roads in SPOT image. The basic idea of the paper is "easiest first" principal, and firstly focus to extract local salient road segments most easily and reliably, then use contextual knowledge and supervised back-propagation neural network model to extract fuzzy road segments among salient road segment, and then grouping these extracted pixel as seed point, candidate point, and not-road point, and then according to appropriate knowledge rule to traversal and join, guide the further road link in the whole image. At last, some post-processing steps are taken to refine the result. The resultant image shows this hybrid identification method performs better than only using knowledge-based method or neural network techniques.
NASA Astrophysics Data System (ADS)
Arrell, K. E.; Fisher, P. F.; Tate, N. J.; Bastin, L.
2007-10-01
The increasing global coverage of high resolution/large-scale digital elevation data has allowed the study of geomorphological form to receive renewed attention by providing accessible datasets for the characterisation and quantification of land surfaces. Digital elevation models (DEMs) provide quantitative elevation data, but it is the characterisation and extraction of geomorphologically significant measures (morphometric indices) from these raw data that form more informative and useful datasets. Common to many geographical measures, morphometric measures derived from DEMs are dependent on the scale of observation. This paper reports results of employing a fuzzy c-means classification for a sample DEM from Snowdonia, Wales, with a number of morphometric measures at different resolutions as input, and morphometric classification of landforms at each resolution as output. The classifications reveal that different landscape components or morphometric classes are important at different resolutions, and that morphometric classes exhibit resolution dependency in their geographical extents. Examination of the scale dependency and behaviour of morphometric classifications of landforms at different resolutions provides a fuller and more holistic view of the classes present than a single-scale analysis.
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering
Vijendra, Singh; Laxman, Sahoo
2015-01-01
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions, first the Davies-Bouldin (DB) index and second the line symmetry distance based objective functions, are used. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. The multiple randomized K dimensional (Kd) trees based nearest neighbor search is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to existing SBKM and MOCK clustering algorithms. PMID:26339233
Garcia, Claudio; de Carvalho Berni, Cássio; de Oliveira, Carlos Eduardo Neri
2008-04-01
This paper presents the design and implementation of an embedded soft sensor, i.e., a generic and autonomous hardware module, which can be applied to many complex plants, wherein a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called "Limited Rules", employed to model continuous nonlinear processes. The fuzzy model has a Takagi-Sugeno-Kang structure and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and it runs online, estimating the target variable from other available variables. Tests have been performed using a simulated pH neutralization plant. The results of the embedded soft sensor have been considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. PMID:17981281
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings. PMID:24802018
Linear Coherent Bi-cluster Discovery via Beam Detection and Sample Set Clustering
NASA Astrophysics Data System (ADS)
Shi, Yi; Hasan, Maryam; Cai, Zhipeng; Lin, Guohui; Schuurmans, Dale
We propose a new bi-clustering algorithm, LinCoh, for finding linear coherent bi-clusters in gene expression microarray data. Our method exploits a robust technique for identifying conditionally correlated genes, combined with an efficient density based search for clustering sample sets. Experimental results on both synthetic and real datasets demonstrated that LinCoh consistently finds more accurate and higher quality bi-clusters than existing bi-clustering algorithms.
Slonim, Noam; Atwal, Gurinder Singh; Tkačik, Gašper; Bialek, William
2005-01-01
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Matlab Cluster Ensemble Toolbox
Energy Science and Technology Software Center (ESTSC)
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. Withmore » regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.« less
Clustering of Multi-Temporal Fully Polarimetric L-Band SAR Data for Agricultural Land Cover Mapping
NASA Astrophysics Data System (ADS)
Tamiminia, H.; Homayouni, S.; Safari, A.
2015-12-01
Recently, the unique capabilities of Polarimetric Synthetic Aperture Radar (PolSAR) sensors make them an important and efficient tool for natural resources and environmental applications, such as land cover and crop classification. The aim of this paper is to classify multi-temporal full polarimetric SAR data using kernel-based fuzzy C-means clustering method, over an agricultural region. This method starts with transforming input data into the higher dimensional space using kernel functions and then clustering them in the feature space. Feature space, due to its inherent properties, has the ability to take in account the nonlinear and complex nature of polarimetric data. Several SAR polarimetric features extracted using target decomposition algorithms. Features from Cloude-Pottier, Freeman-Durden and Yamaguchi algorithms used as inputs for the clustering. This method was applied to multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Canada, during June and July in 2012. The results demonstrate the efficiency of this approach with respect to the classical methods. In addition, using multi-temporal data in the clustering process helped to investigate the phenological cycle of plants and significantly improved the performance of agricultural land cover mapping.
Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan
2014-01-01
Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell populations' centroids and contours. Recent advances have permitted imaging at single cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially those cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to the Nissl staining data, with the following two key steps: 1) concave points clustering to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. Also, we have evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system, and the datasets include closely touching cells. Comparing with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442
Color segmentation using MDL clustering
NASA Astrophysics Data System (ADS)
Wallace, Richard S.; Suenaga, Yasuhito
1991-02-01
This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other " magic numbers. " This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image our clustering program detects color clusters corresponding to the hair skin and background regions in the image. Then a maximum likelyhood classifier assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils mouth and eyes together but they can be separated from the larger classes through connected components analysis.
NASA Astrophysics Data System (ADS)
Siebra, Hélio; Carvalho, Bruno M.; Garduño, Edgar
2015-01-01
Digital image segmentation is the process of assigning distinct labels to different objects in a digital image, and clustering techniques can be used to achieve such segmentations. However, many traditional segmentation algorithm fail to segment objects that are characterized by textures whose patterns cannot be successfully described by simple statistics computed over a very restricted area. In this paper we present a fuzzy clustering algorithm that achieves the segmentation of images with color textures by employing a distance function based on the Skew Divergence, that is based on the well-known Kullback-Leibler Divergence. In order for such a distance to produce good results when applied to color images, we reduced the dimensionality of the image's histogram, thus eliminating the sparsity of the color histogram and speeding up the execution of the algorithm. We performed experiments on thin rock sections and compared our results to the segmentations obtained by the Fuzzy C-Means and by another fuzzy segmentation technique, showing the superiority of our approach.
Comparing clustering and partitioning strategies
NASA Astrophysics Data System (ADS)
Afonso, Carlos; Ferreira, Fábio; Exposto, José; Pereira, Ana I.
2012-09-01
In this work we compare balance and edge-cut evaluation metrics to measure the performance of two wellknown graph data-grouping algorithms applied to four web and social network graphs. One of the algorithms employs a partitioning technique using Kmetis tool, and the other employs a clustering technique using Scluster tool. Because clustering algorithms use a similarity measure between each graph node and partitioning algorithms use a dissimilarity measure (weight), it was necessary to apply a normalized function to convert weighted graphs to similarity matrices.
Ghaheri, Salehe; Masoum, Saeed; Gholami, Ali
2016-01-15
Analysis of fragrance composition is very important for both the fragrance producers and consumers. Unraveling of fragrance formulation is necessary for quality control, competitor and trace analysis. Gas chromatography-mass spectrometry (GC-MS) has been introduced as the most appropriate analytical technique for this type of analysis, which is based on Kovats index and MS database. The most straightforward method to analyze a GC-MS dataset is to integrate those peaks that can be recognized by their mass profiles. But, because of common problems of chromatographic data such as spectral background, baseline offset and specially overlapped peaks, accurate quantitative and qualitative analysis could be failed. Some chemometric modeling techniques such as bilinear multivariate curve resolution (MCR) methods have been introduced to overcome these problems and obtained well resolved chromatographic profiles. The main drawback of these methods is rotational ambiguity or nonunique solution that is represented as area of feasible solutions (AFS). Polygonal inflation algorithm (PIA) is an automatic and simple to use algorithm for numerical computation of AFS. In this study, the extent of rotational ambiguity in curve resolution methods is calculated by MCR-BAND toolbox and the PIA. The ability of the PIA in resolving GC-MS data sets is evaluated by simulated GC-MS data in comparison with other popular curve resolution methods such as multivariate curve resolution alternative least square (MCR-ALS), multivariate curve resolution objective function minimization (MCR-FMIN) by different initial estimation methods and independent component analysis (ICA). In addition, two typical challenging area of total ion chromatogram (TIC) of commercial fragrances with overlapped peaks were analyzed by the PIA to investigate the possibility of peak deconvolution analysis. PMID:26711156
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
City block distance and rough-fuzzy clustering for identification of co-expressed microRNAs.
Paul, Sushmita; Maji, Pradipta
2014-06-01
The microRNAs or miRNAs are short, endogenous RNAs having ability to regulate mRNA expression at the post-transcriptional level. Various studies have revealed that miRNAs tend to cluster on chromosomes. The members of a cluster that are in close proximity on chromosomes are highly likely to be processed as co-transcribed units. Therefore, a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data. Complicated networks of miRNA-mRNA interaction increase the challenges of comprehending and interpreting the resulting mass of data. In this regard, this paper presents a clustering algorithm in order to extract meaningful information from miRNA expression data. It judiciously integrates the merits of rough sets, fuzzy sets, the c-means algorithm, and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. While the membership functions of fuzzy sets enable efficient handling of overlapping partitions in a noisy environment, the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition. The city block distance is used to compute the membership functions of fuzzy sets and to find initial partition of a data set, and therefore helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated for several miRNA expression data sets using different cluster validity indices. Moreover, the gene ontology is used to analyze the functional consistency and biological significance of generated miRNA clusters. PMID:24682049
NASA Technical Reports Server (NTRS)
Gregory, Kyle J.; Hill, Joanne E. (Editor); Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-01-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
Swarm Intelligence in Text Document Clustering
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.
NASA Astrophysics Data System (ADS)
Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei
2015-12-01
Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using response receiver operating (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92 % with a false-positive rate of 2.3 clusters/image, and it is also better than standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
Classification of high resolution satellite images using spatial constraints-based fuzzy clustering
NASA Astrophysics Data System (ADS)
Singh, Pankaj Pratap; Garg, Rahul Dev
2014-01-01
A spatial constraints-based fuzzy clustering technique is introduced in the paper and the target application is classification of high resolution multispectral satellite images. This fuzzy-C-means (FCM) technique enhances the classification results with the help of a weighted membership function (Wmf). Initially, spatial fuzzy clustering (FC) is used to segment the targeted vegetation areas with the surrounding low vegetation areas, which include the information of spatial constraints (SCs). The performance of the FCM image segmentation is subject to appropriate initialization of Wmf and SC. It is able to evolve directly from the initial segmentation by spatial fuzzy clustering. The controlling parameters in fuzziness of the FCM approach, Wmf and SC, help to estimate the segmented road results, then the Stentiford thinning algorithm is used to estimate the road network from the classified results. Such improvements facilitate FCM method manipulation and lead to segmentation that is more robust. The results confirm its effectiveness for satellite image classification, which extracts useful information in suburban and urban areas. The proposed approach, spatial constraint-based fuzzy clustering with a weighted membership function (SCFCWmf), has been used to extract the information of healthy trees with vegetation and shadows showing elevated features in satellite images. The performance values of quality assessment parameters show a good degree of accuracy for segmented roads using the proposed hybrid SCFCWmf-MO (morphological operations) approach which also occluded nonroad parts.
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy. PMID:22334025
Using Greedy algorithm: DBSCAN revisited II.
Yue, Shi-hong; Li, Ping; Guo, Ji-dong; Zhou, Shui-geng
2004-11-01
The density-based clustering algorithm presented is different from the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996), and has the following advantages: first, Greedy algorithm substitutes for R(*)-tree (Bechmann et al., 1990) in DBSCAN to index the clustering space so that the clustering time cost is decreased to great extent and I/O memory load is reduced as well; second, the merging condition to approach to arbitrary-shaped clusters is designed carefully so that a single threshold can distinguish correctly all clusters in a large spatial dataset though some density-skewed clusters live in it. Finally, authors investigate a robotic navigation and test two artificial datasets by the proposed algorithm to verify its effectiveness and efficiency. PMID:15495334
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a prior knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Time series clustering analysis of health-promoting behavior
NASA Astrophysics Data System (ADS)
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
Grid-based DBSCAN Algorithm with Referential Parameters
NASA Astrophysics Data System (ADS)
Darong, Huang; Peng, Wang
A new algorithm GRPDBSCAN (Grid-based DBSCAN Algorithm with Referential Parameters) is proposed in this paper. GRPDBSCAN, which combined the grid partition technique and multi-density based clustering algorithm, has improved its efficiency. On the other hand, because the Eps and Minpts parameters of the DBSCAN algorithm were auto-generated, so they were more objective. Experimental results shown that the new algorithm not only can better differentiate between noises and discovery clusters of arbitrary shapes but also have more robust.
NASA Astrophysics Data System (ADS)
Xu, Changhang; Xie, Jing; Chen, Guoming; Huang, Weiping
2014-11-01
Infrared thermography has been used increasingly as an effective non-destructive technique to detect cracks on metal surface. Due to many factors, infrared thermal image has low definition compared to visible image. The contrasts between cracks and sound areas in different thermal image frames of a specimen vary greatly with the recorded time. An accurate detection can only be obtained by glancing over the whole thermal video, which is a laborious work. Moreover, experience of the operator has a great important influence on the accuracy of detection result. In this paper, an infrared thermal image processing framework based on superpixel algorithm is proposed to accomplish crack detection automatically. Two popular superpixel algorithms are compared and one of them is selected to generate superpixels in this application. Combined features of superpixels were selected from both the raw gray level image and the high-pass filtered image. Fuzzy c-means clustering is used to cluster superpixels in order to segment infrared thermal image. Experimental results show that the proposed framework can recognize cracks on metal surface through infrared thermal image automatically.
A compilation of jet finding algorithms
Flaugher, B.; Meier, K.
1992-12-31
Technical descriptions of jet finding algorithms currently in use in p{anti p} collider experiments (CDF, UA1, UA2), e{sup +}e{sup {minus}} experiments and Monte-Carlo event generators (LUND programs, ISAJET) have been collected. For the hadron collider experiments, the clustering methods fall into two categories: cone algorithms and nearest-neighbor algorithms. In addition, UA2 has employed a combination of both methods for some analysis. While there are clearly differences between the cone and nearest-neighbor algorithms, the authors have found that there are also differences among the cone algorithms in the details of how the centroid of a cone cluster is located and how the E{sub T} and P{sub T} of the jet are defined. The most commonly used jet algorithm in electron-positron experiments is the JADE-type cluster algorithm. Five various incarnations of this approach have been described.
Spontaneous clustering via minimum γ-divergence.
Notsu, Akifumi; Komori, Osamu; Eguchi, Shinto
2014-02-01
We propose a new method for clustering based on local minimization of the gamma-divergence, which we call spontaneous clustering. The greatest advantage of the proposed method is that it automatically detects the number of clusters that adequately reflect the data structure. In contrast, existing methods, such as K-means, fuzzy c-means, or model-based clustering need to prescribe the number of clusters. We detect all the local minimum points of the gamma-divergence, by which we define the cluster centers. A necessary and sufficient condition for the gamma-divergence to have local minimum points is also derived in a simple setting. Applications to simulated and real data are presented to compare the proposed method with existing ones. PMID:24206383
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle inequality obeying distance metric, the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program and initial performance results of end-to-end a document processing workflow are reported.
Clusters, brightest cluster galaxies and galaxy alignments
NASA Astrophysics Data System (ADS)
Chisari, Nora Elisa
This thesis develops two main topics related to the study of the large-scale structure of the Universe. The first one is the intrinsic alignment of galaxies. These are correlations between the shapes and orientations of galaxies that arise mainly as a consequence of tidal forces across a large range of scales. I use the tidal alignment model to predict how the intrinsic alignment of Luminous Red Galaxies could in the future provide constraints on the Baryon Acoustic Oscillation scale, a standard ruler for measuring the expansion of the Universe. I also show that primordial signatures of inflation can translate into a non-Gaussian bias in the correlation of the intrinsic shapes of galaxies, which could be observed with future surveys. The second main topic discussed in this thesis is clusters of galaxies. I use data from the Sloan Digital Sky Survey and a public catalog of galaxy clusters to estimate the alignment of galaxies around groups and clusters of galaxies. The novelty of this work is mainly in the method for estimating the alignment signal. In photometric surveys, the redshift uncertainty is large compared to the size of a cluster, making the distinction between galaxies in the cluster and in the background very challenging. In the method developed here, each galaxy is assigned a posterior probability distribution function of its redshift to separate the alignment component from the gravitational lensing of galaxies in the background. Among the galaxies that make up a cluster, Brightest Cluster Galaxies stand out by their luminosity. I study the connection between these galaxies and other ellipticals to understand the physics of their formation. Finally, I re-develop the Adaptive Matched Filter method for finding clusters in spectroscopic and photometric surveys to include a new treatment of the distances to galaxies. Again, I model the distance to each galaxy using a redshift posterior and propose other modifications to the algorithm that will be of use to upcoming photometric surveys.
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven
2010-08-01
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; Gerdes, David; Johnston, David E.; Sheldon, Erin; /Brookhaven
2011-08-22
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
Hierarchical clustering in minimum spanning trees
NASA Astrophysics Data System (ADS)
Yu, Meichen; Hillebrand, Arjan; Tewarie, Prejaas; Meier, Jil; van Dijk, Bob; Van Mieghem, Piet; Stam, Cornelis Jan
2015-02-01
The identification of clusters or communities in complex networks is a reappearing problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that the clustering of the MST reveals insight in the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulties to define and identify clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network.
GPU-based Multilevel Clustering.
Chiosa, Iurie; Kolb, Andreas
2010-04-01
The processing power of parallel co-processors like the Graphics Processing Unit (GPU) are dramatically increasing. However, up until now only a few approaches have been presented to utilize this kind of hardware for mesh clustering purposes. In this paper we introduce a Multilevel clustering technique designed as a parallel algorithm and solely implemented on the GPU. Our formulation uses the spatial coherence present in the cluster optimization and hierarchical cluster merging to significantly reduce the number of comparisons in both parts . Our approach provides a fast, high quality and complete clustering analysis. Furthermore, based on the original concept we present a generalization of the method to data clustering. All advantages of the meshbased techniques smoothly carry over to the generalized clustering approach. Additionally, this approach solves the problem of the missing topological information inherent to general data clustering and leads to a Local Neighbors k-means algorithm. We evaluate both techniques by applying them to Centroidal Voronoi Diagram (CVD) based clustering. Compared to classical approaches, our techniques generate results with at least the same clustering quality. Our technique proves to scale very well, currently being limited only by the available amount of graphics memory. PMID:20421676
A fully automated algorithm under modified FCM framework for improved brain MR image segmentation.
Sikka, Karan; Sinha, Nitesh; Singh, Pankaj K; Mishra, Amit K
2009-09-01
Automated brain magnetic resonance image (MRI) segmentation is a complex problem especially if accompanied by quality depreciating factors such as intensity inhomogeneity and noise. This article presents a new algorithm for automated segmentation of both normal and diseased brain MRI. An entropy driven homomorphic filtering technique has been employed in this work to remove the bias field. The initial cluster centers are estimated using a proposed algorithm called histogram-based local peak merger using adaptive window. Subsequently, a modified fuzzy c-mean (MFCM) technique using the neighborhood pixel considerations is applied. Finally, a new technique called neighborhood-based membership ambiguity correction (NMAC) has been used for smoothing the boundaries between different tissue classes as well as to remove small pixel level noise, which appear as misclassified pixels even after the MFCM approach. NMAC leads to much sharper boundaries between tissues and, hence, has been found to be highly effective in prominently estimating the tissue and tumor areas in a brain MR scan. The algorithm has been validated against MFCM and FMRIB software library using MRI scans from BrainWeb. Superior results to those achieved with MFCM technique have been observed along with the collateral advantages of fully automatic segmentation, faster computation and faster convergence of the objective function. PMID:19395212
NASA Technical Reports Server (NTRS)
Abrams, D.; Williams, C.
1999-01-01
This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases for which all know classical algorithms require exponential time.
Clustering analysis of moving target signatures
NASA Astrophysics Data System (ADS)
Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto
2010-04-01
Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
Absolute classification with unsupervised clustering
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, D. A.
1992-01-01
An absolute classification algorithm is proposed in which the class definition through training samples or otherwise is required only for a particular class of interest. The absolute classification is considered as a problem of unsupervised clustering when one cluster is known initially. The definitions and statistics of the other classes are automatically developed through the weighted unsupervised clustering procedure, which is developed to keep the cluster corresponding to the class of interest from losing its identity as the class of interest. Once all the classes are developed, a conventional relative classifier such as the maximum-likelihood classifier is used in the classification.
Extracting semantic object based on color feature using ISODAT algorithm
NASA Astrophysics Data System (ADS)
Yu, Weiyu; Yu, Yinglin; Xie, Shengli
2005-10-01
In this paper we present an approach to extract semantic object based on color feature using ISODATA clustering algorithm. First, we translate RGB color space into L*a*b* color space. Second, we use the ISODATA algorithm to solve clustering problem. In the end, we extract semantic object in terms of color information. Experimental results show that the color clustering give superior results in increases in cluster compactness.
Semi-supervised clustering methods
Bair, Eric
2013-01-01
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
Discriminative clustering via extreme learning machine.
Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng
2015-10-01
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036
Clustering gene expression data using a diffraction‐inspired framework
2012-01-01
Background The recent developments in microarray technology has allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction‐based clustering algorithm however is designed to overcome this necessity for user‐defined parameters, as it is able to automatically search the data for any underlying structure. Methods The diffraction‐based clustering algorithm presented in this paper is tested using five well‐known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those results obtained from conventional algorithms such as the k‐means, fuzzy c‐means, self‐organising map, hierarchical clustering algorithm, Gaussian mixture model and density‐based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results The diffraction‐based clustering algorithm is shown to be independent of the number of clusters as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction‐based clustering algorithm performs significantly better on the real biological datasets compared to the other existing algorithms. Conclusion The results of the diffraction‐based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data. PMID:23164195
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described in later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al.. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2004-05-26
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
NASA Astrophysics Data System (ADS)
Katgert, P.; Murdin, P.
2000-11-01
Abell clusters are the most conspicuous groupings of galaxies identified by George Abell on the plates of the first photographic survey made with the SCHMIDT TELESCOPE at Mount Palomar in the 1950s. Sometimes, the term Abell clusters is used as a synonym of nearby, optically selected galaxy clusters....
Galaxy cluster detection in the Next Generation Virgo Cluster Survey (NGVS)
NASA Astrophysics Data System (ADS)
Licitra, R.; Mei, S.; Raichoor, A.; Erben, T.; Hildebrandt, H.; Ilbert, O.; Huertas-Company, M.; Van Waerbeke, L.; Côté, P.; Cuillandre, J.-C.; Duc, P. A.; Ferrarese, L.; Gwyn, S. D. J.; Lancon, A.; Munoz, R.; Puzia, T.
2014-12-01
We describe our cluster detection algorithm Red-GOLD, based on the search of red-sequence galaxy overdensities. In this work, the algorithm is optimized to search for clusters up to z˜1 using optical data. We applied this algorithm to semi-analytic simulations and we found that for haloes more massive than M≥ 10^{14} M_{⊙} the completeness is 80% and the purity is ˜81%, up to redshift z=1.
Segmentation of color images using genetic algorithm with image histogram
NASA Astrophysics Data System (ADS)
Sneha Latha, P.; Kumar, Pawan; Kahu, Samruddhi; Bhurchandi, Kishor M.
2015-02-01
This paper proposes a family of color image segmentation algorithms using genetic approach and color similarity threshold in terns of Just noticeable difference. Instead of segmenting and then optimizing, the proposed technique directly uses GA for optimized segmentation of color images. Application of GA on larger size color images is computationally heavy so they are applied on 4D-color image histogram table. The performance of the proposed algorithms is benchmarked on BSD dataset with color histogram based segmentation and Fuzzy C-means Algorithm using Probabilistic Rand Index (PRI). The proposed algorithms yield better analytical and visual results.
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Clustering of financial time series
NASA Astrophysics Data System (ADS)
D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo
2013-05-01
This paper addresses the topic of classifying financial time series in a fuzzy framework proposing two fuzzy clustering models both based on GARCH models. In general clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. At this aim, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance, based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp version.
Detecting alternative graph clusterings.
Mandala, Supreet; Kumara, Soundar; Yao, Tao
2012-07-01
The problem of graph clustering or community detection has enjoyed a lot of attention in complex networks literature. A quality function, modularity, quantifies the strength of clustering and on maximization yields sensible partitions. However, in most real world networks, there are an exponentially large number of near-optimal partitions with some being very different from each other. Therefore, picking an optimal clustering among the alternatives does not provide complete information about network topology. To tackle this problem, we propose a graph perturbation scheme which can be used to identify an ensemble of near-optimal and diverse clusterings. We establish analytical properties of modularity function under the perturbation which ensures diversity. Our approach is algorithm independent and therefore can leverage any of the existing modularity maximizing algorithms. We numerically show that our methodology can systematically identify very different partitions on several existing data sets. The knowledge of diverse partitions sheds more light into the topological organization and helps gain a more complete understanding of the underlying complex network. PMID:23005495
Detecting alternative graph clusterings
NASA Astrophysics Data System (ADS)
Mandala, Supreet; Kumara, Soundar; Yao, Tao
2012-07-01
The problem of graph clustering or community detection has enjoyed a lot of attention in complex networks literature. A quality function, modularity, quantifies the strength of clustering and on maximization yields sensible partitions. However, in most real world networks, there are an exponentially large number of near-optimal partitions with some being very different from each other. Therefore, picking an optimal clustering among the alternatives does not provide complete information about network topology. To tackle this problem, we propose a graph perturbation scheme which can be used to identify an ensemble of near-optimal and diverse clusterings. We establish analytical properties of modularity function under the perturbation which ensures diversity. Our approach is algorithm independent and therefore can leverage any of the existing modularity maximizing algorithms. We numerically show that our methodology can systematically identify very different partitions on several existing data sets. The knowledge of diverse partitions sheds more light into the topological organization and helps gain a more complete understanding of the underlying complex network.
NASA Astrophysics Data System (ADS)
Abrams, Daniel S.
This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases (commonly found in ab initio physics and chemistry problems) for which all known classical algorithms require exponential time. Fast algorithms for simulating many body Fermi systems are also provided in both first and second quantized descriptions. An efficient quantum algorithm for anti-symmetrization is given as well as a detailed discussion of a simulation of the Hubbard model. In addition, quantum algorithms that calculate numerical integrals and various characteristics of stochastic processes are described. Two techniques are given, both of which obtain an exponential speed increase in comparison to the fastest known classical deterministic algorithms and a quadratic speed increase in comparison to classical Monte Carlo (probabilistic) methods. I derive a simpler and slightly faster version of Grover's mean algorithm, show how to apply quantum counting to the problem, develop some variations of these algorithms, and show how both (apparently distinct) approaches can be understood from the same unified framework. Finally, the relationship between physics and computation is explored in some more depth, and it is shown that computational complexity theory depends very sensitively on physical laws. In particular, it is shown that nonlinear quantum mechanics allows for the polynomial time solution of NP-complete and #P oracle problems. Using the Weinberg model as a simple example, the explicit construction of the necessary gates is derived from the underlying physics. Nonlinear quantum algorithms are also presented using Polchinski type nonlinearities which do not allow for superluminal communication. (Copies available exclusively from MIT Libraries, Rm. 14- 0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
Hierarchical clustering using mutual information
NASA Astrophysics Data System (ADS)
Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P.
2005-04-01
We present a conceptually simple method for hierarchical clustering of data called mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use this both in the Shannon (probabilistic) version of information theory and in the Kolmogorov (algorithmic) version. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and to the output of independent components analysis (ICA) as illustrated with the ECG of a pregnant woman.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.
Radio sources in galaxy clusters
NASA Astrophysics Data System (ADS)
Gralla, Megan B.
I present a statistical analysis of the radio source population in galaxy clusters by matching radio sources from two large-area surveys with optically-selected galaxy clusters spanning a wide range of redshift (˜0.3 to ˜1) from the Red-Sequence Cluster Surveys (RCS). The first RCS (RCS1) is well-characterized within a cosmological context, with richness measurements calibrated over a wide redshift range. I have focused o