An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.
Zhang, Jian; Shen, Ling
2014-01-01
To organize a wide variety of data sets automatically and obtain accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM uses the global search property of PSO to address the premature convergence of conventional fuzzy clustering, uses the vagueness balance property of shadowed sets to handle overlap among clusters, and models uncertainty in class boundaries. The new method uses the Xie-Beni index as a cluster validity measure and automatically finds the optimal cluster number within a specified range, yielding partitions with compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering result. PMID:25477953
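The cluster-number selection described above can be sketched as follows. This is a minimal illustration, assuming plain fuzzy c-means in place of SP-FCM (no PSO, no shadowed sets); the function names `fcm`, `xie_beni` and `best_cluster_number` are hypothetical and not taken from the paper. It runs FCM for each candidate cluster number and keeps the one with the lowest Xie-Beni index.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (not SP-FCM). Returns centers V (c, d) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                              # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)    # membership-weighted centers
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index: compactness over minimum center separation (lower is better)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    separation = min(((V[i] - V[j]) ** 2).sum()
                     for i in range(len(V)) for j in range(i + 1, len(V)))
    return compactness / (X.shape[0] * separation)

def best_cluster_number(X, c_range=range(2, 7)):
    """Try each candidate cluster number and keep the one with the lowest index."""
    scores = {}
    for c in c_range:
        V, U = fcm(X, c)
        scores[c] = xie_beni(X, V, U)
    return min(scores, key=scores.get), scores
```

The candidate range `range(2, 7)` is an arbitrary choice for the sketch; the abstract only says the search runs "within a specific range".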
Wang, Deguang; Han, Baochang; Huang, Ming
Computer forensics is the application of computer technology to access, investigate, and analyze evidence of computer crime. It mainly comprises identifying and obtaining digital evidence, analyzing and extracting data, and filing and submitting results. Data analysis is the key link in computer forensics. Because real data are complex and fuzzy, evidence analysis has struggled to produce the desired results. This paper applies a fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) to computer forensics and reports more satisfactory results.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that accounts for nearly one million deaths each year. Because malaria requires prompt and accurate diagnosis, the current study proposes an unsupervised pixel segmentation method based on clustering to obtain fully segmented red blood cells (RBCs) infected with malaria parasites from thin blood smear images of the P. vivax species. To obtain the segmented infected cells, the malaria images are first enhanced using a modified global contrast stretching technique. Then, an unsupervised clustering-based segmentation technique is applied to the intensity component of the malaria image to separate the infected cells from the blood cell background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms are proposed for malaria slide image segmentation. After that, a median filter is applied to smooth the image and to remove unwanted regions such as small background pixels. Finally, a seeded region growing area extraction algorithm is applied to remove large unwanted regions that remain in the image because they are too large to be cleaned by the median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms is analyzed qualitatively and quantitatively by comparing the cascaded algorithm with the MKM and FCM clustering algorithms used alone. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm produces the best segmentation performance, achieving acceptable sensitivity as well as high specificity and accuracy compared with the segmentation results of the MKM and FCM algorithms.
Yu, Xuelian; Chen, Qian; Gu, Guohua; Qian, Weixian; Xu, Mengxi
2014-11-01
The integration between polarization and intensity images possessing complementary and discriminative information has emerged as a new and important research area. On the basis of the consideration that the resulting image has different clarity and layering requirement for the target and background, we propose a novel fusion method based on non-subsampled Contourlet transform (NSCT) and fuzzy C-means (FCM) segmentation for IR polarization and light intensity images. First, the polarization characteristic image is derived from fusion of the degree of polarization (DOP) and the angle of polarization (AOP) images using local standard variation and abrupt change degree (ACD) combined criteria. Then, the polarization characteristic image is segmented with FCM algorithm. Meanwhile, the two source images are respectively decomposed by NSCT. The regional energy-weighted and similarity measure are adopted to combine the low-frequency sub-band coefficients of the object. The high-frequency sub-band coefficients of the object boundaries are integrated through the maximum selection rule. In addition, the high-frequency sub-band coefficients of internal objects are integrated by utilizing local variation, matching measure and region feature weighting. The weighted average and maximum rules are employed independently in fusing the low-frequency and high-frequency components of the background. Finally, an inverse NSCT operation is accomplished and the final fused image is obtained. The experimental results illustrate that the proposed IR polarization image fusion algorithm can yield an improved performance in terms of the contrast between artificial target and cluttered background and a more detailed representation of the depicted scene.
Ayvaz, M. Tamer
2007-11-01
This study proposes an inverse solution algorithm through which both the aquifer parameters and the zone structure of these parameters can be determined based on a given set of observations on piezometric heads. In the zone structure identification problem, the fuzzy c-means (FCM) clustering method is used. The association of the zone structure with the transmissivity distribution is accomplished through an optimization model. The meta-heuristic harmony search (HS) algorithm, which is conceptualized using the musical process of searching for a perfect state of harmony, is used as an optimization technique. The optimum parameter zone structure is identified based on three criteria: the residual error, parameter uncertainty, and structure discrimination. A numerical example given in the literature is solved to demonstrate the performance of the proposed algorithm. Also, a sensitivity analysis is performed to test the performance of the HS algorithm for different sets of solution parameters. Results indicate that the proposed solution algorithm is an effective way of simultaneously identifying aquifer parameters and their corresponding zone structures.
Efficient inhomogeneity compensation using fuzzy c-means clustering models.
Szilágyi, László; Szilágyi, Sándor M; Benyó, Balázs
2012-10-01
Intensity inhomogeneity or intensity non-uniformity (INU) is an undesired phenomenon that represents the main obstacle for magnetic resonance (MR) image segmentation and registration methods. Various techniques have been proposed to eliminate or compensate for the INU, most of which are embedded into classification or clustering algorithms; they generally have difficulties when INU reaches high amplitudes and usually suffer from high computational load. This study reformulates the design of c-means clustering based INU compensation techniques by identifying and separating those globally working computationally costly operations that can be applied to gray intensity levels instead of individual pixels. The theoretical assumptions are demonstrated using the fuzzy c-means algorithm, but the proposed modification is compatible with a wide range of c-means clustering based INU compensation and MR image segmentation algorithms. Experiments carried out using synthetic phantoms and real MR images indicate that the proposed approach produces practically the same segmentation accuracy as the conventional formulation, but 20-30 times faster. PMID:22405524
Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute
2014-05-01
Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into runoff and infiltration components and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale, with survey areas up to a few square kilometres, numerous new and innovative ground-based and remote sensing technologies are available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns, optimal measurement locations were identified for in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.
Design of Fuzzy Logic Controllers by Fuzzy c-Means Clustering
Watcharachai Wiriyasuttiwong; Kajornsak Kantapanit
In this paper, the use of the fuzzy c-means clustering algorithm in the design of membership functions and fuzzy rules of a fuzzy logic controller is described. In the design procedure, an auto-tuning PID controller was used to operate an example plant, which is a model of an air-conditioning system, and the plant operating data were collected. The fuzzy c-partition of the
Mandarin Digital Speech Recognition Based on a Chaotic Neural Network and Fuzzy C-means Clustering
Freeman, Walter J.
Mandarin Digital Speech Recognition Based on a Chaotic Neural Network and Fuzzy C-means Clustering model can perform digital speech recognition efficiently and the fuzzy c-means clustering has better performance than the hard k-means clustering.
Sandhir, Radha Pyari; Nayak, Tapan; 10.1016/j.nima.2012.04.023
2012-01-01
In high energy physics experiments, calorimetric data reconstruction requires a suitable clustering technique in order to obtain accurate information about the shower characteristics such as position of the shower and energy deposition. Fuzzy clustering techniques have high potential in this regard, as they assign data points to more than one cluster, thereby acting as a tool to distinguish between overlapping clusters. Fuzzy c-means (FCM) is one such clustering technique that can be applied to calorimetric data reconstruction. However, it has a drawback: it cannot easily identify and distinguish clusters that are not uniformly spread. A version of the FCM algorithm called dynamic fuzzy c-means (dFCM) allows clusters to be generated and eliminated as required, with the ability to resolve non-uniformly distributed clusters. Both the FCM and dFCM algorithms have been studied and successfully applied to simulated data of a sampling tungsten-silicon calorimeter. It is seen that the FCM technique works reasonably w...
A Wavelet Relational Fuzzy C-Means Algorithm for 2D Gel Image Segmentation
Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A. B.
2013-01-01
One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation. PMID:24174990
A Modified Fuzzy C-Means Algorithm For Collaborative LMAM and School of Mathematical Sciences,
Li, Tiejun
to the Netflix Prize data set and acquire comparable accuracy with that of MF.
Xu, Chao; Zhang, Pei-lin; Ren, Guo-quan; Wu, Ding-hai
2010-08-01
A Parzen window based semi-supervised fuzzy c-means (PSFCM) clustering algorithm is presented. The initial cluster centers of fuzzy c-means (FCM) are determined from training samples. The membership iteration of FCM is redefined after the membership degrees of the testing samples relative to each state are calculated using a Parzen window. Two typical gearbox faults were simulated on a gearbox test bed to acquire lubricant samples. The concentrations of Fe, Si and B, the representative elements, were selected as three-dimensional feature vectors and analyzed with the FCM and PSFCM clustering methods. The correct classification ratio of FCM was 48.9%, while that of PSFCM was 97.4% because it integrates supervised information. Experimental results also indicate that introducing PSFCM into oil atomic spectrometric analysis reduces the dependence on experience and on large amounts of fault data, and greatly helps improve the wear fault diagnosis ratio. PMID:20939333
A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data.
Ahmed, Mohamed N; Yamany, Sameh M; Mohamed, Nevin; Farag, Aly A; Moriarty, Thomas
2002-03-01
In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that can produce errors with conventional intensity-based classification. Our algorithm is formulated by modifying the objective function of the standard fuzzy c-means (FCM) algorithm to compensate for such inhomogeneities and to allow the labeling of a pixel (voxel) to be influenced by the labels in its immediate neighborhood. The neighborhood effect acts as a regularizer and biases the solution toward piecewise-homogeneous labelings. Such a regularization is useful in segmenting scans corrupted by salt and pepper noise. Experimental results on both synthetic images and MR data are given to demonstrate the effectiveness and efficiency of the proposed algorithm. PMID:11989844
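For orientation, the standard FCM objective that the abstract refers to can be written as below, followed by one possible form of a bias-corrected, neighborhood-regularized objective consistent with the description. The second expression is a hedged reconstruction for illustration only: the symbols $\beta_k$ (bias field at pixel $k$), $\mathcal{N}_k$ (neighbors of pixel $k$), $N_R$ (neighborhood size) and $\alpha$ (regularization weight) are assumptions of this sketch, and the paper itself should be consulted for the exact formulation.

```latex
% Standard FCM: N observations y_k, c centers v_i, fuzzifier p > 1
J = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{p} \, \lVert y_k - v_i \rVert^{2},
\qquad \text{subject to } \sum_{i=1}^{c} u_{ik} = 1 \ \text{for all } k .

% Illustrative modified objective (assumed form): bias term plus neighborhood regularizer
J_m = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{p} \, \lVert y_k - \beta_k - v_i \rVert^{2}
    + \frac{\alpha}{N_R} \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{p}
      \sum_{y_r \in \mathcal{N}_k} \lVert y_r - \beta_r - v_i \rVert^{2} .
```

In this form the first term fits each bias-corrected pixel to a cluster center, and the second term penalizes labelings under which a pixel's neighbors are far from that same center, which is what biases the solution toward piecewise-homogeneous labelings.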
Remote sensing ocean data analyses using fuzzy C-Means clustering
Xu, Suqin; Chen, Jie; Gao, Guoxing
2009-10-01
With growing understanding and exploitation of the ocean, more and more precision instruments are installed on survey ships and other marine platforms. The high cost and complexity of corrosion place ever-increasing demands on analysis of the surrounding ocean environment. In this paper, fuzzy C-means clustering is used to analyze the surrounding ocean environment with remote sensing data. The studied ocean area is treated as a two-dimensional grid, or image, and fuzzy C-means clustering is used to reveal the underlying relationships among the environmental elements and to segment the ocean into regions with similar spectral properties with respect to their influence on instrument corrosion. The influence of the environmental elements on instrument corrosion is studied, and a priori spatial information is added to improve the segmentation result. A fitness function containing neighbor information is set up based on the gray-level information and the neighbor relations between pixels. Using the global search ability of predator-prey particle swarm optimization, the optimal cluster centers are obtained by iterative optimization and the segmentation is completed. The calculated results show that the segmentation is accurate and reasonable. The resulting ocean environment analysis has been used in real applications and has proved valuable in ship instrument corrosion monitoring and in guiding other ocean activities.
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one important application of image processing, the MRI segmentation of pomegranate, is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. A high quality product is a critical factor in its marketing, and the internal quality of the product is important throughout the sorting process. Qualitative features cannot be determined manually. Therefore, the internal structures of the fruit need to be segmented as accurately as possible in the presence of noise. The fuzzy c-means (FCM) algorithm is noise-sensitive, and noisy pixels are misclassified. As a solution, this paper proposes the spatial FCM (SFCM) algorithm for segmenting pomegranate MR images. The algorithm incorporates spatial neighborhood information into FCM and modifies the fuzzy membership function for each class. Segmentation results on the original pomegranate MR images and on images corrupted by Gaussian, salt-and-pepper, and speckle noise show that the SFCM algorithm performs significantly better than the FCM algorithm. Also, after several steps of qualitative and quantitative analysis, we conclude that the SFCM algorithm with a 5×5 window performs better than with the other window sizes.
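A minimal sketch of how spatial neighborhood information can be folded into the FCM membership is given below. It assumes one common formulation, in which each membership is multiplied by the summed membership of its window and then renormalized; the exponents `p` and `q`, the edge handling, and the initial centers are assumptions of this sketch, not details confirmed by the abstract.

```python
import numpy as np

def box_sum(a, w):
    """Sum of `a` over a w x w window centred on each pixel (edges replicated)."""
    r = w // 2
    padded = np.pad(a, r, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(w):
        for dx in range(w):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def spatial_fcm(img, c=2, m=2.0, w=5, p=1, q=1, n_iter=50):
    """Intensity-based FCM whose memberships are reweighted by neighbourhood support."""
    x = img.astype(float).ravel()
    v = np.linspace(x.min(), x.max(), c)        # assumed start: centers spread over the range
    for _ in range(n_iter):
        d2 = np.maximum((x[None, :] - v[:, None]) ** 2, 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)               # ordinary FCM memberships
        # spatial function: summed membership of the w x w neighbourhood, per cluster
        h = np.stack([box_sum(u[i].reshape(img.shape), w).ravel() for i in range(c)])
        u = (u ** p) * (h ** q)
        u /= u.sum(axis=0)                      # renormalise so memberships sum to 1
        um = u ** m
        v = (um @ x) / um.sum(axis=1)           # update centers with spatial memberships
    return u.argmax(axis=0).reshape(img.shape), v
```

The default `w=5` mirrors the 5×5 window reported as best in the abstract; an isolated noisy pixel tends to take the label of its neighborhood because the window sum outweighs its own membership.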
Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analysis is the process of identifying changes in nature or in the state of objects from observations made at different times, or of quantifying temporal effects using multitemporal data sets. Many change detection techniques appear in the literature. They can be grouped under two main headings: supervised and unsupervised change detection. In this study, the aim is to identify land cover changes in a specific area of Kayseri with unsupervised change detection techniques, using remotely sensed Landsat satellite images from different years. First, image-to-image registration is performed so that the images are referenced to each other. After image enhancement, the image differencing method is applied to the images. The resulting gray-scale difference image is partitioned into 3 × 3 nonoverlapping blocks. Principal component analysis is applied to obtain the eigenvector space, from which the principal components are derived. Finally, the feature vector space consisting of the principal components is partitioned into two clusters, changed and unchanged, using fuzzy c-means clustering, which completes the change detection process.
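The pipeline described above (difference image, 3 × 3 nonoverlapping blocks, PCA, two-cluster FCM) can be sketched roughly as follows. This is a simplified illustration that assumes the two images are already registered and that classifies whole blocks rather than individual pixels; the function names, the number of retained components, and the deterministic FCM start are all assumptions of the sketch.

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=100):
    """Plain fuzzy c-means with a deterministic start (an assumption of this sketch)."""
    order = np.argsort(np.linalg.norm(X, axis=1))
    V = X[order[np.linspace(0, len(X) - 1, c).astype(int)]].astype(float)
    for _ in range(n_iter):
        d2 = np.maximum(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return V, U

def change_map(img_a, img_b, block=3, n_components=3):
    """Block-level change map from two co-registered gray-scale images."""
    diff = np.abs(img_a.astype(float) - img_b.astype(float))
    h = diff.shape[0] // block * block          # crop to a multiple of the block size
    w = diff.shape[1] // block * block
    blocks = (diff[:h, :w]
              .reshape(h // block, block, w // block, block)
              .swapaxes(1, 2)
              .reshape(-1, block * block))      # one row per 3 x 3 block
    centered = blocks - blocks.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    feats = centered @ top                      # principal-component feature vectors
    _, U = fcm(feats, c=2)
    # the "changed" cluster is the one whose blocks have the larger mean difference
    strength = (U @ blocks.mean(axis=1)) / U.sum(axis=1)
    changed = int(np.argmax(strength))
    return (U.argmax(axis=0) == changed).reshape(h // block, w // block)
```

Deciding which of the two clusters means "changed" is not covered by the abstract; the membership-weighted mean difference used here is one plausible choice.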
BP network identification technology of infrared polarization based on fuzzy c-means clustering
NASA Astrophysics Data System (ADS)
Zeng, Haifang; Gu, Guohua; He, Weiji; Chen, Qian; Yang, Wei
2011-08-01
Infrared detection systems are frequently employed in surveillance operations and reconnaissance missions to detect particular targets of interest in both civilian and military settings. Incorporating the polarization of light as supplementary information can enhance target discrimination performance. This paper therefore proposes an infrared target identification method based on fuzzy theory and a neural network that uses the polarization properties of targets. The paper uses the degree of polarization and light intensity to improve the unsupervised KFCM (kernel fuzzy C-means) clustering method and establishes a database of the polarization properties of different materials. In the built network, the system can output the probability distribution over the corresponding material types for any input polarization degree such as 10°, 15°, 20°, 25°, 30°. KFCM, which is more robust and accurate than FCM, introduces the kernel idea and assigns noise points and invalid values different but intuitively reasonable weights. Because the characterizations of material properties differ, some classification results conflict, so D-S evidence theory is used to combine the polarization and intensity information. The results show that the clustering precision and operation rate of KFCM are higher than those of the FCM clustering method. The artificial neural network method realizes material identification and reasonably addresses the complexity of environmental information in infrared polarization and the inadequacy of background knowledge and inference rules. This polarization identification method is fast, adapts well, and offers high resolution.
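The way a kernel lets KFCM down-weight noise points can be illustrated with a short sketch. It assumes a Gaussian kernel and the commonly used input-space KFCM update rules; the kernel width `sigma`, the starting centers, and the function name are assumptions, and nothing here reproduces the paper's polarization features or its neural network stage.

```python
import numpy as np

def kernel_fcm(X, c=2, m=2.0, sigma=2.0, n_iter=100):
    """Kernel fuzzy c-means with a Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    n = len(X)
    order = np.argsort(X[:, 0])
    idx = ((np.arange(c) + 0.5) / c * n).astype(int)
    V = X[order[idx]].astype(float)             # assumed start: quantile points along axis 0
    for _ in range(n_iter):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / sigma ** 2)
        # kernel-induced distance is proportional to 1 - K
        inv = np.maximum(1.0 - K, 1e-12) ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
        # points far from a center have K near 0, so they barely move that center
        W = (U ** m) * K
        V = (W @ X) / np.maximum(W.sum(axis=1, keepdims=True), 1e-300)
    return V, U
```

Because the center update weights each point by `K`, a distant outlier contributes almost nothing to any center, whereas in plain FCM it would pull the centers toward itself.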
Pattern Classification of Typhoon Tracks Using the Fuzzy c-Means Clustering Method HYEONG-SEOG KIM
Hawai'i at Manoa, University of
tracks. FCM is suitable for the data where cluster boundaries are ambiguous, such as a group of TC tracks tracks into the FCM, that is, the interpolation of all tracks into equal number of segments. Four
Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.
2015-04-01
This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from Dezful Embayment in the southwest region of Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information from the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. Furthermore, the fracture enhancement procedure was conducted by three steps on seismic data of the area. In the first step several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the enhancement of spatial discontinuities was performed and finally elimination of the noise and remains of non-faulting events was carried out by simulating the behavior of ant colonies. After considering stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW–SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. In order to assess the performance of the AC in feature detection, and cross validate the reliability of the method used, fuzzy c-means clustering (FCMC) was employed for the same dataset. Comparing the maps illustrates the effectiveness and preference of the AC approach due to its high resolution contrast for structural feature detection compared to the FCMC method. 
Accordingly, 3D planes of discontinuity determined spatial distribution of fractures in the field in order to assist well planning. Results revealed that the high impedance location probability related to an area in the vicinity of the faults, whilst low impedance location probably could indicate zones of high permeability which indicate flow conduits. Analysis under the present study suggests that the orientation and magnitude of fractures exhibiting the main trend of NW–SE in Dezful Embayment is more susceptible to stimulation and is more likely to open for fluid flow.
Maitra, Madhubanti; Chatterjee, Amitava
2008-06-01
The paper presents a new approach for automated segregation of brain MR images, using an improved orthogonal discrete wavelet transform (DWT), known as the Slantlet transform (ST), and a fuzzy c-means (FCM) clustering approach. ST has excellent time-frequency resolution characteristics and these can be achieved with shorter supports for the filter, compared to DWT employed for identical situations. FCM clustering, on the other hand, can provide efficient classification results, if it is implemented for well-processed input feature vectors. Thus, by combining both the ST and the FCM clustering approaches, a hybrid scheme has been developed that can segregate brain MR images. This automated tool when developed can infer whether the input image is that of a normal brain or a pathological brain. The proposed technique has been applied to several benchmark brain MR images and the results reveal excellent accuracy in characterizing human brain MR imaging. PMID:17698397
A modified fuzzy C-means with particle swarm optimization adaptive image segmentation algorithm
Gu, Yingjie; Jia, Zhenhong; Yang, Jie; Pang, Shaoning
2010-07-01
This paper proposes a new method in which the number of clusters is self-adapted and FCM with upper and lower cut-offs is combined with PSO, in place of the common combination of FCM with PSO. Experimental results show that, compared with combining particle swarm optimization (PSO) with common FCM, the method improves image segmentation, optimizes the number of clusters, and converges quickly.
Wang, Shilong; Xu, Yuru; Pang, Yongjie
2011-03-01
An underwater image has a low S/N and fuzzy edges. Processing it directly with traditional methods gives unsatisfactory results. Though the traditional fuzzy C-means algorithm can sometimes divide the image into object and background, its time-consuming computation is often an obstacle. The mission of the vision system of an autonomous underwater vehicle (AUV) is to process information about the object in a complex environment rapidly and accurately so that the AUV can use the result to execute its next task. So, by using the statistical characteristics of the gray image histogram, a fast and effective fuzzy C-means underwater image segmentation algorithm is presented. With the weighted histogram modifying the fuzzy membership, the algorithm not only cuts down on the large amount of data processing and storage required by the traditional algorithm, which speeds up segmentation, but also improves the quality of underwater image segmentation. Finally, particle swarm optimization (PSO) described by the sine function is introduced into the algorithm. It compensates for the FCM algorithm's inability to reach the global optimal solution. Thus, on the one hand, it considers the global impact and achieves the local optimal solution, and on the other hand, it further increases the computing speed considerably. Experimental results indicate that the novel algorithm reaches a better segmentation quality and reduces the processing time of each image, enhancing efficiency and satisfying the requirements of a highly effective, real-time AUV.
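The speed-up from working on the gray-level histogram instead of individual pixels can be sketched as follows. This is a minimal illustration assuming an integer-valued gray image; it covers only the histogram-weighted FCM step, not the paper's PSO stage, and the function name and initial centers are hypothetical.

```python
import numpy as np

def histogram_fcm(img, c=2, m=2.0, levels=256, n_iter=100, tol=1e-5):
    """FCM over gray levels weighted by histogram counts; `img` holds ints in [0, levels)."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    g = np.arange(levels, dtype=float)
    used = hist > 0                             # skip gray levels that never occur
    g, hist = g[used], hist[used]
    v = np.linspace(g.min(), g.max(), c)        # assumed start: spread over the used range
    for _ in range(n_iter):
        d2 = np.maximum((g[None, :] - v[:, None]) ** 2, 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)               # one membership per gray level, not per pixel
        w = hist * u ** m                       # each level counts as many times as it occurs
        v_new = (w @ g) / w.sum(axis=1)
        shift = np.abs(v_new - v).max()
        v = v_new
        if shift < tol:
            break
    # lookup table: every gray level goes to its nearest center (highest membership)
    lut = np.abs(np.arange(levels)[None, :] - v[:, None]).argmin(axis=0)
    return lut[img], v
```

Each iteration touches at most `levels` values instead of every pixel, so the cost per iteration no longer depends on the image size once the histogram has been built.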
Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Iftikhar, M Aksam
2014-02-01
In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus essential for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state of the art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques on both noisy and noise-free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed to decide whether plaque is present in the carotid artery using a probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, the effect of the proposed segmentation technique on the classification of carotid artery ultrasound images has also been investigated. PMID:24239296
A relational Fuzzy C-Means algorithm for detecting protein spots in two-dimensional gel images.
Rashwan, Shaheera; Faheem, Talaat; Sarhan, Amany; Youssef, Bayumy A B
2010-01-01
Two-dimensional polyacrylamide gel electrophoresis of proteins is a robust and reproducible technique. It is the most widely used separation tool in proteomics. Current efforts in the field are directed at the development of tools for expanding the range of proteins accessible with two-dimensional gels. Proteomics was built around the two-dimensional gel. The idea that multiple proteins can be analyzed in parallel grew from two-dimensional gel maps. Proteomics researchers have needed to identify protein spots of interest by examining the gel. This is time-consuming, labor-intensive and error-prone. It is desirable for a computer to analyze the proteins automatically by first detecting and then quantifying the protein spots in 2D gel images. This paper focuses on protein spot detection and segmentation in 2D gel electrophoresis images. We present a new technique for segmenting 2D gel images using the Fuzzy C-Means (FCM) algorithm and matching spots using the notion of fuzzy relations. In the experimental results, the new algorithm was found to detect protein spots more accurately than currently known algorithms. PMID:20865504
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with a fluorophore, FMI can noninvasively obtain the distribution of the fluorophore in vivo. However, because the fluorescence spectrum lies within the visible light range, there is a large amount of autofluorescence on the surface of biological tissues, which is a major disturbing factor in FMI. Meanwhile, the high dark current of the charge-coupled device (CCD) camera and other factors can also produce considerable background noise. In this paper, a novel method for denoising FMI images based on fuzzy C-means clustering (FCM) is proposed; it relies on the fact that the fluorescent signal is the major component of the fluorescence images and that the intensity of autofluorescence and other background signals is lower than that of the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to eliminate noise initially. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopted to separate the major component from the background of the fluorescence images. Finally, the proposed method was validated using original data obtained from an in vivo implanted-fluorophore experiment, and the results show that the proposed method can effectively extract the fluorescence signal while eliminating the background noise, which could improve the quality of fluorescence images.
A cluster algorithm for graphs
S. Van Dongen
2000-01-01
A cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The algorithm provides basically an interface to an algebraic process defined on stochastic matrices, called the MCL process. The graphs may be both weighted (with nonnegative weight) and directed. Let $G$ be such a graph. The MCL algorithm simulates flow in $G$ by first identifying $G$ in a canonical way with
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.
Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-01-01
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies. PMID:22894417
A meteor cluster detection algorithm
Burt, Joshua B.; Moorhead, Althea V.; Cooke, William J.
2014-02-01
We present an algorithm to identify groups of meteors within all-sky meteor network observations that are clustered in radiant, velocity, and time. These meteor clusters may reveal new minor meteor showers or uncover false negatives for known shower association. Sporadic meteoroid sources and established meteor showers exhibiting spatiotemporal proximity to identified clusters are reported by the algorithm for end-user reference, as well as the orbital similarity of cluster members quantified using the Drummond D-criterion. This algorithm will be integrated into the existing data-processing pipeline at the NASA Meteoroid Environments Office to alert staff in near-real time of clustered meteor events.
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space, to derive new objective functions from them, and thereby to cluster the non-Euclidean structures in data, making the original clustering algorithms more robust to noise and outliers. The new objective functions of the proposed algorithms are realized by incorporating the noise clustering concept into the entropy-based fuzzy C-means algorithm, with a suitable noise distance that carries information about noisy data into the clustering process. This paper obtains initial cluster prototypes with a prototype initialization method, so that the final result is reached in fewer iterations. To evaluate the performance of the proposed methods in reducing the noise level, experiments were carried out on a synthetic image corrupted by Gaussian noise. The superiority of the proposed methods was examined through an experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The classification accuracy of the proposed fuzzy C-means segmentation method is obtained using the silhouette validity index. PMID:23219569
Algorithms for Gene Clustering Analysis on Genomes
Yi, Gang Man
2012-07-16
groups of functionally related genes in large-scale data sets by applying new gene clustering algorithms. Proposed gene clustering algorithms that can help us understand gene function and genome evolution include new algorithms for protein family...
CLAG: an unsupervised non hierarchical clustering algorithm handling biological data
2012-01-01
Background Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, and others should belong to several clusters. Under these hypotheses, hierarchical agglomerative clustering is not appropriate. Moreover, if too little is known about the dataset, as is often the case, supervised classification is not appropriate either. Results CLAG (for CLusters AGgregation) is an unsupervised non-hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require all data points to be clustered, and it converges to the same result on every run. Its simplicity and speed allow it to run on reasonably large datasets. Conclusions CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It proved more informative and accurate than several known clustering methods, such as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering and affinity propagation clustering, and it does not suffer from the convergence problem of the latter. PMID:23216858
Havens, Timothy
) data or big data are any data that you cannot load into your computer's working memory there is a dataset too big for any computer you might use; hence, this is VL data for you. Clustering is one (including VL images) in various applications, and so, clustering algorithms that scale well to VL data
DAU StatRefresher: Clustering Algorithms
This interactive module helps students to understand the definition of and uses for clustering algorithms. Students will learn to categorize the types of clustering algorithms, to use the minimal spanning tree and the k-means clustering algorithm, and to solve exercise problems using clustering algorithms. Each component has a detailed explanation along with quiz questions. A series of questions is presented at the end to test the students' understanding of the lesson as a whole.
Chang, Yeun-Chung; Huang, Yan-Hao; Huang, Chiun-Sheng; Chang, Pei-Kang; Chen, Jeon-Hor; Chang, Ruey-Feng
2012-04-01
The purpose of this study is to evaluate the diagnostic efficacy of the representative characteristic kinetic curve of dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) extracted by fuzzy c-means (FCM) clustering for the discrimination of benign and malignant breast tumors using a novel computer-aided diagnosis (CAD) system. The data set comprised DCE-MRIs of 132 solid breast masses with definite histopathologic diagnosis (63 benign and 69 malignant). First, the tumor region was automatically segmented using the region growing method based on the integrated color map formed by combining the kinetic and area-under-curve color maps. Then, FCM clustering was used to identify the time-signal curve with the larger initial enhancement inside the segmented region as the representative kinetic curve, and the parameters of the Tofts pharmacokinetic model for the representative kinetic curve were compared with conventional curve analysis (maximal enhancement, time to peak, uptake rate and washout rate) for each mass. The results were analyzed with a receiver operating characteristic curve and Student's t test to evaluate the classification performance. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value of the combined model-based parameters of the kinetic curve extracted by FCM clustering were 86.36% (114/132), 85.51% (59/69), 87.30% (55/63), 88.06% (59/67) and 84.62% (55/65), respectively, better than those from a conventional curve analysis. The A_z value was 0.9154 for Tofts model-based parametric features, better than that for conventional curve analysis (0.8673), for discriminating malignant and benign lesions. In conclusion, model-based analysis of the characteristic kinetic curve of a breast mass derived from FCM clustering provides effective lesion classification. This approach has potential in the development of a CAD system for DCE breast MRI. PMID:22245697
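The five percentages reported in this abstract follow from a single confusion matrix. The sketch below recomputes them from the counts implied by the quoted fractions (59 of 69 malignant masses and 55 of 63 benign masses classified correctly); the function name `diagnostic_metrics` is hypothetical and only illustrates the arithmetic.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic ratios from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # malignant masses correctly flagged
        "specificity": tn / (tn + fp),   # benign masses correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Counts implied by the reported fractions: 69 malignant, 63 benign.
metrics = diagnostic_metrics(tp=59, fn=10, tn=55, fp=8)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

The denominators of the predictive values (67 and 65) are the numbers of masses called malignant and benign by the system, which is why they differ from the 69 and 63 used for sensitivity and specificity.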
Hierarchical Clustering Algorithms for Document Datasets
Ying Zhao; George Karypis; Usama M. Fayyad
2005-01-01
Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, clustering algorithms that build meaningful hierarchies out of large document collections are ideal tools for their interactive visualization and exploration as they provide data-views that are consistent, predictable,
Sokouti, Babak; Rezvan, Farshad; Dastmalchi, Siavoush
2015-08-01
G protein-coupled receptors (GPCRs) constitute the largest superfamily of integral membrane proteins (IMPs), and they contribute substantially to the flow of information into cells. In this study, the random forest (RF) and the subtractive fuzzy c-means clustering (SBC) methods have been used to determine the importance of input variables and to discriminate GPCRs from non-GPCRs using twenty amino acid and fifty pseudo amino acid compositions derived from GPCR sequences. The studied dataset was retrieved from the UniProt/SWISSPROT database and consists of 1000 GPCR and 1000 non-GPCR reviewed sequences. The top-ranked RF-SBC-based model discriminates GPCRs from non-GPCRs with accuracy, sensitivity and specificity of 99.15%, 99.60% and 98.70%, respectively, and a Matthews correlation coefficient (MCC) of 0.983. These values were obtained by averaging over 5-fold cross-validation using only twenty-four of the fifty pseudo amino acid composition features. The results show that the proposed RF-SBC-based model outperforms other existing algorithms in terms of the evaluated performance criteria. The webserver for the proposed algorithm is available at http://brcinfo.shinyapps.io/GPCRIden. PMID:26108102
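As a consistency check on the figures in this abstract, the Matthews correlation coefficient can be recomputed from the reported sensitivity and specificity. The sketch assumes that the 1000 GPCR and 1000 non-GPCR sequences are pooled into one confusion matrix; the abstract reports averages over 5-fold cross-validation, so the counts below are an approximation introduced here, not data from the study.

```python
import math


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0


# Assumption: pooled counts derived from the reported rates.
tp, fn = 996, 4     # sensitivity 99.60% of 1000 GPCRs
tn, fp = 987, 13    # specificity 98.70% of 1000 non-GPCRs
accuracy = (tp + tn) / (tp + fn + tn + fp)
score = mcc(tp, fn, tn, fp)
print(f"accuracy = {100 * accuracy:.2f}%, MCC = {score:.3f}")
```

Under this assumption the accuracy and MCC agree with the reported 99.15% and 0.983. MCC ranges from -1 to 1 and is a coefficient, not a percentage.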
The walking cluster algorithm Internal Report
The walking cluster algorithm Internal Report Frank De Smet, Kathleen Marchal, Gert Thijs, Janick to be excluded from further analysis. With this remark in mind, we propose in this text an algorithm that tries
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. Previous work focused on the identification of inflammatory patterns using the PET/CT imaging modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used the hard C-means algorithm in conjunction with landmark-based registration to estimate the extent of monkeypox virus-induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real-world problems. Wavelet neural networks (WNNs), which were developed based on wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized, including the type of wavelet activation functions, the translation vectors, and the dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. The evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors both locally and globally, is therefore introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to estimating the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real-world problem of epileptic seizure detection was used. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
Guofeng, Jin; Wei, Zhang; Zhengwei, Yang; Zhiyong, Huang; Yuanjia, Song; Dongdong, Wang; Gan, Tian
2012-12-01
The fuzzy C-means (FCM) clustering algorithm is an effective image segmentation algorithm that combines unsupervised clustering with the idea of fuzzy sets. It is widely applied to image segmentation, but it has several problems: a large amount of computation, sensitivity to initial values and to image noise, and a tendency to become trapped in local optima. To overcome these problems, a fuzzy clustering algorithm based on particle swarm optimization (PSO) was proposed. It first uses PSO, which has a powerful global search capability, to optimize the FCM cluster centers, and then uses these centers to partition the image, which increases segmentation speed and improves segmentation accuracy. The experimental results show that the PSO-FCM algorithm effectively avoids the disadvantages of FCM, runs faster and gives a better image segmentation result.
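The abstract does not state its formulas, so the block below gives the textbook forms of the two ingredients it combines; the exact variant used by the authors may differ. Here $x_j$ are the $n$ data points (pixel values), $v_i$ the $c$ cluster centers, $u_{ij}$ the membership of $x_j$ in cluster $i$, and $m > 1$ the fuzzifier. For PSO, $z_p$ is the position of particle $p$ (a candidate set of centers), $s_p$ its velocity, $b_p$ its personal best, $g$ the swarm best, $w, c_1, c_2$ the usual inertia and acceleration constants, and $r_1, r_2$ uniform random numbers in $[0, 1]$. All symbol names are choices made here for illustration.

```latex
\begin{align}
  J_m(U, V) &= \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, \lVert x_j - v_i \rVert^{2},
  \qquad \sum_{i=1}^{c} u_{ij} = 1 \quad \text{for every } j, \\
  u_{ij} &= \left[ \sum_{k=1}^{c}
      \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{2/(m-1)}
    \right]^{-1}, \\
  v_i &= \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}, \\
  s_p &\leftarrow w\, s_p + c_1 r_1 \,(b_p - z_p) + c_2 r_2 \,(g - z_p),
  \qquad z_p \leftarrow z_p + s_p .
\end{align}
```

Alternating the membership and center updates only decreases $J_m$ toward a local minimum that depends on the starting centers. Using $J_m$ as the fitness of each particle lets the swarm search over starting centers globally, which is the role the abstract assigns to PSO.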
PFClust: a novel parameter free clustering algorithm
2013-01-01
Background We present the algorithm PFClust (Parameter Free Clustering), which is able automatically to cluster data and identify a suitable number of clusters to group them into without requiring any parameters to be specified by the user. The algorithm partitions a dataset into a number of clusters that share some common attributes, such as their minimum expectation value and variance of intra-cluster similarity. A set of n objects can be clustered into any number of clusters from one to n, and there are many different hierarchical and partitional, agglomerative and divisive, clustering methodologies available that can be used to do this. Nonetheless, automatically determining the number of clusters present in a dataset constitutes a significant challenge for clustering algorithms. Identifying a putative optimum number of clusters to group the objects into involves computing and evaluating a range of clusterings with different numbers of clusters. However, there is no agreed or unique definition of optimum in this context. Thus, we test PFClust on datasets for which an external gold standard of ‘correct’ cluster definitions exists, noting that this division into clusters may be suboptimal according to other reasonable criteria. PFClust is heuristic in the sense that it cannot be described in terms of optimising any single simply-expressed metric over the space of possible clusterings. Results We validate PFClust firstly with reference to a number of synthetic datasets consisting of 2D vectors, showing that its clustering performance is at least equal to that of six other leading methodologies – even though five of the other methods are told in advance how many clusters to use. We also demonstrate the ability of PFClust to classify the three dimensional structures of protein domains, using a set of folds taken from the structural bioinformatics database CATH. 
Conclusions We show that PFClust is able to cluster the test datasets a little better, on average, than any of the other algorithms, and furthermore is able to do this without the need to specify any external parameters. Results on the synthetic datasets demonstrate that PFClust generates meaningful clusters, while our algorithm also shows excellent agreement with the correct assignments for a dataset extracted from the CATH part-manually curated classification of protein domain structures. PMID:23819480
K-nearest neighbors clustering algorithm
Gauza, Dariusz; Żukowska, Anna; Nowak, Robert
2014-11-01
Cluster analysis, understood as an unsupervised method of assigning objects to groups solely on the basis of their measured characteristics, is a common method for analyzing DNA microarray data. Our proposal is to classify the results of the one-nearest-neighbor algorithm (1NN). The presented method copes well with complex, multidimensional data, and the number of groups is properly identified. Numerical experiments on benchmark microarray data show that the presented algorithm gives better results than k-means clustering.
A survey of Clustering Algorithms
Rokach, Lior
This chapter presents a tutorial overview of the main clustering methods used in Data Mining. The goal is to provide a self-contained review of the concepts and the mathematics underlying clustering techniques. The chapter begins by providing measures and criteria that are used for determining whether two objects are similar or dissimilar. Then the clustering methods are presented, divided into: hierarchical, partitioning, density-based, model-based, grid-based, and soft-computing methods. Following the methods, the challenges of performing clustering in large data sets are discussed. Finally, the chapter presents how to determine the number of clusters.
Comparison of color clustering algorithms for segmentation of dermatological images
Melli, Rudy; Grana, Costantino; Cucchiara, Rita
2006-03-01
Automatic segmentation of skin lesions in clinical images is a very challenging task; it is necessary for visual analysis of the edges, shape and colors of the lesions to support melanoma diagnosis, but, at the same time, it is cumbersome since lesions (both naevi and melanomas) do not have regular shape, uniform color, or univocal structure. Most approaches adopt unsupervised color clustering. This work compares the most widespread color clustering algorithms, namely median cut, k-means, fuzzy c-means and mean shift, applied to a method for automatic border extraction, providing an evaluation of the upper bound on the accuracy that can be reached with these approaches. Different tests have been performed to examine the influence of the parameter settings on the performance of the algorithms. Then a new supervised learning phase is proposed to select the best number of clusters and to segment the lesion automatically. Experiments have been carried out on a large database of medical images, manually segmented by dermatologists. In these experiments, mean shift proved to be the best technique in terms of sensitivity and specificity. Finally, a qualitative evaluation of the segmentation quality by human experts confirmed the results of the quantitative comparison.
A local distribution based spatial clustering algorithm
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means of spatial data mining and spatial analysis, and it can be used to discover potential spatial association rules and outliers in spatial data. Most existing spatial clustering algorithms use only spatial distance or local density to find the clusters in a spatial database and do not take the local spatial distribution into account, so the clustering results are unreasonable in many cases. To overcome these limitations, this paper first develops a new indicator (the local median angle) to measure the local distribution, and then proposes a new algorithm, called the local distribution based spatial clustering algorithm (LDBSC). During spatial clustering, a series of recursive searches is carried out over all the entities so that entities whose local median angles are equal or very close are clustered together. In this way, all the spatial entities in the spatial database can be automatically divided into clusters. Finally, two tests demonstrate that the proposed method outperforms DBSCAN, is robust and feasible, and can find clusters with different shapes.
An algorithm for spatial hierarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (principal investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Disentangling Clustering Effects in Jet Algorithms
Randall Kelley; Jonathan R. Walsh; Saba Zuberi
2012-04-04
Clustering algorithms build jets through the iterative application of single particle and pairwise metrics. This leads to phase space constraints that are extremely complicated beyond the lowest orders in perturbation theory, and in practice they must be implemented numerically. This complication presents a significant barrier to gaining an analytic understanding of the perturbative structure of jet cross sections. We present a novel framework to express the jet algorithm's phase space constraints as a function of clustered groups of particles, which are the possible outcomes of the algorithm. This approach highlights the analytic properties of jet observables, rather than the explicit constraints on individual final state momenta, which can be unwieldy at higher orders. We derive the form of the n-particle phase space constraints for a jet algorithm with any measurement. We provide an expression for the measurement that makes clustering effects manifest and relates them to constraints from clustering at lower orders. The utility of this framework is demonstrated by using it to understand clustering effects for a large class of jet shape observables in the soft/collinear limit. We apply this framework to isolate divergences and analyze the logarithmic structure of the Abelian terms in the soft function, providing the all-orders form of these terms and showing that corrections from clustering start at next-to-leading logarithmic order in the exponent of the cross section.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to these data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
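As an illustration of the FCM procedure this abstract relies on, the following is a minimal sketch of standard fuzzy c-means in plain Python. It is not the code used in the study: the function name `fuzzy_c_means`, the fuzzifier value m = 2, the random initialization, and the toy two-channel points are all assumptions made for the example.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM on a list of equal-length tuples.

    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if 0.0 in d2:
                # A point sitting exactly on a center belongs fully to it.
                hits = d2.count(0.0)
                new = [1.0 / hits if x == 0.0 else 0.0 for x in d2]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                new = [x / total for x in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:
            break
    return centers, u

# Toy two-feature data standing in for two selected channels (made up).
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(samples, c=2)
hard_labels = [row.index(max(row)) for row in memberships]
```

Taking the largest membership per point, as in `hard_labels`, is one common way to turn the fuzzy partition into a crisp separation of two groups.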
Density based clustering algorithm based on satellite cloud sensing
Krivtsov, I. A.; Kalayda, V. T.
2014-11-01
The paper proposes a modified algorithm for clustering the Earth's cloud cover, based on a density-based algorithm for clustering data in the presence of noise. A new approach to identifying textural features, based on evaluating the information in the gradation adjacency matrix, is presented. An R-tree data structure is used to optimize the clustering algorithm. The results of the algorithm are given in this article.
Fusion and clustering algorithms for spatial data
Kuntala, Pavani
Spatial clustering is an approach for discovering groups of related data points in spatial data. Spatial clustering has attracted a lot of research attention due to various applications where it is needed. It holds practical importance in application domains such as geographic knowledge discovery, sensors, rare disease discovery, astronomy, remote sensing, and so on. The motivation for this work stems from the limitations of the existing spatial clustering methods. In most conventional spatial clustering algorithms, the similarity measurement mainly considers the geometric attributes. However, in many real applications, users are concerned about both the spatial and the non-spatial attributes. In conventional spatial clustering, the input data set is partitioned into several compact regions and data points that are similar to one another in their non-spatial attributes may be scattered over different regions, thus making the corresponding objective difficult to achieve. In this dissertation, a novel clustering methodology is proposed to explore the clustering problem within both spatial and non-spatial domains by employing a fusion-based approach. The goal is to optimize a given objective function in the spatial domain, while satisfying the constraint specified in the non-spatial attribute domain. Several experiments are conducted to provide insights into the proposed methodology. The algorithm first captures the spatial cores having the highest structure and then employs an iterative, heuristic mechanism to find the optimal number of spatial cores and non-spatial clusters that exist in the data. Such a fusion-based framework allows for the handling of data streams and provides a framework for comparing spatial clusters. The correctness and efficiency of the proposed clustering model are demonstrated on real world and synthetic data sets.
Constrained Clustering Algorithms
Coruña, Universidade da
PhD thesis (Tese de Doutoramento), Departamento de Computación, Manuel Eduardo Ares Brea, 2013
HARP: A Practical Projected Clustering Algorithm
Cheung, David Wai-lok
the objects are revealed in Fig. 1b, where the members of different clusters are given different shapes for faster and more specialized algorithms grows with the production of huge amount of data with diverse data of the input space defined by the dimensions1 of the data set. The similarity between different members
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
Kumar, Vipin
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling George Karypis EuiHong (Sam, densities, and sizes. In this paper, we present a novel hierarchical clustering algorithm called CHAMELEON clusters. The methodology of dynamic modeling of clusters used in CHAMELEON is applicable to all types
Dimensionality Reduction Particle Swarm Algorithm for High Dimensional Clustering
Cui, Xiaohui [ORNL; ST Charles, Jesse Lee [ORNL; Potok, Thomas E [ORNL; Beaver, Justin M [ORNL
2008-01-01
The Particle Swarm Optimization (PSO) clustering algorithm can generate more compact clustering results than the traditional K-means clustering algorithm. However, when clustering high dimensional datasets, the PSO clustering algorithm is notoriously slow because its computation cost increases exponentially with the size of the dataset dimension. Dimensionality reduction techniques offer solutions that both significantly improve the computation time, and yield reasonably accurate clustering results in high dimensional data analysis. In this paper, we introduce research that combines different dimensionality reduction techniques with the PSO clustering algorithm in order to reduce the complexity of high dimensional datasets and speed up the PSO clustering process. We report significant improvements in total runtime. Moreover, the clustering accuracy of the dimensionality reduction PSO clustering algorithm is comparable to the one that uses full dimension space.
A hybrid discrete Artificial Bee Colony - GRASP algorithm for clustering
Y. Marinakis; M. Marinaki; N. Matsatsinis
2009-01-01
This paper presents a new hybrid algorithm, which is based on the concepts of the artificial bee colony (ABC) and greedy randomized adaptive search procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm is a two phase algorithm which combines an artificial bee colony optimization algorithm for the solution of the feature selection problem and a
Hong Yan
2001-01-01
This paper presents a fuzzy clustering algorithm for the extraction of a smooth curve from unordered noisy data. In this method, the input data are first clustered into different regions using the fuzzy c-means algorithm and each region is represented by its cluster center. Neighboring cluster centers are linked to produce a graph according to the average class membership values.
CLASSY: An adaptive maximum likelihood clustering algorithm
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Rassbach, M. E. (principal investigators)
1979-01-01
The CLASSY clustering method alternates maximum likelihood iterative techniques for estimating the parameters of a mixture distribution with an adaptive procedure for splitting, combining, and eliminating the resultant components of the mixture. The adaptive procedure is based on maximizing the fit of a mixture of multivariate normal distributions to the observed data using its first through fourth central moments. It generates estimates of the number of multivariate normal components in the mixture as well as the proportion, mean vector, and covariance matrix for each component. The basic mathematical model for CLASSY and the actual operation of the algorithm as currently implemented are described. Results of applying CLASSY to real and simulated LANDSAT data are presented and compared with those generated by the iterative self-organizing clustering system algorithm on the same data sets.
Chaotic map clustering algorithm for EEG analysis
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives natural evidence of the pathological class without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.
Hearing the clusters in a graph: A distributed algorithm
Sahai, Tuhin; Banaszuk, Andrzej
2009-01-01
We propose a novel distributed algorithm to decompose graphs or cluster data. The algorithm recovers the solution obtained from spectral clustering without need for expensive eigenvalue/eigenvector computations. We demonstrate that by solving the wave equation on the graph, every node can assign itself to a cluster by performing a local fast Fourier transform. We prove the equivalence of our algorithm to spectral clustering, derive convergence rates and demonstrate it on examples.
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies.
Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message-passing library and a master/slave scheme. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
Energy Aware Clustering Algorithms for Wireless Sensor Networks
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire network is a main design consideration. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them based on metrics such as clustering distribution, cluster load balancing, Cluster Head (CH) selection strategy, CH role rotation, node mobility, cluster overlap, intra-cluster communications, reliability, security and location awareness.
Clustering algorithm for determining community structure in large networks
Pujol, Josep M.; Béjar, Javier; Delgado, Jordi
2006-07-01
We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as that of the best algorithms in the modularity optimization literature; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousands of vertices.
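Since the comparison above is stated in terms of modularity, a short sketch of how that score can be computed for a given partition may help. It uses the standard Newman-Girvan definition for an unweighted, undirected graph; the function name `modularity` and the toy graph are assumptions for illustration, and this is only the quality measure, not the authors' spectral optimization procedure.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (a, b) pairs of an undirected, unweighted graph.
    community: dict mapping each vertex to its community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the vertices in c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for a, b in edges:
        degree[community[a]] += 1
        degree[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph: two triangles joined by a single edge (hypothetical data).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(toy_edges, split)
```

For this toy graph the two-triangle split scores 5/14, while putting every vertex in a single community scores 0, which is the sense in which a higher modularity indicates a better community structure.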
Filamentary galaxy clustering - A mapping algorithm
NASA Technical Reports Server (NTRS)
Gott, J. R., III; Moody, J. E.; Turner, E. L.
1983-01-01
A simple and objective algorithm is presented which not only accurately identifies the filamentary structures in the Shane-Wirtanen galaxy count catalog, but also finds a set of visually less impressive filaments in a static hierarchical model of the clustering conducted by Soneira and Peebles (1978). The statistical properties of the elements in the model, while very similar to those in the data, show a significant excess of long and bright filaments in the data relative to the model. Two possible interpretations of these results are presented and discussed.
A novel clustering algorithm inspired by membrane computing.
Peng, Hong; Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng
2015-01-01
P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to k-means algorithm and several evolutionary clustering algorithms recently reported in the literature. PMID:25874264
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Optimal combination of nested clusters by a greedy approximation algorithm.
Dang, Edward K F; Luk, Robert W P; Lee, D L; Ho, K S; Chan, Stephen C F
2009-11-01
Given a set of clusters, we consider an optimization problem which seeks a subset of clusters that maximizes the microaverage F-measure. This optimal value can be used as an evaluation measure of the goodness of clustering. For arbitrarily overlapping clusters, finding the optimal value is NP-hard. We claim that a greedy approximation algorithm yields the global optimal solution for clusters that overlap only by nesting. We present a mathematical proof of this claim by induction. For a family of n clusters containing a total of N objects, this algorithm has an O(n^2) time complexity and O(N) space complexity. PMID:19762933
A new efficient Cluster Algorithm for the Ising Model
Matthias Nyfeler; Michele Pepe; Uwe-Jens Wiese
2005-10-06
Using D-theory we construct a new efficient cluster algorithm for the Ising model. The construction is very different from the standard Swendsen-Wang algorithm and related to worm algorithms. With the new algorithm we have measured the correlation function with high precision over a surprisingly large number of orders of magnitude.
Hybridizing Evolutionary Algorithms and Clustering Algorithms to Find Source-Code Clones
Maletic, Jonathan I.
Hybridizing Evolutionary Algorithms and Clustering Algorithms to Find Source-Code Clones Andrew a hybrid approach to detect source-code clones that combines evolutionary algorithms and clustering. A case is effective in detecting groups of source-code clones.
Falter, Siegfried
2006-07-01
HIROCS is a multi-color survey designed to construct a statistically significant galaxy cluster sample for galaxy evolution studies using a multi-color classification scheme in the redshift range 0.5 < z < 1.5. After contributing to the survey specifications, tests of the multi-color classification with the observational setup showed the feasibility of the project. The photometric redshift accuracy of ?z) = 0.076 was estimated at the R band limit of ~25 mag. The new algorithm for the galaxy cluster detection was developed and tested with COMBO-17 data. In the three COMBO-17 fields covering 0.78 square degrees 15 cluster candidates were identified in the redshift range 0.3 < z < 0.9. The power of the search method was demonstrated by a comparison with the cluster detections from the Voronoi tessellation. For the determination of the cluster selection function in HIROCS and COMBO-17 procedures to simulate galaxy clusters were introduced. Due to the lack of fully reduced HIROCS data, the COMBO-17 selection function was quantified; rich clusters are expected to be found in the redshift range covered by COMBO-17. First steps towards an analysis of cluster candidates were carried out using the COMBO-17 candidates. Finally, a rich cluster at redshift ~0.7 was identified in the first HIROCS infrared data.
Evaluation of hierarchical clustering algorithms for document datasets
Ying Zhao; George Karypis
2002-01-01
Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering solutions provide a view of the data at different levels of granularity, making them ideal for people to visualize and interactively explore large document collections. In this
A survey of fuzzy clustering algorithms for pattern recognition. I
Andrea Baraldi; Palma Blonda
1999-01-01
Clustering algorithms aim at modeling fuzzy (i.e., ambiguous) unlabeled patterns efficiently. Our goal is to propose a theoretical framework where the expressive power of clustering systems can be compared on the basis of a meaningful set of common functional features. Part I of this paper reviews the following issues related to clustering approaches found in the literature: relative (probabilistic) and
Clustering algorithms for Stokes space modulation format recognition.
Boada, Ricard; Borkowski, Robert; Monroy, Idelfonso Tafur
2015-06-15
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely influences the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six different clustering algorithms: k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering and maximum likelihood clustering, used for discriminating between dual polarization: BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. We determine essential performance metrics for each clustering algorithm and modulation format under test: minimum required signal-to-noise ratio, detection accuracy and algorithm complexity. PMID:26193532
A biased random-key genetic algorithm for data clustering.
Festa, P
2013-09-01
Cluster analysis aims at finding subsets (clusters) of a given set of entities, which are homogeneous and/or well separated. Starting from the 1990s, cluster analysis has been applied to several domains with numerous applications. It has emerged as one of the most exciting interdisciplinary fields, having benefited from concepts and theoretical results obtained by different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms, which are able to solve larger sized and real-world instances. We will give an overview of the main types of clustering and criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed to cluster biological data. PMID:23896381
Sparse Subspace Clustering: Algorithm, Theory, and Applications
Vidal, René
. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework as well as the two real-world problems of motion segmentation and face clustering.
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances. PMID:24701148
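The constraint structure described above can be made concrete with a small sketch: a feasibility check for the "contiguous clusters in prescribed order" rule, and a 2-opt style segment reversal restricted to stay inside each cluster so that feasibility is preserved. This is an illustrative simplification, not the paper's hybrid genetic algorithm (the sequential constructive crossover and the additional local search are omitted), and the names `tour_cost`, `is_feasible` and `two_opt_within_clusters` are assumptions.

```python
def tour_cost(tour, dist):
    """Cost of the closed tour; dist may be asymmetric."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def is_feasible(tour, start, clusters):
    """True if the tour starts at `start` and then visits each cluster
    contiguously, in the order the clusters are listed."""
    if not tour or tour[0] != start:
        return False
    pos = 1
    for cluster in clusters:
        segment = tour[pos:pos + len(cluster)]
        if set(segment) != set(cluster):
            return False
        pos += len(cluster)
    return pos == len(tour)

def two_opt_within_clusters(tour, clusters, dist):
    """Reverse sub-segments inside one cluster at a time, keeping a
    reversal only if it strictly lowers the full tour cost."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        pos = 1  # position 0 holds the starting vertex
        for cluster in clusters:
            lo, hi = pos, pos + len(cluster)
            for i in range(lo, hi - 1):
                for j in range(i + 1, hi):
                    cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                    if tour_cost(cand, dist) < tour_cost(best, dist):
                        best, improved = cand, True
            pos = hi
    return best

# Toy instance: five vertices on a line, distance = coordinate gap.
coords = [0, 1, 2, 3, 4]
dist = [[abs(a - b) for b in coords] for a in coords]
clusters = [{1, 2}, {3, 4}]          # must be visited in this order
initial = [0, 2, 1, 4, 3]            # feasible but not the cheapest
improved_tour = two_opt_within_clusters(initial, clusters, dist)
```

Recomputing the full tour cost for every candidate is wasteful but keeps the sketch valid for asymmetric distance matrices, where reversing a segment changes the cost of the arcs inside it.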
A novel spatial clustering algorithm based on Delaunay triangulation
Yang, Xiankun; Cui, Weihong
2008-12-01
Exploratory data analysis is increasingly necessary as larger spatial data sets are managed on electromagnetic media. Spatial clustering is one of the most important spatial data mining techniques, and many spatial clustering algorithms have been proposed so far. In this paper we propose a robust spatial clustering algorithm named SCABDT (Spatial Clustering Algorithm Based on Delaunay Triangulation). SCABDT has important advantages over previous work. First, it discovers clusters of arbitrary shape. Second, it requires no a priori knowledge of the data distribution. Third, experiments show that, like DBSCAN, SCABDT does not require much CPU processing time. Finally, it handles outliers efficiently.
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension.
Zhu, Zheng; Ochoa, Andrew J; Katzgraber, Helmut G
2015-08-14
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine. PMID:26317743
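The Houdayer-style move that the abstract builds on can be sketched in a few lines. Two replicas with the same couplings are compared site by site; a connected cluster of sites where the replicas disagree is grown from a random seed and flipped in both replicas, which leaves the sum of the two energies unchanged. The code below is a minimal illustration under stated assumptions: the names `energy` and `houdayer_move`, the open-boundary 4x4 lattice and the ±1 couplings are made up for the example, and the temperature restrictions and the accompanying Monte Carlo and parallel tempering updates of the published method are not included.

```python
import random

def energy(spins, bonds):
    """Ising spin-glass energy H = -sum over bonds of J_ij * s_i * s_j."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in bonds.items())

def houdayer_move(rep_a, rep_b, bonds, rng):
    """Flip one connected cluster of disagreeing sites in both replicas.

    rep_a, rep_b: lists of +1/-1 spins (modified in place).
    bonds: dict mapping (i, j) to the coupling J_ij.
    Returns the size of the flipped cluster (0 if the replicas agree).
    """
    n = len(rep_a)
    neighbors = {i: [] for i in range(n)}
    for (i, j) in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Sites with local overlap q_i = s_i^a * s_i^b = -1.
    differing = [i for i in range(n) if rep_a[i] != rep_b[i]]
    if not differing:
        return 0
    seed = rng.choice(differing)
    cluster, stack = {seed}, [seed]
    while stack:
        i = stack.pop()
        for j in neighbors[i]:
            if j not in cluster and rep_a[j] != rep_b[j]:
                cluster.add(j)
                stack.append(j)
    # Flipping the cluster in both replicas swaps its spins between them.
    for i in cluster:
        rep_a[i] = -rep_a[i]
        rep_b[i] = -rep_b[i]
    return len(cluster)

# Toy setup: 4x4 open lattice with random +1/-1 couplings.
rng = random.Random(1)
side = 4
bonds = {}
for x in range(side):
    for y in range(side):
        i = x * side + y
        if x + 1 < side:
            bonds[(i, i + side)] = rng.choice([-1, 1])
        if y + 1 < side:
            bonds[(i, i + 1)] = rng.choice([-1, 1])
replica_a = [rng.choice([-1, 1]) for _ in range(side * side)]
replica_b = [rng.choice([-1, 1]) for _ in range(side * side)]
total_before = energy(replica_a, bonds) + energy(replica_b, bonds)
flipped = houdayer_move(replica_a, replica_b, bonds, rng)
total_after = energy(replica_a, bonds) + energy(replica_b, bonds)
```

The conservation of `total_before` follows from the cluster boundary: every bond crossing it joins a disagreeing site to an agreeing one, so its contributions to the two replica energies cancel both before and after the flip.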
CARP: The Clustering Algorithms' Referee Version 3.3 Manual
Maitra, Ranjan
4.2 Example: Evaluating clustering algorithms with homogeneous non-spherical dispersion structures: An alternative approach ... 4.10 Example: Simulating non-Gaussian mixtures of clustering procedures ... 3.3.5 Simulating non-Gaussian mixtures ...
Efficient Clustering Algorithms for Self-Organizing Wireless Sensor Networks
Starobinski, David
Efficient Clustering Algorithms for Self-Organizing Wireless Sensor Networks Rajesh Krishnan BBN@bu.edu Abstract Self-organization of wireless sensor networks, which involves network decomposi- tion-organization in wireless sensor networks. We first present a novel approach for message-efficient clustering, in which
WCA: A Weighted Clustering Algorithm for Mobile Ad Hoc Networks
Mainak Chatterjee; Sajal K. Das; Damla Turgut
2002-01-01
In this paper, we propose an on-demand distributed clustering algorithm for multi-hop packet radio networks. These types of networks, also known as ad hoc networks, are dynamic in nature due to the mobility of nodes. The association and dissociation of nodes to and from clusters perturb the stability of the network topology, and hence a reconfiguration of the system is often
Clustering of Hadronic Showers with a Structural Algorithm
Charles, M.J.; /SLAC
2005-12-13
The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.
Clustering algorithms in PBS MohammadTaghi Hajiaghayi
Hajiaghayi, Mohammad
Clustering algorithms in PBS MohammadTaghi Hajiaghayi mhajiaghayi@uwaterloo.ca Department components. This task can be done by applications like PBS which find these interactions using compiling contain.rsf file in PBS. In this paper, we present some algorithms for the above problem from
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
Multilayer cellular neural network and fuzzy C-mean classifiers: comparison and performance analysis
Trujillo San-Martin, Maite; Hlebarov, Vejen; Sadki, Mustapha
2004-11-01
Neural networks and fuzzy systems are considered two of the most important artificial intelligence algorithms. Both provide classification capabilities through different learning schemas, which capture knowledge and process it according to particular rule-based algorithms. These methods are especially suited to exploiting the tolerance for uncertainty and vagueness in cognitive reasoning. By applying these methods with relevant knowledge-based rules extracted using different data analysis tools, it is possible to obtain robust classification performance for a wide range of applications. This paper focuses on non-destructive testing quality control systems, in particular the classification of metallic structures according to corrosion time using a novel cellular neural network architecture, which is explained in detail. Additionally, we compare these results with those obtained using the fuzzy C-means clustering algorithm and analyse both classifiers according to their classification capabilities.
Efficient Cluster Algorithm for CP(N-1) Models
B. B. Beard; M. Pepe; S. Riederer; U. -J. Wiese
2006-02-14
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z = 0.
A Geometric Clustering Algorithm with Applications to Structural Data
Xu, Shutan; Zou, Shuxue
2015-01-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination. PMID:25517067
Multilayer Traffic Network Optimized by Multiobjective Genetic Clustering Algorithm
Wen, Feng; Gen, Mitsuo; Yu, Xinjie
This paper introduces a multilayer traffic network model and a traffic network clustering method for solving the route selection problem (RSP) in car navigation systems (CNS). The purpose of the proposed method is to reduce the computation time of route selection substantially, with an acceptable loss of accuracy, by preprocessing the large traffic network into a new network form. By using a clustering method, the proposed approach preprocesses the traffic network further than the traditional hierarchical network method does. The traffic network clustering considers two criteria. We specify a genetic clustering algorithm for traffic network clustering and use NSGA-II to calculate the multiobjective Pareto optimal set. The proposed method can overcome the size limitations encountered when solving route selection in CNS. Solutions provided by the proposed algorithm are compared with the optimal solutions to analyze and quantify the loss of accuracy.
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Such a study should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. By manipulating width and confidence level, we could find the lowest sample sizes for which the algorithm would be acceptably accurate. For our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
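The idea of running k-means on a sample rather than on the full dataset can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the uniform random sampling, and the fixed `sample_size` argument are assumptions, and the abstract's selection of sample size from a width and confidence level is not reproduced. Centroids are fitted on the sample with plain Lloyd iterations, and every point in the full dataset is then assigned to its nearest centroid in a single pass.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the centroid closest to point p."""
    return min(range(len(centers)), key=lambda c: dist2(p, centers[c]))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd k-means; initial centroids are k distinct sampled points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Mean of each group; an empty group keeps its previous centroid.
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers

def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centroids on a random sample, then label the full dataset once."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans(sample, k, rng)
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

The iterative part touches only `sample_size` points per pass, while the full dataset is scanned just once at the end, which is where the runtime saving in such a scheme would come from.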
A Decentralized Fuzzy C-Means-Based Energy-Efficient Routing Protocol for Wireless Sensor Networks
2014-01-01
Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060
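The fuzzy C-means step used to allocate sensor nodes to clusters can be illustrated with a generic sketch of the standard algorithm. It is not the DCFP protocol itself: the function name, the fuzzifier default `m=2.0`, the random membership initialization and the stopping tolerance are assumptions. Each iteration recomputes the cluster centres as membership-weighted means and then updates each membership as u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on a list of equal-length coordinate tuples.

    Returns (centers, u) where u[k][i] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: weighted mean with weights u_ik^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            tot = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / tot
                            for d in range(dim)])
        # Membership update from squared distances (clamped to avoid 1/0).
        shift = 0.0
        for k in range(n):
            d2 = [max(sum((points[k][d] - centers[i][d]) ** 2
                          for d in range(dim)), 1e-12) for i in range(c)]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            s = sum(inv)
            new_row = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row
        if shift < tol:
            break
    return centers, u
```

A hard assignment of each node to a cluster, if one is needed, can be read off as the index of the largest membership in each row of `u`.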
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
Martin Ester; Hans-peter Kriegel; Jörg Sander; Xiaowei Xu
1996-01-01
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to
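A density-based clustering of the kind this record introduces (DBSCAN) can be sketched in a few lines. This is a minimal, brute-force illustration: the parameter names `eps` and `min_pts`, the label `-1` for noise, and the O(n^2) neighbourhood search are choices made here, whereas an implementation for large spatial databases would use a spatial index. A point with at least `min_pts` neighbours within `eps` (counting itself) is a core point; clusters are grown by following core points, and points reachable from no core point are marked as noise.

```python
def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    eps2 = eps * eps
    return [j for j in range(len(points))
            if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps2]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= min_pts: # core point: keep expanding
                queue.extend(j_nbrs)
    return labels

# Two compact groups and one isolated point.
example = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
print(dbscan(example, eps=1.5, min_pts=3))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Only two parameters are needed and no cluster count is specified in advance, which matches the abstract's requirement of minimal domain knowledge for the input parameters.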
Membership determination of open cluster NGC 188 based on the DBSCAN clustering algorithm
Gao, Xin-Hua
2014-02-01
High-precision proper motions and radial velocities of 1046 stars are used to determine member stars using three-dimensional (3D) kinematics for open cluster NGC 188 based on the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm. By implementing this algorithm, 472 member stars in the cluster are obtained with 3D kinematics. The color-magnitude diagram (CMD) of the 472 member stars using 3D kinematics shows a well-defined main sequence and a red giant branch, which indicate that the DBSCAN clustering algorithm is very effective for membership determination. The DBSCAN clustering algorithm can effectively select probable member stars in 3D kinematic space without any assumption about the distribution of the cluster or field stars. Analysis results show that the CMD of member stars is significantly clearer than the one based on 2D kinematics, which allows us to better constrain the cluster members and estimate their physical parameters. Using the 472 member stars, the average absolute proper motion and radial velocity are determined to be (PMα, PMδ) = (-2.58 ± 0.22, +0.17 ± 0.18) mas yr^-1 and Vr = -42.35 ± 0.05 km s^-1, respectively. Our values are in good agreement with values derived by other authors.
An image denoising algorithm based on clustering and median filtering
Wang, YuLing; Li, Ming; Li, Li
2015-03-01
An improved median de-noising method is proposed, namely an image de-noising algorithm based on clustering and median filtering. The algorithm is a fast image de-noising method based on the clustering idea: the singular points are isolated from the image and then clustered. Its advantages are that it better protects the details of an image and substantially reduces computation. Compared with the traditional median filter, mean filter and Wiener filter, our approach is more adaptive and achieves better results. For images that have complex details, such as texture images, however, the experimental results show that the proposed algorithm performs comparatively less well in de-noising.
Spin chain simulations with a meron cluster algorithm
Thomas Boyer; Wolfgang Bietenholz; Jair Wuilloud
2007-11-23
We apply a meron cluster algorithm to the XY spin chain, which describes a quantum rotor. This is a multi-cluster simulation supplemented by an improved estimator, which deals with objects of half-integer topological charge. This method is powerful enough to provide precise results for the model with a theta-term - it is therefore one of the rare examples, where a system with a complex action can be solved numerically. In particular we measure the correlation length, as well as the topological and magnetic susceptibility. We discuss the algorithmic efficiency in view of the critical slowing down. Due to the excellent performance that we observe, it is strongly motivated to work on new applications of meron cluster algorithms in higher dimensions.
Development of clustering algorithms for Compressed Baryonic Matter experiment
Kozlov, G. E.; Ivanov, V. V.; Lebedev, A. A.; Vassiliev, Yu. O.
2015-05-01
A clustering problem for the coordinate detectors in the Compressed Baryonic Matter (CBM) experiment is discussed. Because of the high interaction rate and huge datasets to be dealt with, clustering algorithms are required to be fast and efficient and capable of processing events with high track multiplicity. At present there are two different approaches to the problem. In the first one each fired pad bears information about its charge, while in the second one a pad can or cannot be fired, thus rendering the separation of overlapping clusters a difficult task. To deal with the latter, two different clustering algorithms were developed, integrated into the CBMROOT software environment, and tested with various types of simulated events. Both of them are found to be highly efficient and accurate.
Yannis Marinakis; Magdalene Marinaki; Nikolaos F. Matsatsinis
2007-01-01
This paper introduces a new hybrid algorithmic nature inspired approach based on the concepts of the Honey Bees Mating Optimization Algorithm (HBMO) and of the Greedy Randomized Adaptive Search Procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm for the Clustering Analysis, the Hybrid HBMO-GRASP, is a two phase algorithm which combines a HBMO
A Mountain Clustering Based on Improved PSO Algorithm
Hong-yuan Shen; Xiao-qi Peng; Jun-nian Wang; Zhi-kun Hu
2005-01-01
In order to find the density centres of the sample set, this paper combines MCA and PSO and presents a mountain clustering based on improved PSO (MCBIPSO) algorithm. A mountain clustering method constructs a mountain function according to the density of the sample, but it is not easy to find all peaks of the mountain function. The improved
A Dynamic Hierarchical Fuzzy Clustering Algorithm for Information Filtering
Gloria Bordogna; Marco Pagani; Gabriella Pasi
In this contribution we propose a hierarchical fuzzy clustering algorithm for dynamically supporting information filtering. The idea is that document filtering can draw advantages from a dynamic hierarchical fuzzy clustering of the documents into overlapping topic categories corresponding with different levels of granularity of the categorisation. Users can have either general interests or specific ones depending on their profile and
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
ORCA: The Overdense Red-sequence Cluster Algorithm
Murphy, D. N. A.; Geach, J. E.; Bower, R. G.
2012-03-01
We present a new cluster-detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (i) the similarity in colour of cluster galaxies (the 'red sequence'); and (ii) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space; (ii) a Voronoi diagram identifies regions of high surface density; and (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper, we present the algorithm and demonstrate it on two data sets. The first is a 7-deg^2 sample of the deep Sloan Digital Sky Survey (SDSS) equatorial stripe (Stripe 82), from which we detect 97 clusters with z ≤ 0.6. Benefitting from deeper data, we are 100 per cent complete in the maxBCG optically selected cluster catalogue (based on shallower single-epoch SDSS data) and find an additional 78 previously unidentified clusters. The second data set is a mock Medium Deep Survey Pan-STARRS catalogue, based on the Λ cold dark matter (ΛCDM) model and a semi-analytic galaxy formation recipe. Knowledge of galaxy-halo memberships in the mock catalogue allows for the quantification of algorithm performance. We detect 305 mock clusters in haloes with mass >10^13 h^-1 M⊙ at z ≤ 0.6 and determine a spurious detection rate of <1 per cent, consistent with tests on the Stripe 82 catalogue. The detector performs well in the recovery of model ΛCDM clusters. At the median redshift of the catalogue, the algorithm achieves >75 per cent completeness down to halo masses of 10^13.4 h^-1 M⊙ and recovers >75 per cent of the total stellar mass of clusters in haloes down to 10^13.8 h^-1 M⊙. A companion paper presents the complete cluster catalogue over the full 270-deg^2 Stripe 82 catalogue.
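Step (iii) of the detector, grouping galaxies with a Friends-of-Friends technique, can be illustrated with a generic sketch. The function name, the fixed `linking_length` in flat 2D coordinates and the brute-force pair search are assumptions; the abstract does not state the linking criterion ORCA actually uses, and the colour filtering and Voronoi density steps are not shown. Two points are "friends" if they lie within the linking length, and a group is any set of points connected through a chain of friends, which a union-find structure tracks directly.

```python
def friends_of_friends(points, linking_length):
    """Group point indices so that friends (pairs closer than linking_length)
    and friends of friends end up in the same group."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Root of i's group, with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    ll2 = linking_length ** 2
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if d2 <= ll2:
                parent[find(i)] = find(j)   # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Points 0 and 2 are not direct friends but are linked through point 1.
positions = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
print(friends_of_friends(positions, linking_length=0.6))  # -> [[0, 1, 2], [3, 4]]
```

In a pipeline like the one described, the input would be restricted to galaxies that already passed the colour and surface-density selections, so that only candidate members are linked.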
Clustering gene expression data with kernel principal components.
Liu, Zhenqiu; Chen, Dechang; Bensmail, Halima; Xu, Ying
2005-04-01
Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is in general superior to traditional algorithms. PMID:15852507
A Task-parallel Clustering Algorithm for Structured AMR
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scaled inefficiently in parallel and its cost grows with problem size. Hence, it can become significant for large scale problems run on very large parallel machines, such as the new BlueGene system (which has O(10^4) processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers ~2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the ΛCDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is ≈90% complete and 95% pure above M_200 = 1 x 10^14 h^-1 M⊙ and within 0.03 ≤ z ≤ 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 ≤ z ≤ 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of M_200 as L_r. The final C4 catalog will contain ≈2500 clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.
A Combined Clustering and Placement Algorithm for FPGAs
Lemieux, Guy
A Combined Clustering and Placement Algorithm for FPGAs by Mark Yamashita B.A.Sc., The University of reprogrammable microchips, such as field-programmable gate arrays (FPGAs), is an inherent speed disadvantage overRAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3.5 Using Logic Duplication to Improve Performance in FPGAs . . . . . . . . 23 3 Combined
Performance Characterization of Clustering Algorithms for Colour Image Segmentation
Whelan, Paul F.
Performance Characterization of Clustering Algorithms for Colour Image Segmentation Dana Elena Ilea, Paul F. Whelan, Ovidiu Ghita Vision Systems Group, School of Electronic Engineering Dublin City to extract the colour information that is used in the image segmentation process. The aim of this paper
Neuromorphic algorithms on clusters of PlayStation 3s
Tarek M. Taha; Pavan Yalamanchili; Mohammad Bhuiyan; Rommel Jalasutram; Chong Chen; Richard Linderman
2010-01-01
There is significant interest in the research community in developing large scale, high performance implementations of neuromorphic models. These have the potential to provide significantly stronger information processing capabilities than current computing algorithms. In this paper we present the implementation of five neuromorphic models on a 50 TeraFLOPS 336 node PlayStation 3 cluster at the Air Force Research Laboratory.
Atomic structure prediction of metal clusters using the evolutionary algorithm
Al-Aqtash, Nabil; Tarawneh, Khaldoun; Sabirianov, Renat
2015-03-01
The evolutionary algorithm coupled with the density functional theory (DFT) method is used to identify the global energy minimum atomic structure of metal clusters. Using the evolutionary crystal structure optimization algorithm, as implemented in USPEX, we studied the atomic structure, binding energies, and magnetic properties of 13-atom Cu, Co and Cr clusters. A set of metastable and global minimum atomic structures is identified. Several new lower energy configurations were identified for 13-atom Cu, Co and Cr clusters, and previously known atomic structures were confirmed by our calculations. We found that the Cu13 cluster has a distorted hexagonal bilayer (HBL)-like structure, which is composed of two layers as in the ideal HBL structure. The distorted HBL Cu13 is 1.17 eV lower in total energy than the close-packed icosahedral (ICO) configuration, which was reported as the lowest-energy structure for Cu13 in previous studies. Our calculations show that Co13 has an ideal HBL structure and that the Cr13 cluster has a distorted ICO structure, which is consistent with previous studies. Moreover, our calculations show that Cr13 has another lower energy atomic configuration with a 0.003 eV difference from ICO. Cr13 has a ferrimagnetic (FIM) interaction, which plays an important role in finding the lowest energy structure. We discuss the predictive capabilities of evolutionary algorithms for nanoclusters.
An effective algorithm for mining 3-clusters in vertically partitioned data
Faris Alqadah; Raj Bhatnagar
2008-01-01
Conventional clustering algorithms group similar data points together along one dimension of a data table. Bi-clustering simultaneously clusters both dimensions of a data table. 3-clustering goes one step further and aims to concurrently cluster two data tables that share a common set of row labels, but whose column labels are distinct. Such clusters reveal the underlying connections between the elements
K-region-based Clustering Algorithm for Image Segmentation
Kumar, R.; Arthanariee, A. M.
2013-12-01
In this paper, the authors propose a K-region-based clustering algorithm, which applies clustering techniques within K regions of a given image of size N × N. K and N are powers of 2, and K < N. The authors divide the given image into 4, 16, 64, 256, 1024, 4096 and 16384 regions, depending on the value of K. Within each region, adjacent pixels of similar intensity value are grouped into the same cluster. Clusters of similar values in adjacent regions are then grouped together to form bigger clusters. The authors obtain different segmented images depending on the number of regions K. These segmented images are useful for image understanding. Four measures are considered: the probabilistic Rand index, variation of information, global consistency error and boundary displacement error. These measures are used to evaluate and analyze the performance of the K-region-based clustering algorithm.
ORCA: The Overdense Red-sequence Cluster Algorithm
Murphy, D N A; Bower, R G
2011-01-01
We present a new cluster detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (a) the similarity in colour of cluster galaxies (the "red sequence") and (b) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space, (ii) a Voronoi diagram identifies regions of high surface density, (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper we present the algorithm and demonstrate it on two datasets. The first is a 7 square degree sample of the deep Sloan Digital Sky Survey equatorial stripe (Stripe 82), from which we detect 97 clusters with z10^13 solar ma...
An Efficient Cluster Algorithm for CP(N-1) Models
B. B. Beard; M. Pepe; S. Riederer; U. -J. Wiese
2005-10-06
We construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a new regularization for CP(N-1) models in the framework of D-theory, which is an alternative non-perturbative approach to quantum field theory formulated in terms of discrete quantum variables instead of classical fields. Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard formulation of lattice field theory. In fact, there is even a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. We present various simulations for different correlation lengths, couplings and lattice sizes. We have simulated correlation lengths up to 250 lattice spacings on lattices as large as 640x640 and we detect no evidence for critical slowing down.
INVESTIGATING DISTANCE METRICS IN SEMI-SUPERVISED FUZZY C-MEANS FOR BREAST CANCER CLASSIFICATION
Aickelin, Uwe
that would best fit NTBC data. 1 Introduction The Nottingham Tenovus Breast Cancer (NTBC) dataset has beenINVESTIGATING DISTANCE METRICS IN SEMI-SUPERVISED FUZZY C-MEANS FOR BREAST CANCER CLASSIFICATION, Nottingham UK, jmg@cs.nott.ac.uk Keywords: semi-supervised, fuzzy clustering, fuzzy c-means, breast cancer
Clustering online social network communities using genetic algorithms
Hajeer, Mustafa H; Dasgupta, Dipankar; Sanyal, Sugata
2013-01-01
To analyze the activities in an Online Social Network (OSN), we introduce the concept of "Node of Attraction" (NoA), which represents the most active node in a network community. This NoA is identified as the origin/initiator of a post/communication which attracted other nodes and formed a cluster at any point in time. In this research, a genetic algorithm (GA) is used as a data mining method where the main objective is to determine clusters of network communities in a given OSN dataset. This approach is efficient in handling different types of discussion topics in our studied OSN - comments, emails, chat expressions, etc. and can form clusters according to one or more topics. We believe that this work can be useful in finding the source for spread of this GA-based clustering of online interactions and reports some results of experiments with real-world data and demonstrates the performance of proposed approach.
Fast source optimization by clustering algorithm based on lithography properties
Tawada, Masashi; Hashimoto, Takaki; Sakanushi, Keishi; Nojima, Shigeki; Kotani, Toshiya; Yanagisawa, Masao; Togawa, Nozomu
2015-03-01
Lithography is a technology for making circuit patterns on a wafer. UV light diffracted by a photomask forms optical images on a photoresist. The photoresist is then melted where the amount of exposed UV light exceeds the threshold. The UV light diffracted by a photomask through a lens exposes the photoresist on the wafer. Its lightness and darkness generate patterns on the photoresist. As the technology node advances, the feature sizes on the photoresist become much smaller. Diffracted UV light is dispersed on the wafer, and exposing photoresists has become more difficult. Exposure source optimization (SO) techniques for optimizing the illumination shape have been studied. Although an exposure source has hundreds of grid-points, all previous works deal with them one by one. They therefore consume too much running time, which increases design time considerably. How to reduce the number of parameters to be optimized in SO is the key to decreasing source optimization time. In this paper, we propose a variation-resilient and high-speed cluster-based exposure source optimization algorithm. We focus on image log slope (ILS) and use it for generating clusters. When an optical image formed by a source shape has a small ILS value at an EPE (edge placement error) evaluation point, dose/focus variation strongly affects the EPE value. When an optical image formed by a source shape has a large ILS value at an evaluation point, dose/focus variation affects the EPE value less. In our algorithm, we cluster several grid-points with similar ILS values and reduce the number of parameters to be simultaneously optimized in SO. Our clustering algorithm is composed of two STEPs: In STEP 1, we cluster grid-points into four groups based on the ILS values of grid-points at each evaluation point. In STEP 2, we generate super clusters from the clusters generated in STEP 1. We consider the set of grid-points in each cluster to be a single light source element. As a result, we can solve the SO problem very quickly. Experimental results demonstrate that our algorithm achieves a speed-up over a conventional algorithm while keeping the EPE values.
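The two STEPs described in this abstract can be illustrated with a small sketch. The abstract does not say how the four groups or the super clusters are actually formed, so everything below is an assumption made for illustration only: the quartile-based grouping rule, the rule that grid-points with identical group labels at every evaluation point form one super cluster, and the names `quartile_group`, `cluster_grid_points` and `ils` are all hypothetical.

```python
from collections import defaultdict

def quartile_group(value, sorted_values):
    """Return a group index 0..3 from the rank of value among sorted_values (assumed rule)."""
    rank = sum(1 for v in sorted_values if v < value)
    return min(3, 4 * rank // len(sorted_values))

def cluster_grid_points(ils):
    """ils[g][e] is a hypothetical ILS value of grid-point g at evaluation point e."""
    n_eval = len(ils[0])
    # STEP 1 (assumed): at each evaluation point, split grid-points into four ILS groups.
    columns = [sorted(row[e] for row in ils) for e in range(n_eval)]
    labels = [
        tuple(quartile_group(row[e], columns[e]) for e in range(n_eval))
        for row in ils
    ]
    # STEP 2 (assumed): grid-points with identical label tuples form one super cluster.
    super_clusters = defaultdict(list)
    for g, label in enumerate(labels):
        super_clusters[label].append(g)
    return dict(super_clusters)

# Made-up ILS values for 8 grid-points at 2 evaluation points.
ils = [[1.0, 5.0], [1.0, 5.0], [2.0, 6.0], [2.0, 6.0],
       [3.0, 7.0], [3.0, 7.0], [4.0, 8.0], [4.0, 8.0]]
clusters = cluster_grid_points(ils)
```

In this toy input, eight grid-points collapse into four super clusters, so an optimizer would tune four source elements instead of eight. That reduction in parameter count is the effect the abstract attributes to clustering.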
Median graph shift: A new clustering algorithm for graph domain Salim Jouili, Salvatore Tabbone
Paris-Sud XI, Université de
Median graph shift: a new clustering algorithm for graph domain Salim Jouili, Salvatore Tabbone, a new algorithm for the domain of graphs is introduced. In this paper, the key idea is to adapt the mean-shift clustering and its variants proposed for the domain of feature vectors to graph clustering. These algorithms
Adaptive k-means algorithm for overlapped graph clustering.
Bello-Orgaz, Gema; Menéndez, Héctor D; Camacho, David
2012-10-01
The graph clustering problem has become highly relevant due to the growing interest of several research communities in social networks and their possible applications. Overlapped graph clustering algorithms try to find subsets of nodes that can belong to different clusters. In social network-based applications it is quite usual for a node of the network to belong to different groups, or communities, in the graph. Therefore, algorithms trying to discover, or analyze, the behavior of these networks need to handle this feature, detecting and identifying the overlapped nodes. This paper shows a soft clustering approach based on a genetic algorithm where a new encoding is designed to achieve two main goals: first, the automatic adaptation of the number of communities that can be detected and second, the definition of several fitness functions that guide the searching process using some measures extracted from graph theory. Finally, our approach has been experimentally tested using the Eurovision contest dataset, a well-known social-based data network, to show how overlapped communities can be found using our method. PMID:22916718
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters. PMID:26327507
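As an illustration of the kind of objective function a global optimizer such as ABC would minimize, the sketch below evaluates the total pairwise Lennard-Jones energy of a cluster. It is not taken from the ABCluster program; the function name and the use of reduced units (epsilon = sigma = 1) are assumptions made for this example.

```python
import math
from itertools import combinations

def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total pairwise Lennard-Jones energy of a cluster given as a list of (x, y, z)."""
    energy = 0.0
    for a, b in combinations(coords, 2):
        r = math.dist(a, b)
        sr6 = (sigma / r) ** 6
        # Standard 12-6 form: 4*eps*[(sigma/r)^12 - (sigma/r)^6]
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

# Pair distance at which the 12-6 potential has its minimum (in units of sigma).
r_min = 2.0 ** (1.0 / 6.0)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
triangle = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0),
            (r_min / 2.0, r_min * math.sqrt(3.0) / 2.0, 0.0)]
```

For the dimer at the optimal separation the energy is -epsilon, and for the equilateral triangle it is -3 epsilon, since every pair sits at the pair minimum. For larger clusters not all pairs can be at the optimal distance simultaneously, which is why a global search is needed.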
Dynamical linked cluster expansions: Algorithmic aspects and applications
H. Meyer-Ortmanns; T. Reisz
1998-09-15
Dynamical linked cluster expansions are linked cluster expansions with hopping parameter terms endowed with their own dynamics. They amount to a generalization of series expansions from 2-point to point-link-point interactions. We outline an associated multiple-line graph theory involving extended notions of connectivity and indicate an algorithmic implementation of graphs. Fields of applications are SU(N) gauge Higgs systems within variational estimates, spin glasses and partially annealed neural networks. We present results for the critical line in an SU(2) gauge Higgs model for the electroweak phase transition. The results agree well with corresponding high precision Monte Carlo results.
PARAMETER SEARCH FOR AN IMAGE PROCESSING FUZZY C-MEANS HAND GESTURE RECOGNITION SYSTEM
Wachs, Juan
PARAMETER SEARCH FOR AN IMAGE PROCESSING FUZZY C- MEANS HAND GESTURE RECOGNITION SYSTEM Juan Wachs a hand gesture recognition system using an optimized Image Processing-Fuzzy C-Means (FCM) algorithm interest in gesture recognition systems with a number of researchers providing some novel approaches, many
Vetoed jet clustering: the mass-jump algorithm
Stoll, Martin
2015-04-01
A new class of jet clustering algorithms is introduced. A criterion inspired by successful mass-drop taggers is applied that prevents the recombination of two hard prongs if their combined jet mass is substantially larger than the masses of the separate prongs. This "mass jump" veto effectively results in jets with variable radii in dense environments. Differences to existing methods are investigated. It is shown for boosted top quarks that the new algorithm has beneficial properties which can lead to improved tagging purity.
A Clustering Genetic Algorithm for Genomic Data Mining
José Juan Tapia; Enrique Morett; Edgar E. Vallejo
2009-01-01
In this chapter we summarize our work toward developing clustering algorithms based on evolutionary computing and its application to genomic data mining. We have focused on the reconstruction of protein-protein functional interactions from genomic data. The discovery of functional modules of proteins is formulated as an optimization problem in which proteins with similar genomic attributes are grouped together. By considering
On an ensemble algorithm for clustering cancer patient data
2013-01-01
Background The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Though results on using EACCD have been reported, there has been no study on the analysis of the algorithm. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches. Results Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. The statistical tests, however, employed in the learning step have minimal effect on the dendrogram for large m. In addition, if omitting the step for learning dissimilarity in EACCD, the resulting approaches can have a degraded performance. Furthermore, clustering only based on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters. Conclusions When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data. PMID:24565417
An improved distance matrix computation algorithm for multicore clusters.
Al-Neama, Mohammed W; Reda, Naglaa M; Ghaleb, Fayed F M
2014-01-01
Distance matrices have diverse uses in different research areas. Their computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. The DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) to present a recent approach for vectorizing distance matrix computation. It showed efficient performance in both sequential and parallel computing. However, the multicore cluster systems that are now available, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as a highly efficient parallel vectorized algorithm with high performance for computing the distance matrix, targeted at multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives. It derives an efficient approach to partitioning and scheduling computations that is suited to this type of architecture. Implementations employ the potential of both the MPI and OpenMP libraries. Experimental results show that the proposed method achieves around a 3-fold speedup over SSE2. Further, it also achieves speedups of more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779
Non-equilibrium relaxation analysis in cluster algorithms
Nonomura, Yoshihiko
2014-03-01
In Monte Carlo studies of phase transitions, critical slowing down has been a serious problem. In order to overcome this difficulty, two kinds of approaches have been proposed. One is cluster algorithms, where a global update scheme based on percolation theory is introduced in order to avoid the power-law behavior at the critical point. The other is the non-equilibrium relaxation method, where the power-law critical relaxation process is analyzed by the dynamical scaling theory in order to avoid time-consuming equilibration. The next step is then to fuse these two approaches: to investigate phase transitions with the early-stage relaxation process of cluster algorithms. Since the dynamical scaling theory does not hold in cluster algorithms in principle, such an attempt had been considered impossible. In the present talk we show that such a fusion is actually possible using an empirical scaling form obtained from the 2D Ising models instead of the dynamical scaling theory. Applications to the q >= 3 Potts models, +/- J Ising models etc. will also be explained in the presentation.
Comparison of cluster expansion fitting algorithms for interactions at surfaces
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms against synthetic data: the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
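The leave-one-out cross-validation score mentioned in this abstract can be sketched on a toy problem. The example below is not a cluster expansion: it assumes a hypothetical one-feature linear model y = a + b*x standing in for the CE, and the names `fit_line` and `loo_cv_score` are illustrative. Each data point is left out in turn, the model is refit on the rest, and the root-mean-square error of the held-out predictions is reported.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_cv_score(xs, ys):
    """Root-mean-square error of leave-one-out predictions."""
    sq_errors = []
    for i in range(len(xs)):
        # Refit without point i, then predict point i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        sq_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

xs = [0.0, 1.0, 2.0, 3.0]
exact = loo_cv_score(xs, [1.0, 3.0, 5.0, 7.0])   # data exactly on a line
noisy = loo_cv_score(xs, [1.0, 3.0, 5.0, 8.0])   # one perturbed point
```

The score vanishes when the model form matches the data exactly and grows when it does not. With synthetic data, as in the study, the score can additionally be compared against the known true error.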
A general state-selective multireference coupled-cluster algorithm
Kallay, Mihaly; Szalay, Peter G.; Surjan, Peter R.
2002-07-01
A state-selective multireference coupled-cluster algorithm is presented which is capable of describing single, double (or higher) excitations from an arbitrary complete model space. One of the active space determinants is chosen as a formal Fermi-vacuum and single, double (or higher) excitations from the other reference functions are considered as higher excitations from this determinant as it has been previously proposed by Oliphant and Adamowicz [J. Chem. Phys. 94, 1229 (1991)]. Coupled-cluster equations are generated in terms of antisymmetrized diagrams and restrictions are imposed on these diagrams to eliminate those cluster amplitudes which carry undesirable number of inactive indices. The corresponding algebraic expressions are factorized and contractions between cluster amplitudes and intermediates are evaluated by our recent string-based algorithm [J. Chem. Phys. 115, 2945 (2001)]. The method can be easily modified to solve multireference configuration interaction problems. Performance of the method is demonstrated by several test calculations on systems which require a multireference description. The problem related to the choice of the Fermi-vacuum has also been investigated.
NIC-based Reduction Algorithms for Large-scale Clusters
Petrini, F; Moody, A T; Fernandez, J; Frachtenberg, E; Panda, D K
2004-07-30
Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous algorithms limit processing to the host CPU, we utilize the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Performance and scalability evaluations were conducted on the ASCI Linux Cluster (ALC), a 960-node, 1920-processor machine at Lawrence Livermore National Laboratory, which uses the Quadrics QsNet interconnect. We find NIC-based reductions on modern interconnects to be more efficient than host-based implementations in both scalability and consistency. In particular, at large-scale--1812 processes--NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation.
Texture Detect on Rotary-Veneer Surface Based on Semi-Fuzzy Clustering Algorithm
Cheng, Wei; Liang, Ping; Cao, Suqun
The texture of rotary veneer can interfere with defect detection; this paper presents a modified semi-fuzzy clustering (SFC) algorithm. The SFC algorithm incorporates the Fisher discrimination method with fuzzy theory using a fuzzy scatter matrix. By iteratively optimizing the fuzzy Fisher criterion function, the final clustering results are obtained. The SFC algorithm is robust and able to obtain well-separated clustering results. This algorithm can detect the texture and defects on a rotary-veneer surface exactly.
Robust growing neural gas algorithm with application in cluster analysis.
Qin, A K; Suganthan, P N
2004-01-01
We propose a novel robust clustering algorithm within the Growing Neural Gas (GNG) framework, called Robust Growing Neural Gas (RGNG) network. The Matlab codes are available from . By incorporating several robust strategies, such as outlier resistant scheme, adaptive modulation of learning rates and cluster repulsion method into the traditional GNG framework, the proposed RGNG network possesses better robustness properties. The RGNG is insensitive to initialization, input sequence ordering and the presence of outliers. Furthermore, the RGNG network can automatically determine the optimal number of clusters by seeking the extreme value of the Minimum Description Length (MDL) measure during network growing process. The resulting center positions of the optimal number of clusters represented by prototype vectors are close to the actual ones irrespective of the existence of outliers. Topology relationships among these prototypes can also be established. Experimental results have shown the superior performance of our proposed method over the original GNG incorporating MDL method, called GNG-M, in static data clustering tasks on both artificial and UCI data sets. PMID:15555857
A Fast Clustering Algorithm for Data with a Few Labeled Instances
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. PMID:25861252
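The definitions of diameter, split and RSD given in this abstract translate directly into code. The sketch below computes them for a labeled point set using Euclidean distance; the function names and the toy data are illustrative assumptions, and the paper's own algorithm and metric learning step are not reproduced here.

```python
import math
from itertools import combinations

def max_diameter(points, labels):
    """Largest distance between two instances that share a cluster label."""
    pairs = combinations(list(zip(points, labels)), 2)
    return max((math.dist(p, q) for (p, a), (q, b) in pairs if a == b), default=0.0)

def min_split(points, labels):
    """Smallest distance between two instances that belong to different clusters."""
    pairs = combinations(list(zip(points, labels)), 2)
    return min(math.dist(p, q) for (p, a), (q, b) in pairs if a != b)

def rsd(points, labels):
    """Ratio of the minimum split to the maximum diameter."""
    return min_split(points, labels) / max_diameter(points, labels)

# Two well-separated toy clusters on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
labels = [0, 0, 1, 1]
ratio = rsd(points, labels)
```

Here the maximum diameter is 1 and the minimum split is 4, so the RSD is 4. This is greater than one, which is the regime in which the abstract states the proposed algorithm returns optimal solutions.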
MECHANISTIC-BASED GENETIC ALGORITHM SEARCH ON A BEOWULF CLUSTER OF LINUX PCS
Hoffman, Forrest M.
MECHANISTIC-BASED GENETIC ALGORITHM SEARCH ON A BEOWULF CLUSTER OF LINUX PCS Jin-Ping Gwo), Beowulf Linux cluster. ABSTRACT A simple genetic algorithm (SGA) was implemented on a cluster of Linux PCs to environmental researchers and engineers. The Beowulf computer was built out of surplus personal computers at Oak
Qureshi, Kalim; Rashid, Haroon
In this paper, we present the performance analysis of two parallel Fast Fourier Transform algorithms: the binary-exchange and transpose algorithms. These two algorithms were implemented and tested on a cluster of PCs using the Message Passing Interface (MPI). The binary-exchange implementation showed lower processing and communication time than the transpose algorithm.
Mammographic images segmentation based on chaotic map clustering algorithm
2014-01-01
Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels resulting in a medically meaningful partition of the mammography. Methods The image is divided into pixels subsets characterized by a set of conveniently chosen features and each of the corresponding points in the feature space is associated to a map. A mutual coupling strength between the maps depending on the associated distance between feature space points is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and the reproduction of the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective on identification of larger mass lesions. Conclusions We can summarize our analysis by asserting that due to the particularities of the mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. It is rather the joint use of this method along with other segmentation techniques that could be successfully used for increasing the segmentation performance and for providing extra information for the subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
A new detection algorithm for microcalcification clusters in mammographic screening
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for detecting microcalcification clusters is proposed. First, we briefly analyze mammographic images with microcalcification lesions to confirm that these lesions have much greater gray values than normal regions. After summarizing the specific features of microcalcification clusters in mammographic screening, we focus on the preprocessing step, which includes eliminating the background, enhancing the image and eliminating the pectoral muscle. In detail, the Chan-Vese model is used for eliminating the background. We then combine a morphology method with an edge detection method. After the AND operation and Sobel filter, we use the Hough transform; the results show that this performs well in eliminating the pectoral muscle, whose gray level is approximately that of microcalcifications. Additionally, the enhancement step is achieved by morphology. We put effort into mammographic image preprocessing to achieve lower computational complexity. As is well known, robust mammogram analysis is difficult because of the low contrast between normal and lesion tissues and the considerable noise in such images. After this thorough preprocessing, a method based on blob detection is applied to microcalcification clusters according to their specific features. The proposed algorithm employs the Laplace operator to improve the Difference of Gaussians (DoG) function for low-contrast images. A preliminary evaluation of the proposed method is performed on a known public database, namely MIAS, rather than on synthetic images. The comparison experiments and Cohen's kappa coefficients all demonstrate that our proposed approach can potentially obtain better microcalcification cluster detection results in terms of accuracy, sensitivity and specificity.
A Simple Alternative to Jet-Clustering Algorithms
Howard Georgi
2014-08-31
I describe a class of iterative jet algorithms that are based on maximizing a fixed function of the total 4-momentum rather than clustering of pairs of jets. I describe some of the properties of the simplest examples of this class, appropriate for jets at an $e^+e^-$ machine. These examples are sufficiently simple that many features of the jets that they define can be determined analytically with ease. The jets constructed in this way have some potentially useful properties, including a strong form of infrared safety.
PENS: An Algorithm for Density-Based Clustering in Peer-to-Peer Systems
Lee, Wang-Chien
PENS: An Algorithm for Density-Based Clustering in Peer-to-Peer Systems Mei Li , Guanling Lee, called Peer dENsity-based cluStering (PENS), which overcomes the challenge raised in performing clustering in peer-to-peer environments, i.e., cluster assembly. The main idea of PENS is hierarchical
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and use of the mean value of the maximum-participation cluster in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 with histologically or Fibroscan-verified Chronic Liver Disease). An experienced Radiologist performed the SWE measurement, placing a region of interest (ROI) on each subject's right liver lobe and providing a SWE image for each patient. Subsequently, Fuzzy C-Means clustering was performed on every SWE image utilizing 5 clusters. The Mean Stiffness value and number of pixels of each cluster were calculated. The mean stiffness value feature of the cluster with the maximum number of pixels was then fed as input for ROC analysis. The selected Mean Stiffness value feature yielded an Area Under the Curve (AUC) of 0.8633 with an optimum cut-off value of 7.5 kPa, sensitivity and specificity values of 0.8438 and 0.875, and a balanced accuracy of 0.8594. The examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy values of 0.8125 with a 7.1 kPa cut-off value. A new promising automatic algorithm was implemented with more objective criteria for defining optimum elasticity cut-off values for discriminating fibrosis stages in SWE. More subjects are needed in order to determine whether this algorithm is an objective tool that outperforms manual ROI selection.
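The feature extraction described in this abstract (fuzzy c-means on pixel values, then the mean of the most populated cluster) can be sketched as follows. This is a minimal illustration using the standard fuzzy c-means update rules on scalar values, not the authors' implementation. The fuzzifier m = 2, the fixed iteration count, the initial centers, the use of two clusters instead of the study's five, and the stiffness numbers are all assumptions chosen to keep the example small.

```python
def memberships(values, centers, m):
    """Standard FCM membership update: u[i][k] for value i and cluster k."""
    u = []
    for x in values:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # A value sitting exactly on a center belongs fully to it (shared if tied).
            zeros = [1.0 if dk == 0.0 else 0.0 for dk in d]
            total = sum(zeros)
            u.append([z / total for z in zeros])
        else:
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            u.append([v / total for v in inv])
    return u

def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships(values, centers, m)
        for k in range(len(centers)):
            # Center update: mean of the values weighted by u^m.
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            if total > 0.0:
                centers[k] = sum(w * x for w, x in zip(weights, values)) / total
    return centers, memberships(values, centers, m)

def dominant_cluster_mean(values, u):
    """Mean of the values hard-assigned to the cluster with the most members."""
    n_clusters = len(u[0])
    hard = [max(range(n_clusters), key=lambda k: row[k]) for row in u]
    counts = [hard.count(k) for k in range(n_clusters)]
    biggest = counts.index(max(counts))
    members = [x for x, h in zip(values, hard) if h == biggest]
    return sum(members) / len(members)

# Made-up stiffness values in kPa, for illustration only.
stiffness = [4.8, 5.0, 5.1, 5.2, 4.9, 12.0, 12.5]
centers, u = fuzzy_c_means_1d(stiffness, centers=[4.0, 13.0])
feature = dominant_cluster_mean(stiffness, u)
```

In this toy input, five of the seven values fall in the low-stiffness cluster, so the extracted feature is their mean. In the study, one such feature per patient is then passed to ROC analysis to choose the cut-off.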
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that also for the film geometry a substantial reduction of the statistical error can be achieved. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L(0) of the film and study the scaling of this temperature with L(0). In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model. PMID:25768461
Thermodynamic Casimir effect in films: The exchange cluster algorithm
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998), 10.1103/PhysRevE.57.4976]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that also for the film geometry a substantial reduction of the statistical error can be achieved. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L0 of the film and study the scaling of this temperature with L0. In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model.
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
Zhexue Huang
1998-01-01
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses
Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data.
Koskinen, Kaisa; Auvinen, Petri; Björkroth, K Johanna; Hultman, Jenni
2015-08-01
Natural microbial communities have been studied for decades using the 16S rRNA gene as a marker. In recent years, the application of second-generation sequencing technologies has revolutionized our understanding of the structure and function of microbial communities in complex environments. Using these highly parallel techniques, a detailed description of community characteristics is constructed, and even the rare biosphere can be detected. The new approaches carry numerous advantages and lack many features that skewed the results obtained with traditional techniques, but we still face serious bias and a lack of reliable comparability between results. Here, we contrasted publicly available amplicon sequence data analysis algorithms by using two different data sets, one with a defined clone-based structure and one from a well-studied food spoilage community. We aimed to assess which software and parameters produce results that best resemble the benchmark community, how large the differences between methods are, and whether these differences are statistically significant. The results suggest that commonly accepted denoising and clustering methods used in different combinations produce significantly different outcomes: the clustering method greatly affects the number of operational taxonomic units (OTUs), and the denoising algorithm has more influence on taxonomic affiliations. The magnitude of the OTU number difference was up to 40-fold, and the disparity between results seemed highly dependent on the community structure and diversity. Statistically significant differences in taxonomies between methods were seen even at phylum level. However, the application of an effective denoising method seemed to even out the differences produced by clustering. PMID:25525895
Performance Evaluation for Clustering Algorithms in Object-Oriented Database Systems
Darmont, Jérôme; Gourgand, Michel
1995-01-01
It is widely acknowledged that good object clustering is critical to the performance of object-oriented databases. However, object clustering always involves some kind of overhead for the system. The aim of this paper is to propose a modelling methodology in order to evaluate the performances of different clustering policies. This methodology has been used to compare the performances of three clustering algorithms found in the literature (Cactis, CK and ORION) that we considered representative of the current research in the field of object clustering. The actual performance evaluation was performed using simulation. Simulation experiments we performed showed that the Cactis algorithm is better than the ORION algorithm and that the CK algorithm totally outperforms both other algorithms in terms of response time and clustering overhead.
Textural defect detect using a revised ant colony clustering algorithm
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for textural defect detection. Our efforts focus on the definition of a local irregularity measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the local textural inconsistency of each pixel against its mini-environment. In our revised ACCA, the behavior of each ant is divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the next action of each ant is chosen independently of the other ants, in a stochastic way, according to evaluated heuristic knowledge. The independence of the ants implies that the algorithm is inherently parallel. We apply the proposed method to some typical textural images with defects. The series of pheromone distribution maps (PDM) clearly shows that the pheromone distribution gradually approaches the textural defects. After some post-processing, the final distribution of pheromone demonstrates the shape and area of the defects well.
Doostparast Torshizi, Abolfazl; Fazel Zarandi, Mohammad Hossein
2015-09-01
This paper considers microarray gene expression data clustering using a novel two stage meta-heuristic algorithm based on the concept of α-planes in general type-2 fuzzy sets. The main aim of this research is to present a powerful data clustering approach capable of dealing with highly uncertain environments. In this regard, first, a new objective function using α-planes for general type-2 fuzzy c-means clustering algorithm is represented. Then, based on the philosophy of the meta-heuristic optimization framework 'Simulated Annealing', a two stage optimization algorithm is proposed. The first stage of the proposed approach is devoted to the annealing process accompanied by its proposed perturbation mechanisms. After termination of the first stage, its output is inserted to the second stage where it is checked with other possible local optima through a heuristic algorithm. The output of this stage is then re-entered to the first stage until no better solution is obtained. The proposed approach has been evaluated using several synthesized datasets and three microarray gene expression datasets. Extensive experiments demonstrate the capabilities of the proposed approach compared with some of the state-of-the-art techniques in the literature. PMID:25035233
Clustering PPI Data Based on Bacteria Foraging Optimization Algorithm Xiujuan Lei*1
Clustering PPI Data Based on Bacteria Foraging Optimization Algorithm Xiujuan Lei*1 Shuang Wu2@buffalo.edu * Corresponding author Abstract--This paper proposed a novel method using Bacteria Foraging Optimization, but also automatically determined the cluster number. Keywords-bacteria foraging optimization algorithm
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
Salman, Raied
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Parallel Glowworm Swarm Optimization Clustering Algorithm based on MapReduce
Ludwig, Simone
Parallel Glowworm Swarm Optimization Clustering Algorithm based on MapReduce Nailah Al data sizes. In this paper, a scalable design and implementation of glowworm swarm optimization swarm optimization to formulate the clustering algorithm. Glowworm swarm optimization is used to take
A Memetic Clustering Algorithm for the Functional Partition of Genes Based on the Gene Ontology
Zell, Andreas
A Memetic Clustering Algorithm for the Functional Partition of Genes Based on the Gene Ontology. During the analysis of such data the need of a functional grouping of genes arises. In this paper, we propose a new clustering algorithm for the partition of genes or gene products according to their known
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
To address the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). The algorithm uses a reputation mechanism to secure transactions and uses clusters to manage that mechanism. To improve security, reduce the network cost of reputation management, and enhance cluster stability, we select reputation, historical average online time, and network bandwidth as the basic factors of a node's comprehensive performance. Simulation results showed that the proposed algorithm improved security, reduced network overhead, and enhanced cluster stability.
A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking
The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823
Identifying prototypical components in behaviour using clustering algorithms.
Quantitative analysis of animal behaviour is a requirement to understand the task solving strategies of animals and the underlying control mechanisms. The identification of repeatedly occurring behavioural components is thereby a key element of a structured quantitative description. However, the complexity of most behaviours makes the identification of such behavioural components a challenging problem. We propose an automatic and objective approach for determining and evaluating prototypical behavioural components. Behavioural prototypes are identified using clustering algorithms and finally evaluated with respect to their ability to represent the whole behavioural data set. The prototypes allow for a meaningful segmentation of behavioural sequences. We applied our clustering approach to identify prototypical movements of the head of blowflies during cruising flight. The results confirm the previously established saccadic gaze strategy by the set of prototypes being divided into either predominantly translational or rotational movements, respectively. The prototypes reveal additional details about the saccadic and intersaccadic flight sections that could not be unravelled so far. Successful application of the proposed approach to behavioural data shows its ability to automatically identify prototypical behavioural components within a large and noisy database and to evaluate these with respect to their quality and stability. Hence, this approach might be applied to a broad range of behavioural and neural data obtained from different animals and in different contexts. PMID:20179763
ROUTING ALGORITHMS FOR FPGAS WITH SPARSE INTRA-CLUSTER ROUTING CROSSBARS
Lemieux, Guy
ROUTING ALGORITHMS FOR FPGAS WITH SPARSE INTRA-CLUSTER ROUTING CROSSBARS Yehdhih Ould Mohammed of British Columbia lemieux@ece.ubc.ca ABSTRACT Modern FPGAs employ sparse crossbars in their intra-cluster routing heuristics for FPGAs with sparse intra-cluster routing crossbars: SElective RRG Expansion (SERRGE
An Event-Driven Algorithm for Fractal Cluster S. Gonzalez, A. R. Thornton, S. Luding
Luding, Stefan
") that aggregate into clusters. In the next section we explain the algorithm that we use and how it is related dimensional gas: particles move freely until they collide and "stick" together irreversibly. These clusters of clusters in a 2D gas with periodic Preprint submitted to Computer Physics Communications June 20, 2010 #12
Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf
2014-12-01
In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adaptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k = 2, 3, … clusters. Therefore, it can also be used to estimate the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry fits the faults better. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.
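To illustrate why a Mahalanobis-type distance can favor elliptical clusters, the following minimal sketch compares two points that are equally far from a center in the Euclidean sense. The function name `mahalanobis_sq` and the example covariance matrix are assumptions made for this illustration only; the adaptive distance-like function and the incremental search described in the abstract are not reproduced here.

```python
def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance in 2D: (x - c)^T S^{-1} (x - c).

    `cov` is a 2x2 covariance matrix given as nested lists.
    Hypothetical helper for illustration, not the authors' code.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (a * dy - c * dx)) / det


# A cluster stretched along the x-axis (variance 4 in x, 1 in y).
cov = [[4.0, 0.0], [0.0, 1.0]]
along_axis = mahalanobis_sq((2.0, 0.0), (0.0, 0.0), cov)
across_axis = mahalanobis_sq((0.0, 2.0), (0.0, 0.0), cov)
print(along_axis, across_axis)
```

Both points lie at Euclidean distance 2 from the center, but the point along the elongated direction is treated as closer, which is the property that lets such a distance follow elongated structures like faults.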
Exact Algorithms and Experiments for Hierarchical Tree Clustering Jiong Guo Sepp Hartung Christian) with provable performance guarantees and a simple search tree algorithm. These are used to find optimal solutions. Our experiments with synthetic and biological data show the effectiveness of our algorithms
Pass-Efficient Algorithms for Clustering Kevin Li-Chiang Chang
Abstract Pass-Efficient Algorithms for Clustering Kevin Li-Chiang Chang 2006 The proliferation in storage. Thus, in the pass-efficient model of computation, an algorithm may make a constant number to be optimized are memory, number of passes, and per element processing time. We give pass-efficient algorithms
Clustering performance comparison using K-means and expectation maximization algorithms
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-01-01
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results. PMID:26019610
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
A fast kernel-based multilevel algorithm for graph clustering
Inderjit S. Dhillon; Yuqiang Guan; Brian Kulis
2005-01-01
Graph clustering (also called graph partitioning) --- clustering the nodes of a graph --- is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation which can be slow. Recently, graph clustering with a
A highly efficient multi-core algorithm for clustering extremely large datasets
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
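The dependence of K-means on its initial centroids can be seen in a very small example. The sketch below is a plain Lloyd-style K-means in pure Python, not any of the nature-inspired hybrids evaluated in the paper; the function names (`lloyd`, `best_of_restarts`) and the four-point data set are assumptions chosen for illustration. Trying several initializations and keeping the best result is the simplest stand-in for the global search that the hybrid methods aim to provide.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd(points, centroids, max_iter=100):
    """Lloyd's K-means iterations from the given initial centroids.

    Returns (final_centroids, sum_of_squared_errors).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def best_of_restarts(points, initializations):
    """Run Lloyd's algorithm from each initialization and keep the lowest SSE."""
    return min((lloyd(points, init) for init in initializations),
               key=lambda result: result[1])


# Four corners of a 4 x 1 rectangle, to be split into k = 2 clusters.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
poor_init = [(2, 0), (2, 1)]      # converges to a top/bottom split
good_init = [(0, 0.5), (4, 0.5)]  # converges to the left/right split

_, sse_poor = lloyd(points, poor_init)
_, sse_good = lloyd(points, good_init)
_, sse_best = best_of_restarts(points, [poor_init, good_init])
print(sse_poor, sse_good, sse_best)
```

Here the poor initialization is already a fixed point of the update rule, so the algorithm stops at a partition with a much larger error than the left/right split.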
Background Gene clustering algorithms are widely used by biologists when analysing omics data. Classical gene clustering strategies are based on the use of expression data only, either directly, as in heatmaps, or indirectly, as in clustering based on coexpression networks. However, the classical strategies may not be sufficient to bring out all potential relationships amongst genes. Results We propose a new unsupervised gene clustering algorithm based on the integration of external biological knowledge, such as Gene Ontology annotations, into expression data. We introduce a new distance between genes that integrates biological knowledge into the analysis of expression data. Two genes are therefore close if they have both similar expression profiles and similar functional profiles. A classical algorithm (e.g. K-means) is then used to obtain gene clusters. In addition, we propose an automatic evaluation procedure for gene clusters. This procedure is based on two indicators that measure the global coexpression and biological homogeneity of gene clusters. They are associated with hypothesis tests, which allow each indicator to be complemented with a p-value. Our clustering algorithm is compared to heatmap clustering and to clustering based on gene coexpression networks, on both simulated and real data. In both cases, it outperforms the other methodologies, as it provides the highest proportion of significantly coexpressed and biologically homogeneous gene clusters, which are good candidates for interpretation. Conclusion Our new clustering algorithm provides a higher proportion of good candidates for interpretation. Therefore, we expect the interpretation of these clusters to help biologists formulate new hypotheses on the relationships amongst genes. PMID:23387364
Cluster diversity and entropy on the percolation model: The lattice animal identification algorithm
Tsang, I. J.; Tsang, I. R.; Dyck, D. Van
2000-11-01
We present an algorithm to identify and count different lattice animals (LA's) in the site-percolation model. This algorithm allows a definition of clusters based on the distinction of cluster shapes, in contrast with the well-known Hoshen-Kopelman algorithm, in which the clusters are differentiated by their sizes. It consists in coding each unit cell of a cluster according to the nearest neighbors (NN) and ordering the codes in a proper sequence. In this manner, a LA is represented by a specific code sequence. In addition, with some modification the algorithm is capable of differentiating between fixed and free LA's. The enhanced Hoshen-Kopelman algorithm [J. Hoshen, M. W. Berry, and K. S. Minser, Phys. Rev. E 56, 1455 (1997)] is used to compose the set of NN code sequences of each cluster. Using Monte Carlo simulations on planar square lattices up to 2000×2000, we apply this algorithm to the percolation model. We calculate the cluster diversity and cluster entropy of the system, which leads to the determination of probabilities associated with the maximum of these functions. We show that these critical probabilities are associated with the percolation transition and with the complexity of the system.
An adaptive spatial clustering algorithm based on the minimum spanning tree-like
NASA Astrophysics Data System (ADS)
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover potential rules and outliers in spatial data. Most existing spatial clustering methods cannot deal with uneven data density and usually require predefined parameters that are hard to justify. To overcome these limitations, we first propose the concept of the edge variation factor, based on the definition of distance variation among the entities in a spatial neighborhood. Then, an approach is presented to construct the minimum spanning tree-like (MST-L) structure. Further, an adaptive MST-L based spatial clustering algorithm (AMSTLSC) is developed. The algorithm requires only one input parameter, the threshold of the edge variation factor, which can be set easily with little prior information. Through this parameter, a series of MST-Ls can be generated automatically from the high-density region to the low-density one, where each MST-L represents a cluster. As a result, the proposed algorithm can adapt to changes in local density among spatial points, a property referred to as adaptiveness. Finally, two tests demonstrate that the AMSTLSC algorithm is robust, is suitable for finding clusters of different shapes, and in particular has good adaptiveness. A comparative test further indicates that the AMSTLSC algorithm performs better than the classic DBSCAN algorithm.
2010-01-01
Segmentation is an important step in many medical imaging applications and a variety of image segmentation techniques do exist. Of them, a group of segmentation algorithms is based on the clustering concepts. In our research, we have intended to devise efficient variants of Fuzzy C-Means (FCM) clustering towards effective segmentation of medical images. The enhanced variants of FCM clustering are
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flash counts to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors creates flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
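The first level of this hierarchy (events into groups) can be sketched as a connected-component search within each frame. The code below is only an illustration under stated assumptions: events are represented as hypothetical `(frame, x, y)` tuples, and "adjacent pixels" is taken to mean 8-connectivity (sides and diagonals), which the abstract does not specify. The flash and area levels, with their time and distance thresholds, are not implemented.

```python
from collections import defaultdict


def group_events(events):
    """Cluster events into groups: same frame, adjacent pixels.

    `events` is an iterable of (frame, x, y) tuples (assumed layout).
    Adjacency is assumed to be 8-connected.
    Returns a sorted list of (frame, sorted_pixel_list) groups.
    """
    pixels_by_frame = defaultdict(set)
    for frame, x, y in events:
        pixels_by_frame[frame].add((x, y))

    groups = []
    for frame, unvisited in pixels_by_frame.items():
        while unvisited:
            seed = unvisited.pop()
            stack = [seed]
            members = [seed]
            # Depth-first search over neighboring lit pixels in this frame.
            while stack:
                x, y = stack.pop()
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        neighbor = (x + dx, y + dy)
                        if neighbor in unvisited:
                            unvisited.remove(neighbor)
                            stack.append(neighbor)
                            members.append(neighbor)
            groups.append((frame, sorted(members)))
    return sorted(groups)


events = [(0, 1, 1), (0, 2, 2), (0, 5, 5), (1, 1, 1)]
groups = group_events(events)
print(groups)
```

In this toy input, the two diagonal neighbors in frame 0 form one group, the isolated pixel in frame 0 forms another, and the pixel in frame 1 is a separate group even though it shares coordinates with an earlier event.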
A novel harmony search-K means hybrid algorithm for clustering gene expression data.
Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
, artificial intelligence, pattern recognition), life and medical sciences (genetics, microbiologyQuality Quantity and Repellent Scent Aware Artificial Bee Colony Algorithm for Clustering Unekwu attributes has found application in many areas such as computer sciences and engineering (security, web
Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation
Pal, Sankar Kumar
Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation Pradipta Maji of brain MR images. The RFCM algorithm comprises a judicious integration of the of rough sets, fuzzy sets with vagueness and incompleteness in class definition of brain MR images, the membership function of fuzzy sets
Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen
2014-08-01
The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm that can predict the information of unlabeled nodes from a few labeled nodes. It is also a community detection method in the field of complex networks. The algorithm is easy to implement, has low complexity, and performs well, so it is widely applied in various fields. However, the randomness of label propagation makes the algorithm less robust, and the classification result is unstable. This paper proposes an LPA based on the edge clustering coefficient. Each node in the network updates its label from the neighbor whose edge clustering coefficient is highest, rather than from a random neighbor, which effectively restrains the random spread of labels. The experimental results show that the LPA based on the edge clustering coefficient improves the stability and accuracy of the algorithm.
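The neighbor-selection idea can be sketched as follows. The abstract does not give the formula for the edge clustering coefficient, so this example assumes a commonly used definition, (number of triangles containing the edge + 1) / min(k_u - 1, k_v - 1), where k is node degree, with the value taken as infinite when the denominator is zero. The function names, the tie-breaking by sorted node order, and the single asynchronous pass are likewise assumptions for illustration and may differ from the paper's procedure.

```python
def edge_clustering_coefficient(graph, u, v):
    """Assumed definition: (triangles on edge + 1) / min(k_u - 1, k_v - 1)."""
    triangles = len(graph[u] & graph[v])  # common neighbors close triangles
    denom = min(len(graph[u]) - 1, len(graph[v]) - 1)
    if denom == 0:
        return float("inf")
    return (triangles + 1) / denom


def best_neighbor(graph, u):
    """Neighbor of u joined by the edge with the highest coefficient.

    Ties are broken deterministically by sorted node order.
    """
    return max(sorted(graph[u]),
               key=lambda v: edge_clustering_coefficient(graph, u, v))


def propagate_once(graph):
    """One asynchronous pass: each node copies its best neighbor's label."""
    labels = {node: node for node in graph}
    for node in sorted(graph):
        labels[node] = labels[best_neighbor(graph, node)]
    return labels


# Two triangles joined by a single bridge edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
labels = propagate_once(graph)
print(labels)
```

Because the bridge edge lies in no triangle, it receives a lower coefficient than the edges inside each triangle, so labels do not cross it in this example and the two triangles end up with different labels.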
2000-04-21
Clusters of galaxies are the most massive objects in the Universe and mapping their location is an important astronomical problem. This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters of galaxies in large astronomical databases. The Adaptive Matched Filter (AMF) algorithm presented here identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a cluster model and a background model. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.
Fuzzy dynamic clustering algorithm Sankar K. PAL and Sushmita M ITRA
the feature space has ill-defined regions. The membership function in IR" involves the density of patterns gorithm can extract overlapping initial clusters (boundaries) when the feature space has ill-defined. The effectiveness of the algorithm is demonstrated on the speech recognition problem. Key words: Fuzzy clustering
A Robust Clustering Algorithm for Mobile Ad Hoc Networks Zhaowen Xing
battlefield applications, natural disaster recovery situations where the communication infrastructures have extra devices. This often leads to a higher rate of re-clustering. This chapter presents a robust a lower cluster head change rate and re-affiliation rate than other existing algorithms. INTRODUCTION
Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data
. DNA microarray technology has proved to be a fundamental tool in studying gene expression Abstract Many clustering algorithms have been used to analyze microarray gene expression data of meaningful biological information from microarray expression data. Keywords cluster analysis; gene expression
A new clustering algorithm applicable to multispectral and polarimetric SAR images
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
A Fast General-Purpose Clustering Algorithm Based on FPGAs for High-Throughput Data Processing
2009-10-14
We present a fast general-purpose algorithm for high-throughput clustering of data "with a two dimensional organization". The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes the algorithm especially well suited for problems where the data has high density, e.g. in the case of tracking devices working under high-luminosity conditions such as those of LHC or Super-LHC. The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has a much broader field of applications. In fact, its core does not specifically rely on the kind of data or detector it is working for, while the second step can and should be tailored for a given application. Applications can thus be foreseen to other detectors and other scientific fields ranging from HEP calorimeters to medical imaging. An additional advantage of this two-step approach is that the typical clustering related calculations (second step) are separated from the combinatorial complications of clustering. This separation simplifies the design of the second step and enables it to perform sophisticated calculations achieving offline quality in online applications. The algorithm is general purpose in the sense that only minimal assumptions on the kind of clustering to be performed are made.
2004-01-01
Organizing Web search results into a hierarchy of topics and sub-topics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierarchical monothetic clustering algorithm to build a topic hierarchy for a collection of search results retrieved in response to a query. At every level of the hierarchy, the new algorithm progressively identifies topics
A Novel Clustering Algorithm Based on a Modified Model of Random Walk
2008-01-01
We introduce a modified model of random walk, and then develop two novel clustering algorithms based on it. In the algorithms, each data point in a dataset is considered as a particle which can move at random in space according to the preset rules in the modified model. Further, this data point may be also viewed as a local control
A vector reconstruction based clustering algorithm particularly for large-scale text collection.
Liu, Ming; Wu, Chong; Chen, Lei
2015-03-01
With the rapid evolution of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories can help users dig useful information out of large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose their high quality on large-scale text collections, which is mainly attributed to the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in the cluster's representative vector. The algorithm alternately repeats two sub-processes until it converges. One is the partial tuning sub-process, where each feature's weight is fine-tuned by an iterative process similar to the self-organizing-mapping (SOM) algorithm. To accelerate clustering, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other is the overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features that are useless for representing the cluster are removed from the cluster's representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm obtains high-quality performance on both small-scale and large-scale text collections. PMID:25539500
Chinese Text Clustering Algorithm Based k-means
Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
Text clustering is an important method in text mining. This work focuses on the process of Chinese text clustering based on k-means. After some experiments, we found that the new center of a cluster was easily affected by isolated texts. The average similarity of a cluster was used as a parameter and multiplied by a modulus between 0.75 and 1.25 to obtain the similarity threshold value. The texts whose similarity with the original cluster center was greater than or equal to the threshold value were collected as a candidate collection, and the cluster center was then updated with the center of the candidate collection. The experiments show that the improved method increased purity and F value by about 10 percent on average over the original method.
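The thresholded center update described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed as the similarity measure, texts are assumed to be dense numeric vectors, and the names `cosine` and `robust_center` as well as the fallback for an empty candidate collection are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def robust_center(members, center, modulus=1.0):
    """Recompute a cluster center from members that pass the threshold.

    threshold = modulus * average similarity to the current center,
    with modulus chosen between 0.75 and 1.25 as in the abstract.
    """
    sims = [cosine(m, center) for m in members]
    threshold = modulus * sum(sims) / len(sims)
    candidates = [m for m, s in zip(members, sims) if s >= threshold]
    if not candidates:
        return list(center)  # assumption: keep the old center if nothing qualifies
    dim = len(center)
    return [sum(m[i] for m in candidates) / len(candidates) for i in range(dim)]


# Three similar vectors plus one isolated vector assigned to the same cluster.
toy_members = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
toy_center = robust_center(toy_members, [1.0, 0.0], modulus=1.0)
```

In the toy data the isolated vector falls below the threshold, so it no longer pulls the updated center toward itself as a plain mean would.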
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, the growing volume of medical image data makes it possible to visualize the anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
A randomized algorithm for two-cluster partition of a set of vectors
Kel'manov, A. V.; Khandeev, V. I.
2015-02-01
A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors of Euclidean space into two clusters of given sizes according to the minimum-of-the sum-of-squared-distances criterion. It is assumed that the centroid of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The centroid of the other cluster is fixed at the origin. For an established parameter value, the algorithm finds an approximate solution of the problem in time that is linear in the space dimension and the input size of the problem for given values of the relative error and failure probability. The conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.
An improved clustering algorithm of tunnel monitoring data for cloud computing.
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
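A k-means iteration in MapReduce style, as this abstract describes at a high level, can be sketched in a single process as follows. The sketch makes several assumptions: it only imitates the map and reduce phases in plain Python rather than using any MapReduce framework, it uses squared Euclidean distance, it omits the paper's dissimilarity-based cleaning of abnormal data, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (nearest centroid index, point) for every point."""
    for p in points:
        cid = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        yield cid, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    updated = []
    for i, old in enumerate(centroids):
        pts = groups.get(i)
        if not pts:
            updated.append(tuple(old))  # assumption: empty clusters keep their centroid
        else:
            updated.append(tuple(sum(col) / len(pts) for col in zip(*pts)))
    return updated


def kmeans_mapreduce(points, centroids, iters=10):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(iters):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


toy_points = [(0, 0), (0, 2), (10, 10), (10, 12)]
toy_centroids = kmeans_mapreduce(toy_points, [(0, 0), (10, 10)])
```

In a real deployment the map phase would run in parallel over data partitions, and only the per-cluster sums and counts would need to be shuffled to the reducers.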
1996-02-01
An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program was developed to test the algorithm. The algorithm was tested only on image data of 8x8 size with different numbers of blobs in them. The algorithm works very well in detecting and identifying image clusters.
Fast Algorithms for Projected Clustering Charu C. Aggarwal Cecilia Procopiuc
)@watson.ibm.com Abstract The clustering problem is well known in the database literature for its numerous. 1 Introduction The clustering problem has been discussed extensively in the database literature and classification. Various methods have been studied in considerable detail by both the statistics and database
A Fast Clustering Algorithm with Application to Cosmology
are the connected components of a level set Sc {f > c} where f is the probability density function. We use kernel the Fast Fourier Transform (FFT) to speed up the calculations. We show the cosmological definition of clusters of galaxies is equivalent to density contour clusters and present an application in cosmology. Key
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of approaches for clustering data streams. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
γ-ray DBSCAN: A clustering algorithm applied to Fermi-LAT γ-ray data
Tramacere, A.; Vecchio, C.
2012-12-01
The Density Based Spatial Clustering of Applications with Noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose the use of this method for the detection of sources in γ-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon. We investigate the detection performance of the γ-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters. We find that the γ-ray DBSCAN can be successfully used in the detection of clusters in γ-ray data. The significance returned by our algorithm is strongly correlated with that provided by the Maximum Likelihood analysis with standard Fermi-LAT software, and can be used to safely remove spurious clusters.
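For readers unfamiliar with DBSCAN, a minimal textbook version can be sketched as follows. It is not the authors' Fermi-LAT implementation: it assumes plain Euclidean distance on a flat plane rather than angular distance on the sky, uses a brute-force neighbor search, computes no significance value, and the parameter names `eps` and `min_pts` follow common usage rather than the paper.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
    return labels


toy_points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
    (10, 10), (10, 11), (11, 10), (11, 11),  # dense group B
    (50, 50),                                # isolated background point
]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
```

The isolated point is labeled as noise, which mirrors the role of background rejection described in the abstract.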
Efficient cluster Monte Carlo algorithm for Ising spin glasses in more than two space dimensions
NASA Astrophysics Data System (ADS)
Ochoa, Andrew J.; Zhu, Zheng; Katzgraber, Helmut G.
2015-03-01
A cluster algorithm that speeds up slow dynamics in simulations of nonplanar Ising spin glasses away from criticality is urgently needed. In theory, the cluster algorithm proposed by Houdayer poses no advantage over local moves in systems with a percolation threshold below 50%, such as cubic lattices. However, we show that the frustration present in Ising spin glasses prevents the growth of system-spanning clusters at temperatures roughly below the characteristic energy scale J of the problem. Adding Houdayer cluster moves to simulations of Ising spin glasses for T ~ J produces a speedup that grows with the system size over conventional local moves. We show results for the nonplanar quasi-two-dimensional Chimera graph of the D-Wave Two quantum annealer, as well as conventional three-dimensional Ising spin glasses, where in both cases the addition of cluster moves speeds up thermalization visibly in the physically-interesting low temperature regime.
Two evolutionary algorithms optimize clusters and automate feature selection in multispectral images
Burgin, George H.; Kagey, H. Price; Jafolla, James C.
2007-09-01
Evolutionary computation can increase the speed and accuracy of pattern recognition in multispectral images, for example, in automatic target tracking. The first method treats the clustering process. It determines a cluster of pixels around specified reference pixels so that the entire cluster is increasingly representative of the search object. An initial population (of clusters) evolves into populations of new clusters, with each cluster having an assigned fitness score. This population undergoes iterative mutation and selection. Mutation operators alter both the pixel cluster set cardinality and composition. Several stopping criteria can be applied to terminate the evolution. An advantage of this evolutionary cluster formulation is that the resulting cluster may have an arbitrary shape so that it most nearly fits the search pattern. The second algorithm automates the selection of features (the center-frequency and the bandwidth) for each population member. For each pixel in the image and for each population member, the Mahalanobis distance to the reference set is calculated and a decision is made whether or not this pixel belongs to a target. The sum of correct and false decisions defines a Receiver Operating Curve, which is used to measure the fitness of a population member. Based on this fitness, the algorithm decides which population members to use as parents for the next iteration.
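The Mahalanobis distance test mentioned in this abstract can be illustrated for the two-feature case as follows. This is a sketch under assumptions: it handles only two features per pixel so that the covariance matrix can be inverted in closed form, it uses the sample covariance of the reference set, and the function names and toy values are hypothetical.

```python
def mean_and_cov_2d(samples):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D reference vectors."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


reference = [(0, 0), (2, 0), (0, 2), (2, 2)]  # hypothetical reference pixel features
ref_mean, ref_cov = mean_and_cov_2d(reference)
d2_center = mahalanobis_sq_2d((1, 1), ref_mean, ref_cov)
d2_far = mahalanobis_sq_2d((3, 1), ref_mean, ref_cov)
```

A pixel would then be assigned to the target when its squared distance falls below a chosen threshold; sweeping that threshold is one way to trace out the operating curve used as the fitness measure.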
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. Methods of this kind have two inadequacies: one is that the input matrices they use cannot provide sufficient structural information for community detection, and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experimental results showed that the new algorithm gave excellent performance on artificial networks and real-world networks and outperformed other community detection methods. PMID:25147846
Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm
Darkins, Robert; Cooke, Emma J.; Ghahramani, Zoubin; Kirk, Paul D. W.; Wild, David L.; Savage, Richard S.
2013-01-01
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL: https://sites.google.com/site/randomisedbhc/. PMID:23565168
A multi-level spatial clustering algorithm for detection of disease outbreaks.
Que, Jialan; Tsui, Fu-Chiang
2008-01-01
In this paper, we proposed a Multi-level Spatial Clustering (MSC) algorithm for rapid prospective detection of emerging disease outbreaks. We used semi-synthetic data for algorithm evaluation. We applied the BARD algorithm [1] to generate outbreak counts for simulation of an aerosol release of Anthrax. We compared MSC with two spatial clustering algorithms: Kulldorff's spatial scan statistic [2] and the Bayesian spatial scan statistic [3]. The evaluation results showed no significant difference among the three algorithms in the areas under ROC or in the areas under AMOC. MSC demonstrated significant computational efficiency (100+ times faster) and higher PPV. However, MSC showed a 2-6 hour delay on average for outbreak detection when the false alarm rate was lower than 1 false alarm per 4 weeks. We concluded that the MSC algorithm is computationally efficient and is able to provide more precise and compact clusters in a timely manner while keeping high detection accuracy (cluster sensitivity) and low false alarm rates. PMID:18999304
Simulation of DNA damage clustering after proton irradiation using an adapted DBSCAN algorithm.
Francis, Ziad; Villagrasa, Carmen; Clairand, Isabelle
2011-03-01
In this work the "Density Based Spatial Clustering of Applications with Noise" (DBSCAN) algorithm was adapted to early stage DNA damage clustering calculations. The resulting algorithm takes into account the distribution of energy deposit induced by ionising particles and a damage probability function that depends on the total energy deposit amount. Proton track simulations were carried out in small micrometric volumes representing small DNA containments. The algorithm was used to determine the damage concentration clusters and thus to deduce the DSB/SSB ratios created by protons between 500 keV and 50 MeV. The obtained results are compared to other calculations and to available experimental data of fibroblast and plasmid cells irradiations, both extracted from literature. PMID:21232812
A Clustering Based Niching Method for Evolutionary Algorithms
Zell, Andreas
://www-ra.informatik.uni-tuebingen.de 2 Institute of Formal Methods in Computer Science (FMI), University of Stuttgart, Breitwiesenstr. 20 generations as default settings. We compared these algorithms the Multinational GA (MN-GA) on four real
Cluster-Based Solidification and Growth Algorithm for Decagonal Quasicrystals
Kuczera, P.; Steurer, W.
2015-08-01
A novel approach is used for the simulation of decagonal quasicrystal (DQC) solidification and growth. It is based on the observation that in well-ordered DQCs the atoms are largely arranged along quasiperiodically spaced planes parallel to the tenfold axis, running throughout the whole structure in five different directions. The structures themselves can be described as quasiperiodic arrangements of decagonal columnar clusters (cluster covering) that partially overlap in a systematic way. Based on these findings, we define a cluster interaction model within the mean field approximation, with effectively asymmetric interactions ranging beyond the nearest neighbors. In our Monte Carlo simulations, this leads to a long-range ordered quasiperiodic ground state. Indications of two finite-temperature unlocking phase transitions are observed, and are related to the two fundamental length scales that are characteristic for the system.
Discovery of rules in urban public facility distribution based on DBSCAN clustering algorithm
Li, Xinyan; Li, Deren
2007-11-01
Recently, Spatial Data Mining (SDM) has been recognized as a powerful technology that can complement traditional GIS to facilitate urban planning and management, since it can be used to discover interesting, implicit knowledge from spatial databases. The DBSCAN spatial clustering algorithm, as an SDM method, is able to effectively discover clusters of arbitrary shape in large databases with noise points. In this paper we applied this algorithm to detect distribution patterns of urban public facilities in a developed city, including primary schools, high schools and commercial facilities. Both qualitative and quantitative analyses were carried out to investigate how to determine optimal values of input parameters for the DBSCAN algorithm, and the distribution patterns of public facilities were assessed against the urban planning design standard using the algorithm.
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms. PMID:22163905
CLAGen: a tool for clustering and annotating gene sequences using a suffix tree algorithm.
Han, Sang il; Lee, Sung Gun; Kim, Kyung-Hoon; Choi, Chung Jung; Kim, Young Han; Hwang, Kyu Suk
2006-06-01
Most multiple gene sequence alignment methods rely on conventions regarding the score of a multiple alignment in pairwise fashion. Therefore, as the number of sequences increases, the runtime of sequencing expands exponentially. In order to solve the problem, this paper presents a multiple sequence alignment method using a linear-time suffix tree algorithm to cluster similar sequences at one time without pairwise alignment. After searching for common subsequences, cross-matching common subsequences were generated, and sometimes inexact matching was found. So, a procedure aimed at masking the inexact cross-matching pairs was suggested here. In addition, BLAST was combined with a clustering tool in order to annotate the clusters generated by suffix tree clustering. The proposed method for clustering and annotating genes consists of the following steps: (1) construction of a suffix tree; (2) searching and overlapping common subsequences; (3) grouping subsequence pairs; (4) masking cross-matching pairs; (5) clustering gene sequences; (6) annotating gene clusters by the BLAST search. The performance of the proposed system, CLAGen, was successfully evaluated with 42 gene sequences in a TCA cycle (a citrate cycle) of bacteria. The system generated 11 clusters and found the longest subsequences of each cluster, which are biologically significant. PMID:16384634
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-01-01
Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
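To make the memory problem concrete, here is a sketch of the conventional UPGMA procedure that keeps every pairwise dissimilarity in memory. It is the naive baseline, not the MC-UPGMA algorithm of the paper: the dictionary-based representation, the function name, and the toy values are assumptions, and each merge records the average dissimilarity at which it happened.

```python
def upgma(dist):
    """Naive UPGMA.

    dist maps frozenset({a, b}) to the dissimilarity of leaves a and b and
    must contain every pair, which is the memory cost MC-UPGMA avoids.
    Returns a list of merges (left, right, dissimilarity at merge).
    """
    sizes = {}
    for pair in dist:
        for leaf in pair:
            sizes[leaf] = 1
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair, key=str)
        level = d[pair]
        merged = (a, b)  # a merged cluster is represented as a nested tuple
        for c in list(sizes):
            if c in (a, b):
                continue
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            # Size-weighted average keeps the result equal to the mean
            # dissimilarity over all leaf pairs between the two clusters.
            d[frozenset((merged, c))] = (
                sizes[a] * d_ac + sizes[b] * d_bc
            ) / (sizes[a] + sizes[b])
        del d[pair]
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, level))
    return merges


toy_dist = {
    frozenset(("A", "B")): 2,
    frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 10,
    frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 10,
    frozenset(("B", "D")): 10,
}
toy_merges = upgma(toy_dist)
```

For n leaves the dictionary holds n(n-1)/2 entries, which is the quadratic memory requirement the abstract identifies as the obstacle for very large datasets.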
Experimental Realization of the Deutsch-Jozsa Algorithm with a Six-Qubit Cluster State
Giuseppe Vallone; Gaia Donati; Natalia Bruno; Andrea Chiuri; Paolo Mataloni
2010-03-24
We describe the first experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a 2-bit boolean function in the framework of one-way quantum computation. For this purpose a novel two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at ~1 kHz success rate, demonstrates the correct implementation of the algorithm.
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at ~1 kHz success rate, demonstrates the correct implementation of the algorithm.
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms
Smyth, Padhraic
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms This paper investigates the problem of finding a K-state first-order Markov chain that approximates an M-state first-order Markov chain, where K is typically much smaller than M. A variety of greedy heuristic search
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms
Smyth, Padhraic
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC
Dongarra, Jack
An Algorithm for Clustering cDNA Fingerprints Erez Hartuv,* Armin O. Schmitt,,1
- thetic oligonucleotides to arrayed cDNAs yields a fingerprint for each cDNA clone. Cluster analysis of these fingerprints can identify clones corresponding to the same gene. We have developed a novel algorithm of their genes. Out of about 100,000 different human genes, the number of genes active in a human cell at any
An ideal seed non-hierarchical clustering algorithm for cellular manufacturing
1986-01-01
This paper describes the development of a non-heuristic algorithm for solving group technology problems. The problem is first formulated as a bipartite graph, and then an expression for the upper limit to the number of groups is derived. Using this limit, a non-hierarchical clustering method is adopted for grouping components into families and machines into cells. After diagonally correlating the
Color Image Segmentation Using a Spatial K-Means Clustering Algorithm
Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Dana Elena Ilea and Paul F. Whelan Vision Systems Group School of Electronic Engineering Dublin City University Dublin 9, Ireland produces accurate segmentation results only when applied to images defined by homogenous regions
Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters
Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters Eric.Seetharaman@rl.af.mil Abstract - This paper presents a design for parallel processing of synthetic aperture radar (SAR) data-frequency electromagnetic radiation for object detection in a synthetic aperture radar (SAR) configuration comprises a key
A Study of Clustering Algorithms and Validity for Lossy Image Set Compression
A Study of Clustering Algorithms and Validity for Lossy Image Set Compression Anthony Schmieder1 image set compression algorithm (HMSTa) has recently been proposed for lossy compression of image sets, the compression performance depends on the quality of the partition. In this paper, we examine a number of well
Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data
2004-01-01
The search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search hits list returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use Open Directory Project as a source of high-quality narrow-topic document references and mix them into several
An effective trust-based recommendation method using a novel graph clustering algorithm
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
A Clustering Algorithm-for Chinese Adjectives and Nouns 1 Yang Wen~,Chunfa Yuan~, Changning Huang2
A Clustering Algorithm-for Chinese Adjectives and Nouns 1 Yang Wen~,Chunfa Yuan~, Changning Huang2 described in Ji et al. (1996)[1], Ji (1997)[2]. The objective of their work is to obtain the clusters the question of clustering the nouns and adjectives simultaneously. Li's work shows that they optimize
of homogenous subsets of data. Algorithms such as k-means, Gaussian mixture models, hierarchical clustering experimentally gives state-of-the-art results similar to spectral clustering for non-convex clusters, and has from instabilities, either because they are cast as non-convex optimization problems, or because
BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster
Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi
2007-12-01
This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, by analyzing the structure of the BMIs, the existence of typical difficult structures is confirmed. Then, to improve the performance of the algorithm, based on the results of the problem structure analysis and on the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, we propose two types of evaluation methods for GA individuals based on LMI calculation that take the characteristic properties of BMIs further into account. In addition, to reduce computational time, we propose a parallelization of the RCGA algorithm using the Master-Worker paradigm with cluster computing.
Identifying Stable Breast Cancer Subgroups Using Semi-supervised Fuzzy c-means on a Reduced Panel clinically-useful and stable breast cancer subgroups using a reduced panel of biomarkers. First, we on clustering of breast cancer data. The stability of the subgroups found are assessed based on comparison
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision. PMID:25875296
A clustering algorithm for sample data based on environmental pollution characteristics
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
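The main clustering loop described above (seed with the first unlabelled point, attach points that pass a user-defined threshold, then refine in a k-means-like way) can be sketched as follows. This is a loose illustration and not the EPC algorithm itself: Euclidean distance stands in for the paper's similarity function, and treating singleton clusters as outliers is an assumption made here. The name `threshold_clustering` and its parameters are hypothetical.

```python
import math


def threshold_clustering(points, threshold, refine_iters=5):
    """Leader-style clustering with a k-means-like refinement (illustrative sketch)."""
    labels = [None] * len(points)
    centres = []
    for i, p in enumerate(points):
        if labels[i] is not None:
            continue
        # The first still-unlabelled point becomes a new cluster centre.
        centres.append(p)
        k = len(centres) - 1
        for j, q in enumerate(points):
            if labels[j] is None and math.dist(p, q) <= threshold:
                labels[j] = k
    # Refinement: recompute centroids, then reassign each point to the nearest one.
    for _ in range(refine_iters):
        for k in range(len(centres)):
            members = [points[j] for j in range(len(points)) if labels[j] == k]
            if members:
                centres[k] = tuple(sum(c) / len(members) for c in zip(*members))
        new_labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    # Assumption for this sketch: a cluster with a single member is reported as an outlier.
    sizes = [labels.count(k) for k in range(len(centres))]
    outliers = [i for i, k in enumerate(labels) if sizes[k] == 1]
    return labels, centres, outliers
```

The number of clusters is not fixed in advance here; it falls out of the threshold, which is the property that makes this family of methods convenient when the group count is unknown.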
K-Boost: a scalable algorithm for high-quality clustering of microarray gene expression data.
Geraci, Filippo; Leoncini, Mauro; Montangero, Manuela; Pellegrini, Marco; Renda, M Elena
2009-06-01
Microarray technology for profiling gene expression levels is a popular tool in modern biological research. Applications range from tissue classification to the detection of metabolic networks, from drug discovery to time-critical personalized medicine. Given the increase in size and complexity of the data sets produced, their analysis is becoming problematic in terms of time/quality trade-offs. Clustering genes with similar expression profiles is a key initial step for subsequent manipulations and the increasing volumes of data to be analyzed requires methods that are at the same time efficient (completing an analysis in minutes rather than hours) and effective (identifying significant clusters with high biological correlations). In this paper, we propose K-Boost, a clustering algorithm based on a combination of the furthest-point-first (FPF) heuristic for solving the metric k-center problem, a stability-based method for determining the number of clusters, and a k-means-like cluster refinement. K-Boost runs in O (|N| x k) time, where N is the input matrix and k is the number of proposed clusters. Experiments show that this low complexity is usually coupled with a very good quality of the computed clusterings, which we measure using both internal and external criteria. Supporting data can be found as online Supplementary Material at www.liebertonline.com. PMID:19522668
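The furthest-point-first (FPF) heuristic named in the abstract can be sketched in a few lines. This shows only the generic FPF step for the metric k-center problem, not K-Boost's stability-based choice of k or its k-means-like refinement. Starting from index 0 and using Euclidean distance are assumptions of this sketch.

```python
import math


def furthest_point_first(points, k):
    """FPF heuristic for metric k-center (illustrative sketch).

    Returns the indices of the chosen centers and, for each point,
    the index of its nearest chosen center.
    """
    centers = [0]  # assumption: the first center is simply the first point
    # dist_to_center[i] is the distance from point i to its closest center so far.
    dist_to_center = [math.dist(p, points[0]) for p in points]
    assignment = [0] * len(points)
    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=dist_to_center.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist_to_center[i]:
                dist_to_center[i] = d
                assignment[i] = nxt
    return centers, assignment
```

Each new center costs one pass over the points, so choosing k centers takes a number of distance evaluations proportional to the number of points times k.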
An Investigation of Linguistic Features and Clustering Algorithms for Topical Document Clustering
is described and applied to DARPA's Topic Detection and Tracking phase 2 (TDT2) data. This model, based combination of features to the TDT2 test data, obtaining partitions of the documents that compare favorably with the results obtained by participants in the official TDT2 competition. 1 Introduction Clustering plays
A genetic algorithmic approach to antenna null-steering using a cluster computer.
Recine, Greg; Cui, Hong-Liang
2001-06-01
We apply a genetic algorithm (GA) to the problem of electronically steering the maximums and nulls of an antenna array to desired positions (null toward enemy listener/jammer, max toward friendly listener/transmitter). The antenna pattern itself is computed using NEC2 which is called by the main GA program. Since a GA naturally lends itself to parallelization, this simulation was applied to our new twin 64-node cluster computers (Gemini). Design issues and uses of the Gemini cluster in our group are also discussed.
KD-tree based clustering algorithm for fast face recognition on large-scale data
Wang, Yuanyuan; Lin, Yaping; Yang, Junfeng
2015-07-01
This paper proposes an acceleration method for large-scale face recognition system. When dealing with a large-scale database, face recognition is time-consuming. In order to tackle this problem, we employ the k-means clustering algorithm to classify face data. Specifically, the data in each cluster are stored in the form of the kd-tree, and face feature matching is conducted with the kd-tree based nearest neighborhood search. Experiments on CAS-PEAL and self-collected database show the effectiveness of our proposed method.
2013-09-23
Suggestions for improving the Basin-Hopping Monte Carlo (BHMC) algorithm for unbiased global optimization of clusters and nanoparticles are presented. The traditional basin-hopping exploration scheme with Monte Carlo sampling is improved by bringing together novel strategies and techniques employed in different global optimization methods, however, with the care of keeping the underlying algorithm of BHMC unchanged. The improvements include a total of eleven local and nonlocal trial operators tailored for clusters and nanoparticles that allow an efficient exploration of the potential energy surface, two different strategies (static and dynamic) of operator selection, and a filter operator to handle unphysical solutions. In order to assess the efficiency of our strategies, we applied our implementation to several classes of systems, including Lennard-Jones and Sutton-Chen clusters with up to 147 and 148 atoms, respectively, a set of Lennard-Jones nanoparticles with sizes ranging from 200 to 1500 atoms, binary Lennard-Jones clusters with up to 100 atoms, (AgPd)55 alloy clusters described by the Sutton-Chen potential, and aluminum clusters with up to 30 atoms described within the density functional theory framework. Using unbiased global search our implementation was able to reproduce successfully the great majority of all published results for the systems considered and in many cases with more efficiency than the standard BHMC. We were also able to locate previously unknown global minimum structures for some of the systems considered. This revised BHMC method is a valuable tool for aiding theoretical investigations leading to a better understanding of atomic structures of clusters and nanoparticles. PMID:23957311
A fast hierarchical clustering algorithm for large-scale protein sequence data sets.
Szilágyi, Sándor M; Szilágyi, László
2014-05-01
TRIBE-MCL is a Markov clustering algorithm that operates on a graph built from pairwise similarity information of the input data. Edge weights stored in the stochastic similarity matrix are alternately fed to the two main operations, inflation and expansion, and are normalized in each main loop to maintain the probabilistic constraint. In this paper we propose an efficient implementation of the TRIBE-MCL clustering algorithm, suitable for fast and accurate grouping of protein sequences. A modified sparse matrix structure is introduced that can efficiently handle most operations of the main loop. Taking advantage of the symmetry of the similarity matrix, a fast matrix squaring formula is also introduced to facilitate the time-consuming expansion. The proposed algorithm was tested on protein sequence databases like SCOP95. In terms of efficiency, the proposed solution improves execution speed by two orders of magnitude, compared to recently published efficient solutions, reducing the total runtime well below 1 min in the case of the 11,944 proteins of SCOP95. This improvement in computation time is reached without losing anything from the partition quality. Convergence is generally reached in approximately 50 iterations. The efficient execution enabled us to perform a thorough evaluation of classification results and to formulate recommendations regarding the choice of the algorithm's parameter values. PMID:24657908
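The alternation of expansion and inflation with per-loop normalization can be illustrated with a small dense-matrix sketch. It is a plain textbook-style Markov clustering loop, not the sparse structure or the fast squaring formula proposed in the paper. The function names, the added self-loops, the inflation value of 2.0, and the way clusters are read off the final matrix are all assumptions made for this example.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (the probabilistic constraint)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: square the matrix (a two-step random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Dense Markov clustering loop (illustrative sketch)."""
    n = len(adjacency)
    # Assumption: add self-loops so that no column sums to zero.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep non-negligible mass act as attractors; the columns
    # they reach form one cluster.
    clusters = {
        frozenset(j for j in range(n) if m[i][j] > eps)
        for i in range(n) if max(m[i]) > eps
    }
    return sorted(sorted(c) for c in clusters)
```

The dense `expand` step costs cubic time in the number of nodes, which is why the expansion is the natural target for the sparse and symmetric optimizations the abstract mentions.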
Dynamic connectivity algorithms for Monte Carlo simulations of the random-cluster model
Elçi, Eren Metin
2013-01-01
We review Sweeny's algorithm for Monte Carlo simulations of the random cluster model. Straightforward implementations suffer from the problem of computational critical slowing down, where the computational effort per edge operation scales with a power of the system size. By using a tailored dynamic connectivity algorithm we are able to perform all operations with a poly-logarithmic computational effort. This approach is shown to be efficient in keeping online connectivity information and is of use for a number of applications also beyond cluster-update simulations, for instance in monitoring droplet shape transitions. As the handling of the relevant data structures is non-trivial, we provide a Python module with a full implementation for future reference.
Dynamic connectivity algorithms for Monte Carlo simulations of the random-cluster model
Metin Elçi, Eren; Weigel, Martin
2014-05-01
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low stability of clusters may lead to rapid failure of clusters, high energy consumption for reclustering, and decrease in the overall network stability in mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms only use limited features of the nodes. Thus, they decrease the weight accuracy in determining node's competency and lead to incorrect selection of cluster heads. A new weight-based algorithm presented in this paper not only determines node's weight using its own features, but also considers the direct effect of feature of adjacent nodes. It determines the weight of virtual links between nodes and the effect of the weights on determining node's final weight. By using this strategy, the highest weight is assigned to the best choices for being the cluster heads and the accuracy of nodes selection increases. The performance of new algorithm is analyzed by using computer simulation. The results show that produced clusters have longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
Parallelizing a multi-frame blind deconvolution algorithm on clusters of multicore processors
Richard Linderman; Scott Spetka; Susan Emeny; Dennis Fitzgerald
The parallelization strategy of the Physically-Constrained Iterative Deconvolution (PCID) algorithm is being altered and optimized to enhance performance on emerging multi-core architectures. This paper reports results from porting PCID to multi-core architectures including the JAWS supercomputer at the Maui HPC Center (60 TFLOPS of dual-dual Xeon(R) nodes) and the Cell Cluster at AFRL in Rome, NY (52 TFLOPS of Playstation
Algorithms for Internal Validation Clustering Measures in the Post Genomic Era
Utro, Filippo
Inferring cluster structure in microarray datasets is a fundamental task for the -omic sciences. A fundamental question in Statistics, Data Analysis and Classification is the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data. In this dissertation, a study of internal validation measures is given, paying particular attention to the stability-based ones. Indeed, this class of measures is particularly prominent and promising for obtaining a reliable estimate of the number of clusters in a dataset. For those measures, a new general algorithmic paradigm is proposed here that highlights the richness of measures in this class and accounts for the ones already available in the literature. Moreover, some of the most representative validation measures are also considered. Experiments on 12 benchmark datasets are p...
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Evaluation of particle clustering algorithms in the prediction of brownout dust clouds
Govindarajan, Bharath Madapusi
2011-07-01
A study of three Lagrangian particle clustering methods has been conducted with application to the problem of predicting brownout dust clouds that develop when rotorcraft land over surfaces covered with loose sediment. A significant impediment in performing such particle modeling simulations is the extremely large number of particles needed to obtain dust clouds of acceptable fidelity. Computing the motion of each and every individual sediment particle in a dust cloud (which can reach into tens of billions per cubic meter) is computationally prohibitive. The reported work involved the development of computationally efficient clustering algorithms that can be applied to the simulation of dilute gas-particle suspensions at low Reynolds numbers of the relative particle motion. The Gaussian distribution, k-means and Osiptsov's clustering methods were studied in detail to highlight the nuances of each method for a prototypical flow field that mimics the highly unsteady, two-phase vortical particle flow obtained when rotorcraft encounter brownout conditions. It is shown that although clustering algorithms can be problem dependent and have bounds of applicability, they offer the potential to significantly reduce computational costs while retaining the overall accuracy of a brownout dust cloud solution.
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
Zhang, Yipu; Wang, Ping
2015-01-01
The high-throughput technique ChIP-seq, which couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which is typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging substrings mining strategy: it finds the enriched substrings and then searches the neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif does not follow the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and the processing of full-size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is advantageous for (l, d) motif finding in ChIP-seq data; it also demonstrates better performance than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in the selection of parallel hyperspectral algorithms for specific applications.
Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.
Rong, Zhou; Tianyu, Ma; Yongjie, Jin
2005-01-01
In order to improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental Beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on the message passing interface (MPI) and tested it with combinations of different numbers of calculating processors and different sizes of voxel grid in reconstruction (64×64×64 and 128×128×128). Performance of parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to help make fully 3-D OSEM algorithms more feasible in clinical SPECT studies. PMID:17282575
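The two evaluation metrics named in the abstract have standard definitions: speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the number of processors. A minimal sketch follows; the function name and the timings in the usage note are hypothetical and are not results from the paper.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, parallel efficiency) from wall-clock timings."""
    speedup = t_serial / t_parallel   # how many times faster than one processor
    efficiency = speedup / n_procs    # fraction of ideal linear scaling achieved
    return speedup, efficiency
```

For example, a hypothetical run taking 120 s on one processor and 20 s on 8 processors gives a speedup of 6.0 and an efficiency of 0.75.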
Clustering gene expression data using a diffraction-inspired framework
2012-01-01
Background The recent developments in microarray technology have allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction-based clustering algorithm, however, is designed to overcome this necessity for user-defined parameters, as it is able to automatically search the data for any underlying structure. Methods The diffraction-based clustering algorithm presented in this paper is tested using five well-known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those results obtained from conventional algorithms such as the k-means, fuzzy c-means, self-organising map, hierarchical clustering algorithm, Gaussian mixture model and density-based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results The diffraction-based clustering algorithm is shown to be independent of the number of clusters as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction-based clustering algorithm performs significantly better on the real biological datasets compared to the other existing algorithms. Conclusion The results of the diffraction-based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data. PMID:23164195
A new concept of wildland-urban interface based on city clustering algorithm
NASA Astrophysics Data System (ADS)
Kanevski, M.; Champendal, A.; Vega Orozco, C.; Tonini, M.; Conedera, M.
2012-04-01
Wildland-Urban-Interface (WUI) is a widely used term in the context of wild and forest fires to indicate areas where human infrastructures interact with wildland/forest areas. Many complex problems are associated with the WUI, but the most relevant ones are those related to forest fire hazard and management in densely populated areas where the fire regime is dominated by anthropogenically ignited fires. This coexistence enhances both anthropogenic ignition sources and flammable fuels. Furthermore, the growing trend of the WUI and global change effects may even worsen the situation in the near future. Therefore, many studies are dedicated to the WUI problem, focusing on refinement of its definition, development of mapping methods, implementation of measures into specific fire management plans and the validation of the proposed approaches. The present study introduces a new concept of WUI based on the city clustering algorithm (CCA) introduced in Rosenfeld et al., 2008. CCA was proposed as an automatic tool for studying the definition of cities and their distribution. The algorithm uses demographic data, either on a regular or non-regular grid in space, where a city (urban zone) is detected as a cluster of connected populated cells with maximal size. In the present study the CCA is proposed as a tool to develop a new concept of population dynamic analysis crucial to defining and localising the WUI. The real case study is based on demographic/census data organised in a regular grid with a resolution of 100 m and on the forest fire ignition points database from canton Ticino, Switzerland. By changing the spatial scales of the demographic cells, the relationships between urban zones (demographic clusters) and forest fire events were statistically analyzed. Corresponding scaling laws were used to understand the interaction between urban zones and forest fires. The first results are good and indicate that the method can be applied to define the WUI in an innovative way.
Keywords: forest fires, wildland-urban interface, city clustering algorithms.
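The cluster-detection step described in the abstract above (a city as a cluster of connected populated cells on a regular grid) can be sketched as a connected-component search. This is only an illustrative sketch, not the authors' implementation: the function name `city_clusters`, the 4-neighbour connectivity, the population threshold and the toy grid are all assumptions.

```python
from collections import deque

def city_clusters(grid, threshold=0):
    """Group populated cells (population > threshold) into 4-connected clusters.

    Hypothetical helper: connectivity rule and threshold are assumptions.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first search grows the cluster until no populated
            # neighbour remains, so each cluster has maximal size.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cells)
    return clusters

# Toy population grid (made-up numbers, one value per cell).
population = [
    [5, 3, 0, 0],
    [0, 2, 0, 7],
    [0, 0, 0, 4],
]
clusters = city_clusters(population)
sizes = sorted(len(cells) for cells in clusters)
```

Coarsening the grid (for example by summing neighbouring cells) and re-running the search is one way to explore the change of spatial scale mentioned in the abstract.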
Cloud classification from satellite data using a fuzzy sets algorithm - A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1989-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
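To make the phrase "membership values as a function of distance to the cluster center" concrete, the sketch below computes memberships with the standard FCM update u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)). The fuzzifier m = 2, the Euclidean distance and the two toy centers are assumptions chosen for illustration; the abstract does not state which settings were used on the AVHRR data.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership of one observation in every cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # An observation that coincides with a center belongs fully to it.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]

# One observation at distance 1 from the first center and 2 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

With these inputs the nearer center receives the larger membership, and no observation is forced into a single exclusive class.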
Khan, Maleq
networks in a time- and space-efficient manner. We were able to count triangles in a graph with 2 billions ofParallel Algorithms for Counting Triangles and Computing Clustering Coefficients S M Arifuzzaman 24061 Parallel Algorithm for Triangle Counting With P processors, the graph is partitioned into P
McCaffrey, James D.; Dierking, Howard
This study investigates the use of a biologically inspired meta-heuristic algorithm to extract rule sets from clustered categorical data. A computer program which implemented the algorithm was executed against six benchmark data sets and successfully discovered the underlying generation rules in all cases. Compared to existing approaches, the simulated bee colony (SBC) algorithm used in this study has the advantage of allowing full customization of the characteristics of the extracted rule set, and allowing arbitrarily large data sets to be analyzed. The primary disadvantages of the SBC algorithm for rule set extraction are that the approach requires a relatively large number of input parameters, and that the approach does not guarantee convergence to an optimal solution. The results demonstrate that an SBC algorithm for rule set extraction of clustered categorical data is feasible, and suggest that the approach may have the ability to outperform existing algorithms in certain scenarios.
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familiarity with the behavior of the Tethered Satellite System.
2009-01-01
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well known that clustering partitions a network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm, and the obtained simulation results show convergence, learning and adaptability of the algorithm to a dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
An improved scheduling algorithm for 3D cluster rendering with platform LSF
Xu, Wenli; Zhu, Yi; Zhang, Liping
2013-10-01
High-quality photorealistic rendering of 3D models requires powerful computing systems, and efficient management of cluster resources has developed quickly to meet this demand. This paper aims to improve the efficiency of 3D rendering tasks on a cluster. It focuses on a dynamic feedback load balance (DFLB) algorithm, the working principle of the load sharing facility (LSF), and optimization of the external scheduler plug-in. The algorithm can be applied in the match and allocation phases of a scheduling cycle. Candidate hosts are prepared in sequence in the match phase, and the scheduler makes allocation decisions for each job in the allocation phase. With the dynamic mechanism, a new weight is assigned to each candidate host and the hosts are reordered accordingly; the most suitable host is then dispatched for rendering. A new plug-in module implementing this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate that the improved plug-in module outperforms the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput and improve system utilization.
2014-01-01
Background Accurate genotype calling is a prerequisite of a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms can achieve an accuracy rate greater than 99% for genotyping DNA samples without copy number alterations (CNAs), almost none of these algorithms are designed for genotyping tumor samples that are known to have large regions of CNAs. Results This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed a Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that high concordance rates with HapMap calls can be achieved without using reference/training samples when CNAs do not exist. By adding a training step, we have obtained higher genotyping concordance rates without requiring large sample sizes. When CNAs exist in the samples, accuracy can be dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, compared with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). Conclusions In conclusion, we have demonstrated that BCRgt can provide accurate genotyping calls for tumor samples with CNAs. PMID:24629125
MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.
Chen, Wei; Cheng, Yongmei; Zhang, Clarence; Zhang, Shaowu; Zhao, Hongyu
2013-09-01
Recent developments of next generation sequencing technologies have led to rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inferences, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust enjoys less memory usage, and better biological accuracy compared to existing heuristic clustering methods while preserving efficiency and scalability. PMID:23899776
Schuetter, Jared Michael
Excavating cairns in southern Arabia is a way for anthropologists to understand which factors led ancient settlers to transition from a pastoral lifestyle and tribal narrative to the formation of states that exist today. Locating these monuments has traditionally been done in the field, relying on eyewitness reports and costly searches through the arid landscape. In this thesis, an algorithm for automatically detecting cairns in satellite imagery is presented. The algorithm uses a set of filters in a window based approach to eliminate background pixels and other objects that do not look like cairns. The resulting set of detected objects constitutes fewer than 0.001% of the pixels in the satellite image, and contains the objects that look the most like cairns in imagery. When a training set of cairns is available, a further reduction of this set of objects can take place, along with a likelihood-based ranking system. To aid in cairn detection, the satellite image is also clustered to determine land-form classes that tend to be consistent with the presence of cairns. Due to the large number of pixels in the image, a subsample spectral clustering algorithm called "Multiple Sample Data Spectroscopic clustering" is used. This multiple sample clustering procedure is motivated by perturbation studies on single sample spectral algorithms. The studies, presented in this thesis, show that sampling variability in the single sample approach can cause an unsatisfactory level of instability in clustering results. The multiple sample data spectroscopic clustering algorithm is intended to stabilize this perturbation by combining information from different samples. While sampling variability is still present, the use of multiple samples mitigates its effect on cluster results. Finally, a step-through of the cairn detection algorithm and satellite image clustering are given for an image in the Hadramawt region of Yemen. 
The top ranked detected objects are presented, and a discussion of parameter selection and future work follows.
Mutwil, Marek; Usadel, Björn; Schütte, Moritz; Loraine, Ann; Ebenhöh, Oliver; Persson, Staffan
2010-01-01
A vital quest in biology is comprehensible visualization and interpretation of correlation relationships on a genome scale. Such relationships may be represented in the form of networks, which usually require disassembly into smaller manageable units, or clusters, to facilitate interpretation. Several graph-clustering algorithms that may be used to visualize biological networks are available. However, only some of these support weighted edges, and none provides good control of cluster sizes, which is crucial for comprehensible visualization of large networks. We constructed an interactive coexpression network for the Arabidopsis (Arabidopsis thaliana) genome using a novel Heuristic Cluster Chiseling Algorithm (HCCA) that supports weighted edges and that may control average cluster sizes. Comparative clustering analyses demonstrated that the HCCA performed as well as, or better than, the commonly used Markov, MCODE, and k-means clustering algorithms. We mapped MapMan ontology terms onto coexpressed node vicinities of the network, which revealed transcriptional organization of previously unrelated cellular processes. We further explored the predictive power of this network through mutant analyses and identified six new genes that are essential to plant growth. We show that the HCCA-partitioned network constitutes an ideal “cartographic” platform for visualization of correlation networks. This approach rapidly provides network partitions with relative uniform cluster sizes on a genome-scale level and may thus be used for correlation network layouts also for other species. PMID:19889879
Applying Social Networking and Clustering Algorithms to Galaxy Groups in ALFALFA
Bramson, Ali; Wilcots, E. M.
2012-01-01
Because most galaxies live in groups, and the environment in which it resides affects the evolution of a galaxy, it is crucial to develop tools to understand how galaxies are distributed within groups. At the same time we must understand how groups are distributed and connected in the larger scale structure of the Universe. I have applied a variety of networking techniques to assess the substructure of galaxy groups, including distance matrices, agglomerative hierarchical clustering algorithms and dendrograms. We use distance matrices to locate groupings spatially in 3-D. Dendrograms created from agglomerative hierarchical clustering results allow us to quantify connections between galaxies and galaxy groups. The shape of the dendrogram reveals if the group is spatially homogenous or clumpy. These techniques are giving us new insight into the structure and dynamical state of galaxy groups and large scale structure. We specifically apply these techniques to the ALFALFA survey of the Coma-Abell 1367 supercluster and its resident galaxy groups.
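As an illustration of the agglomerative hierarchical clustering mentioned above, the sketch below builds the merge history that a dendrogram would plot, starting from 3-D positions. It is a minimal sketch under stated assumptions: single linkage, Euclidean distance and the four toy positions are illustrative choices, not details taken from the ALFALFA analysis.

```python
import math

def single_linkage(points):
    """Return the merge history [(members_a, members_b, distance), ...]."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: the distance between two clusters is the
                # distance between their closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Two tight pairs separated by a large gap (made-up coordinates).
galaxies = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
merges = single_linkage(galaxies)
heights = [round(d, 6) for _, _, d in merges]
```

The merge heights are what give a dendrogram its shape: a few low merges followed by one much higher merge suggests a clumpy group, while evenly spaced heights suggest a more homogeneous one.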
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Multispectral image classification of MRI data using an empirically-derived clustering algorithm
1998-08-01
Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically-derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T2 first- and second-echo and T1 (with and without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real space by color-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three-dimensional data set of MRI scans of a human brain tumor.
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application, voice classification, which plays an important role in grouping unlabelled voice samples, has however not been widely studied. Lately, voice classification has been found useful in phone monitoring, in classifying speakers' gender, ethnicity and emotional states, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time warping, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
A Computational Algorithm for Functional Clustering of Proteome Dynamics During Development
2014-01-01
Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031
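The Legendre orthogonal polynomials used above to describe temporal expression patterns can be evaluated with Bonnet's recursion, (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x). The sketch below shows only that basis evaluation, not the mixture model itself; the function names and the rescaling of sampling times onto [-1, 1] are assumptions made for illustration.

```python
def legendre(order, x):
    """Return [P_0(x), ..., P_order(x)] computed with Bonnet's recursion."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

def rescale(t, t_min, t_max):
    """Map a sampling time onto [-1, 1], where the polynomials are orthogonal."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

# Basis values at a time point three quarters of the way through the series.
x = rescale(7.5, 0.0, 10.0)
basis = legendre(3, x)
```

A cluster's mean expression curve could then be written as a weighted sum of these basis values, with one weight vector per mixture component; the weights themselves would have to be estimated from data.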
2012-01-01
Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.
Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B
2011-02-01
This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance, with a mean sorting accuracy of 73% and sorting error of 10%, compared to PCA, which combined with k-means had a sorting accuracy of 58% and sorting error of 10%. A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected. PMID:21248378
Saeed, Fahad; Hoffert, Jason D; Knepper, Mark A
2013-11-22
High-throughput mass spectrometers can produce massive amounts of redundant data at an astonishing rate with many of them having poor signal-to-noise (S/N) ratio. These low S/N ratio spectra may not get interpreted using conventional spectra-to-database matching techniques. In this paper, we present an efficient algorithm, CAMS-RS (Clustering Algorithm for Mass Spectra using Restricted Space and Sampling) for clustering of raw mass spectrometry data. CAMS-RS utilizes a novel metric (called F-set) that exploits the temporal and spatial patterns to accurately assess similarity between two given spectra. The F-set similarity metric is independent of the retention time and allows clustering of mass spectrometry data from independent LC-MS/MS runs. A novel restricted search space strategy is devised to limit the comparisons of the number of spectra. An intelligent sampling method is executed on individual bins that allow merging of the results to make the final clusters. Our experiments, using experimentally generated datasets, show that the proposed algorithm is able to cluster spectra with high accuracy and is helpful in interpreting low S/N ratio spectra. The CAMS-RS algorithm is highly scalable with increasing number of spectra and our implementation allows clustering of up to a million spectra within minutes. PMID:24277952
Location Fingerprint Positioning Based on Interval-valued Data FCM Algorithm
Li, Fang; Tong, Weiming; Wang, Tiecheng
In order to reduce the positioning calculation power consumption of a ZigBee module, a fingerprint positioning method based on the interval-valued data fuzzy c-means algorithm is proposed. Fingerprints are regarded as interval-valued data, which can reflect the uncertainty caused by measurement error and interference. In the high-dimensional feature space spanned by interval midpoint and length, fingerprints are clustered by the FCM algorithm to lower computational complexity. Compared with traditional clustering techniques, such as c-means, the method obtained better clustering results for location fingerprints in the positioning experiment designed in the paper. Results from the clustering and positioning experiments show that the method provides a feasible solution that markedly decreases the positioning calculation power consumption of the ZigBee module while maintaining positioning precision.
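The feature space "spanned by interval midpoint and length" can be illustrated with a small conversion step. This is a sketch only: the function name, the feature ordering and the signal-strength values (in dBm, two hypothetical access points) are assumptions, and the resulting vectors would still need to be passed to an FCM implementation.

```python
def interval_features(intervals):
    """Flatten [(low, high), ...] into [mid_1, len_1, mid_2, len_2, ...]."""
    features = []
    for low, high in intervals:
        features.append((low + high) / 2.0)  # midpoint: typical signal level
        features.append(high - low)          # length: spread from noise/interference
    return features

# One fingerprint: observed signal-strength range for each of two access points.
fingerprint = [(-62.0, -58.0), (-75.0, -71.0)]
features = interval_features(fingerprint)
```

Each interval contributes two coordinates, so a fingerprint measured against k access points becomes a point in a 2k-dimensional space.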
'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being a multidisciplinary field, applies methods from allied areas of science to data mining using computational approaches. Clustering and molecular phylogeny are among the key areas in bioinformatics and help in the study of the classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. Most of these methods, however, depend on pre-alignment of sequences and become computationally intensive as the size of the data increases, and hence demand alternative efficient approaches. 'Inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports the application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of the IATDs is proposed, and the distance matrix thus obtained is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre- and post-Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
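The idea of an inter-arrival time distribution for nucleotides can be sketched as the list of gaps between successive occurrences of each base, computed without any sequence alignment. The function names, the choice of the mean gap as the summary statistic and the toy sequence are assumptions; the abstract does not specify which statistical parameters enter the distance function.

```python
from statistics import mean

def inter_arrival_times(sequence):
    """Map each nucleotide to the gaps between its successive occurrences."""
    last_seen = {}
    gaps = {base: [] for base in "ACGT"}
    for position, base in enumerate(sequence):
        if base not in gaps:
            continue  # skip ambiguous symbols such as N
        if base in last_seen:
            gaps[base].append(position - last_seen[base])
        last_seen[base] = position
    return gaps

def iatd_summary(sequence):
    """Mean gap per nucleotide; 0.0 when a base occurs fewer than twice."""
    gaps = inter_arrival_times(sequence)
    return {base: (mean(g) if g else 0.0) for base, g in gaps.items()}

gaps = inter_arrival_times("ACGATTA")
summary = iatd_summary("ACGATTA")
```

Comparing such per-base summaries between two sequences (for example with a Euclidean distance) would give one entry of a distance matrix that a clustering or tree-building routine could consume.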
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. In order to increase the accuracy of the model, the following steps are carried out. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The main reason for this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually arise when building ANFIS. The clustering process in the input data space is accomplished with a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated in order to resume the clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build the ANFIS. Simulations based on a numerical data set, 'Daily Data of Stock A', and on measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, the convergence and robustness of the proposed algorithm are investigated using both theoretical and testing approaches.
Tramacere, A.; Vecchio, C.
2013-01-01
Context. The density-based spatial clustering of applications with noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose this method to detect sources in γ-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon. Aims. We investigate the detection performance of the γ-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters. Methods. We used a parametric approach, exploring a large volume of the γ-ray DBSCAN parameter space. By means of simulated data we statistically characterized the γ-ray DBSCAN, finding signatures that distinguish purely random fields from fields with sources. We defined a significance level for the detected clusters and successfully tested this significance with our simulated data. We applied the method to real data and found an excellent agreement with the results obtained with simulated data. Results. We find that the γ-ray DBSCAN can be successfully used in detecting clusters in γ-ray data. The significance returned by our algorithm is strongly correlated with that provided by the maximum likelihood analysis with standard Fermi-LAT software, and can be used to safely remove spurious clusters. The positional accuracy of the reconstructed cluster centroid is comparable to that returned by standard maximum likelihood analysis, allowing one to look for astrophysical counterparts in narrow regions, which minimizes the chance probability in the counterpart association. Conclusions. We found that the γ-ray DBSCAN is a powerful tool for detecting clusters in γ-ray data. It can be used to look for both point-like sources and extended sources, and can potentially be applied to any astrophysical field related to detecting clusters in data.
In a companion paper we will present the application of the γ-ray DBSCAN to the full Fermi-LAT sky, discussing the potential of the algorithm to discover new sources.
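To make the idea of density-based source detection concrete, here is a minimal sketch of the generic DBSCAN procedure applied to a handful of 2D points standing in for photon arrival directions. It is not the authors' γ-ray DBSCAN: the significance estimate described in the abstract is not reproduced, flat Euclidean distance is assumed instead of angular separation on the sky, and the coordinates and the `eps` and `min_pts` values are invented for illustration.

```python
import math
from collections import deque


def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force range query; the result includes point i itself.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may turn into a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, so keep expanding
    return labels


# Hypothetical field: one compact clump of photons plus isolated background.
source = [(10.0 + 0.1 * k, 5.0 + 0.1 * (k % 3)) for k in range(8)]
background = [(0.0, 0.0), (20.0, 1.0), (3.0, 17.0), (15.0, 15.0)]
labels = dbscan(source + background, eps=0.5, min_pts=4)
print(labels)
```

In this toy field the clump is returned as one cluster and the isolated points are labeled as noise, which is the behavior that makes the method attractive for data dominated by background.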
Algorithmic Identification for Wings in Butterfly Diagrams.
Illarionov, E. A.; Sokolov, D. D.
2012-12-01
We investigate to what extent the wings of solar butterfly diagrams can be separated without explicit use of Hale's polarity law or the location of the solar equator. Two cluster-analysis algorithms, DBSCAN and C-means, have demonstrated their ability to separate the wings of contemporary butterfly diagrams based only on the sunspot group density in the diagram. Here we generalize the method to continuous tracers, give results concerning the migration velocities, and present clusters for cycles 12-20.
Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Yang, Bing; Zhang, Bing
2014-02-01
Based on the data mining methods of association rules and clustering algorithms, 188 prescriptions for cough formulated by Yan Zhenghua were collected and analyzed to obtain the frequency of drug usage and the relationships between drugs, from which Yan Zhenghua's experience in treating cough could be summarized. The analysis mined 20 core combinations, such as Bambusae Caulis in Taenias-Almond-Sactmarsh Aster, and found 10 new prescriptions, such as Sactmarsh Aster-Scutellariae Radix-Album Viscum-Bambusae Caulis in Taenias-Eriobotryae Folium. The results indicate that Yan Zhenghua was good at treating cough with traditional Chinese medicines that dispel wind and heat from the body and remove heat from the lung to relieve cough. PMID:25204134
Structural studies of adatom clusters on metal fcc(110) surfaces by a genetic algorithm method
Sun, Zhihua; Liu, Qingwei; Li, Yufen; Zhuang, Jun
2005-09-01
We study systematically the lowest-energy structures of adatom clusters on a series of metal fcc(110) surfaces using the genetic algorithm (GA). The atomic interactions are modeled by the realistic model potentials including embedded-atom method potential, surface-embedded-atom method potential, and Rosato-Guillopé-Legrand potential. The results show that on some surfaces, with increasing number of adatoms, the lowest-energy structures transit from linear chains oriented along the [1¯10] direction to two-dimensional islands with two rows, and then to islands with three rows. On other surfaces, the lowest-energy structures are all linear chains for all numbers of adatoms. The competition between the nearest-neighbor adatom-adatom interaction and the overall interaction of the next-nearest-neighbor and the third neighbor adatoms plays a key role in determining the lowest-energy structure.
Bagheripour, Parisa; Asoodeh, Mojtaba
2013-12-01
Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and strongly influences the assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost-, time-, and labor-intensive. Therefore, finding an accurate, fast, and cheap way of determining porosity is essential. Conventional well log data, available in almost all wells, contain invaluable implicit information about porosity, and an intelligent system can make this information explicit. Fuzzy logic is a powerful tool for handling geoscience problems associated with uncertainty. However, determining the best fuzzy formulation is still an issue. This study proposes an improved strategy, called the hybrid genetic algorithm-pattern search (GA-PS) technique, as an alternative to the widely used subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. The hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions), which consequently results in the best fuzzy formulation. Results indicate that the GA-PS technique adjusts both the mean and the variance of Gaussian membership functions, whereas SC controls only the mean. A comparison between the hybrid GA-PS technique and the SC method confirmed the superiority of the GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Ansari, Elnaz Saberi; Eslahchi, Changiz; Pezeshk, Hamid; Sadeghi, Mehdi
2014-09-01
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more essential. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm considering some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented using C++ language and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with seven automatic methods, against five publicly available benchmarks. The results show that ProDomAs outperforms other methods applied on the mentioned benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. PMID:24596179
Pochron, William
2012-10-01
The STAR detector at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory uses polarized proton collisions to determine the origin of the proton spin, using measurements such as neutral pion asymmetries. The Endcap Electromagnetic Calorimeter (EEMC) in the STAR detector is especially useful for detecting photons from π0 decays at forward angles. This latter measurement is obtained from the Shower Maximum Detector (SMD) in the EEMC where narrow crossed scintillator strips measure the energy deposited in them and can be used to identify the location of the photon shower. The electromagnetic shower most often deposits energy in a small number of adjacent strips that collectively form a "cluster." This work has focused on a qualitative and quantitative comparison of two different clustering algorithms that were developed to reliably identify π0 events and to effectively discriminate against background cluster selection that produces false π0 signals. This comparative analysis will be presented and the strengths and weaknesses of the algorithms will be discussed.
Reproducible Clusters from Microarray Research: Whither?
Garge, Nikhil R; Page, Grier P; Sprague, Alan P; Gorman, Bernard S; Allison, David B
2005-01-01
Motivation: In cluster analysis, the validity of specific solutions, algorithms, and procedures presents significant challenges because there is no null hypothesis to test and no 'right answer'. It has been noted that a replicable classification is not necessarily a useful one, but a useful one that characterizes some aspect of the population must be replicable. By replicable we mean reproducible across multiple samplings from the same population. Methodologists have suggested that the validity of clustering methods should be based on classifications that yield reproducible findings beyond chance levels. We used this approach to determine the performance of commonly used clustering algorithms and the degree of replicability achieved using several microarray datasets. Methods: We considered four commonly used iterative partitioning algorithms (Self Organizing Maps (SOM), K-means, Clustering LARge Applications (CLARA), and Fuzzy C-means) and evaluated their performances on 37 microarray datasets, with sample sizes ranging from 12 to 172. We assessed reproducibility of the clustering algorithm by measuring the strength of relationship between clustering outputs of subsamples of 37 datasets. Cluster stability was quantified using Cramér's V² from a k × k table. Cramér's V² is equivalent to the squared canonical correlation coefficient between two sets of nominal variables. Potential scores range from 0 to 1, with 1 denoting perfect reproducibility. Results: All four clustering routines show increased stability with larger sample sizes. K-means and SOM showed a gradual increase in stability with increasing sample size. CLARA and Fuzzy C-means, however, yielded low stability scores until sample sizes approached 30 and then gradually increased thereafter. Average stability never exceeded 0.55 for the four clustering routines, even at a sample size of 50.
These findings suggest several plausible scenarios: (1) microarray datasets lack natural clustering structure, thereby producing low stability scores on all four methods; (2) the algorithms studied do not produce reliable results; and/or (3) sample sizes typically used in microarray research may be too small to support derivation of reliable clustering results. Further research should be directed towards evaluating the stability of more clustering algorithms on more datasets, especially those having larger sample sizes, with larger numbers of clusters considered. PMID:16026595
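The stability score used above can be illustrated with a short sketch that computes Cramér's V² from the contingency table of two cluster labelings of the same items, using the usual definition V² = χ² / (n · (k − 1)). How the study paired the clusterings of its subsamples is not stated in the abstract, so the two label lists below are simply hypothetical outputs of two clustering runs.

```python
from collections import Counter


def cramers_v_squared(labels_a, labels_b):
    """Cramér's V² between two labelings of the same items (0 = unrelated, 1 = identical up to renaming)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # cells of the contingency table
    row = Counter(labels_a)                   # row totals
    col = Counter(labels_b)                   # column totals
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col))
    if k < 2:
        raise ValueError("need at least two clusters in each labeling")
    return chi2 / (n * (k - 1))


# Two runs that agree perfectly, apart from how the clusters are numbered.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]
stable = cramers_v_squared(run_1, run_2)

# Two runs whose assignments carry no information about each other.
unstable = cramers_v_squared([0, 0, 1, 1], [0, 1, 0, 1])
print(stable, unstable)
```

Because the statistic is computed from the contingency table, it does not depend on how either run numbers its clusters, which is what makes it usable for comparing independent clustering outputs.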
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-01
The study of atomic clusters has become an increasingly active area of research in recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we present our GSAM algorithm, based on a DFT exploration of the PES, to find the low-lying isomers of such compounds. This algorithm includes the generation of an initial set of structures from which the most relevant are selected. Moreover, an optimization process, called raking optimization, which discards step by step all the physically unreasonable configurations, has been implemented to reduce the computational cost of this algorithm. Structural properties of Ga(n)As(m) clusters will be presented as an illustration of the method.
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which were analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
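As a rough illustration of how FCM separates fatty from non-fatty pixels by intensity, the sketch below runs the textbook fuzzy c-means iteration on a few scalar gray levels. The segmentation software used in the study is not described in the abstract, so this is only the standard algorithm; the pixel values, the initial centers, and the fuzzifier `m = 2` are assumptions made for the example.

```python
def memberships(values, centers, m):
    """FCM membership of every value in every cluster; each row sums to 1."""
    rows = []
    for x in values:
        # Squared distances, floored to avoid division by zero at a center.
        d2 = [max((x - c) ** 2, 1e-12) for c in centers]
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows


def fuzzy_c_means_1d(values, init_centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(init_centers)
    for _ in range(iters):
        u = memberships(values, centers, m)
        centers = [
            sum((u[i][k] ** m) * values[i] for i in range(len(values)))
            / sum(u[i][k] ** m for i in range(len(values)))
            for k in range(len(centers))
        ]
    return centers, memberships(values, centers, m)


# Hypothetical gray levels: dark non-fatty pixels and bright fatty pixels.
pixels = [10, 12, 11, 13, 200, 205, 198, 202]
centers, u = fuzzy_c_means_1d(pixels, init_centers=[0.0, 255.0])

# Harden the fuzzy partition by taking the cluster with the largest membership.
hard = [max(range(len(centers)), key=lambda k: row[k]) for row in u]
print(centers, hard)
```

The better the two intensity populations are separated, the closer the memberships come to 0 or 1, which is consistent with the abstract's observation that improved histogram separation in WS images helps the segmentation.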
Possibilistic clustering for shape recognition
Keller, James M.; Krishnapuram, Raghu
1992-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
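The difference between the two kinds of membership can be shown numerically. The sketch below compares the FCM membership, which is forced to sum to one over the classes, with a possibilistic membership of the form u = 1 / (1 + (d²/η)^(1/(m−1))). The abstract does not state its update equations, so this formula is the one commonly associated with possibilistic c-means and is an assumption here, as are the scale parameters `eta` and the distances.

```python
def fcm_memberships(d2, m=2.0):
    """FCM memberships of one point from its squared distances to each prototype."""
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def possibilistic_memberships(d2, eta, m=2.0):
    """Possibilistic memberships: each class is judged on its own, with no sum-to-one constraint."""
    return [1.0 / (1.0 + (d / e) ** (1.0 / (m - 1.0))) for d, e in zip(d2, eta)]


eta = [1.0, 1.0]              # assumed per-class scale parameters
boundary_point = [1.0, 1.0]   # squared distances: close to both prototypes
noise_point = [100.0, 100.0]  # squared distances: far from both prototypes

fcm_boundary = fcm_memberships(boundary_point)
fcm_noise = fcm_memberships(noise_point)
pcm_boundary = possibilistic_memberships(boundary_point, eta)
pcm_noise = possibilistic_memberships(noise_point, eta)
print(fcm_boundary, fcm_noise)
print(pcm_boundary, pcm_noise)
```

Under the sum-to-one constraint the distant noise point receives the same memberships as the point that genuinely lies between the two classes, whereas the possibilistic values drop toward zero for the noise point. This is the sense in which FCM memberships fail to express a degree of belonging in noisy data.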
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes,Sylvain V.
2006-03-08
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions, but they are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called the divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. However, this algorithm may also fail in certain cases. To overcome these situations, we propose a new clustering algorithm, called the average correlation clustering algorithm (ACCA), which is able to produce a better clustering solution than that produced by some other algorithms. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to execution time. As in DCCA, we use the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some other methods in determining a group of genes having more common transcription factors and a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using the C and Visual Basic languages, and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. It then needs to be installed.
Two Word files (included in the zip file) need to be consulted before installation and execution of the software. PMID:20144735
M. S. Tame; M. S. Kim
2010-10-01
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate arbitrary n-qubit versions and a scalable method to produce them. For this purpose we propose a versatile on-chip photonic waveguide setup.
2014-01-01
Understanding the compressive strength of concrete is important for activities such as construction arrangement, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are the most widely used for prediction tasks in which the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, because clustering yields a more accurate curve fit between the dependent and independent variables. In this work a cluster regression technique is applied to estimate the compressive strength of concrete, and a novel approach is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields fewer prediction errors when estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. Experiments show that clustering along with regression techniques gives the minimum errors for predicting the compressive strength of concrete; in addition, the fuzzy c-means clustering algorithm performs better than the K-means algorithm. PMID:25374939
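The two-stage procedure described above can be sketched in a few lines: cluster the samples first, then fit one regression per cluster and predict with the model of the cluster a new sample falls into. This sketch makes several simplifying assumptions that are not in the abstract: a single hypothetical input feature `x`, plain k-means rather than fuzzy c-means, ordinary least-squares lines, and invented data rather than concrete measurements. It also assumes every cluster ends up with at least two distinct samples.

```python
def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def kmeans_1d(xs, init_centers, iters=20):
    """Stage 1: plain k-means on a scalar feature."""
    centers = list(init_centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cluster_regression(xs, ys, init_centers):
    """Stage 2: one regression line per cluster; returns a predict function."""
    centers = kmeans_1d(xs, init_centers)
    models = []
    for j in range(len(centers)):
        members = [(x, y) for x, y in zip(xs, ys) if nearest(x, centers) == j]
        models.append(fit_line([p[0] for p in members], [p[1] for p in members]))

    def predict(x):
        slope, intercept = models[nearest(x, centers)]
        return slope * x + intercept

    return predict


# Invented data with two regimes that a single straight line cannot follow.
xs = [1, 2, 3, 4, 11, 12, 13, 14]
ys = [2, 4, 6, 8, 39, 38, 37, 36]
predict = cluster_regression(xs, ys, init_centers=[0.0, 15.0])
low, high = predict(2.5), predict(12.5)
print(low, high)
```

A single global line fitted to these eight points would miss both regimes, while the per-cluster lines follow each one, which is the effect the abstract attributes to combining clustering with regression.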
Sai, Linwei; Zhao, Jijun; Huang, Xiaoming; Wang, Jun
2012-01-01
Using a genetic algorithm combined with density functional theory, we have explored the size evolution of the structural and electronic properties of neutral gallium clusters of 20-40 atoms in terms of their ground state structures, binding energies, second differences of energy, HOMO-LUMO gaps, distributions of bond length and bond angle, and electron density of states. In the size range studied, the Ga(n) clusters exhibit several growth patterns, and the core-shell structures become dominant from Ga31. With high point group symmetries, Ga23 and Ga36 show particularly high stability, and Ga36 has a large HOMO-LUMO gap. The atomic structures and electronic states of Ga(n) clusters differ significantly from those of the α solid but resemble the β solid and the liquid to a certain extent. PMID:22523956
Tzerpos, Vassilios "Bil"
… that exhibit similar features or properties. Such categories (commonly referred to as clusters) can be dis… on the effectiveness and behaviour of these techniques has given rise to the field of cluster analysis. Software…
Creating Personalised Energy Plans: From Groups to Individuals using Fuzzy C Means Clustering
of electricity generation technologies in order to reduce greenhouse gas emissions, the desire to reduce Economy Research Institute. such as the need to reduce carbon emissions and the declining sources of hydro-carbon fuels. New technologies, such as electric cars needing household charging facilities
2011-10-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Evolution Strategy for the C-Means Algorithm: Application to multimodal image
Francesco Masulli DIBRIS - Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi.massone@spin.cnr.it Andrea Schenone DIBRIS - Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi
Inhomogeneity correction for magnetic resonance images with fuzzy C-mean algorithm
) visualization of the brain tissues are very helpful to detect pathology, for example, multiple sclerosis (MS. It is well known that the inhomogeneity effect is multiple, which is also called as bias field. Many
An Algorithm for Testing Unidimensionality and Clustering Items in Rasch Measurement
Debelak, Rudolf; Arendasy, Martin
2012-01-01
A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…
Lennington, R. K.; Malek, H.
1978-01-01
A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basis for CLASSY and the actual operation of the algorithm are described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
Davis, Jack B A; Shayeghi, Armin; Horswell, Sarah L; Johnston, Roy L
2015-09-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters. PMID:26239404
2015-08-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters.
Electronic supplementary information (ESI) available. See DOI: 10.1039/C5NR03774C
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper is part of a broader research effort to discover facial sub-clusters in face databases of different ethnicities. These new sub-clusters, along with other metadata (such as race, sex, etc.), lead to a vector for each face in the database in which each component represents the likelihood that the given face belongs to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, and single hierarchical algorithms, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of the clustering results of DIGNET and four clustering algorithms (average, complete, and single hierarchical, and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metric values are above a specific acceptance threshold. However, when the evaluation metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous or false results), the clustering results need to be verified by the other algorithms.
KtJet: A C++ implementation of the Kt clustering algorithm
2002-10-01
A C++ implementation of the Kt jet algorithm for high energy particle collisions is presented. The time performance of this implementation is comparable to the widely used Fortran implementation. Identical algorithmic functionality is provided, with a clean and intuitive user interface and additional recombination schemes. A short description of the algorithm and examples of its use are given.
2011-01-01
This paper introduces a novel methodology for the segmentation of brain MS lesions in MRI volumes using a new clustering algorithm named SCPFCM. SCPFCM uses membership, typicality and spatial information to cluster each voxel. The proposed method relies on an initial segmentation of MS lesions in T1-w and T2-w images by applying the SCPFCM algorithm; the T1 image is then used as a mask and is compared with the T2 image. The proposed method was applied to 10 clinical MRI datasets. The results obtained on different types of lesions have been evaluated by comparison with manual segmentations. PMID:22606670
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
Krippendorff, Klaus
Clustering techniques seek to group together objects or variables that share some observed qualities or, alternatively, to partition a set of objects or variables into mutually exclusive classes whose boundaries reflect differences in the observed qualities of their members. This paper reviews the general principles underlying clustering…
2014-01-01
Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Because sensor nodes are typically battery powered, their limited power capacity makes extending network lifetime a key challenge for WSNs. Many hierarchical protocols in the literature show better energy efficiency. In addition, data reduction based on the correlation of sensed readings can efficiently reduce the number of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further separate the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) is composed of two procedures: the prediction model construction procedure and the sub-clustering procedure. Energy conservation benefits from the reduced transmissions, which depend on the prediction model. Energy can be conserved further through the representative mechanism of sub-clustering. Simulation results show that 2TC-cor can effectively conserve energy and monitor the environment accurately within an acceptable level. PMID:25412220
A neural network clustering algorithm for the ATLAS silicon pixel detector
The ATLAS collaboration
2014-09-01
A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise position and error estimates of the clusters in both the transverse and longitudinal impact parameter resolution.
2013-01-01
We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate data set. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data (SPMD) programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger data sets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high resolution climate dataset produced by the community atmosphere model (CAM5). This article has supplementary material online.
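The data-parallel structure described in this abstract can be sketched in a few lines. The example below is a minimal single-process illustration under stated assumptions, not the authors' implementation: it uses a one-dimensional two-component mixture, a plain list of data chunks stands in for SPMD ranks, and summing the per-chunk statistics stands in for a collective reduction. The function names `local_stats` and `em` are hypothetical.

```python
import numpy as np

def local_stats(chunk, weights, means, variances):
    """E-step on one chunk: partial sufficient statistics per component."""
    diff = chunk[:, None] - means[None, :]
    dens = weights * np.exp(-0.5 * diff**2 / variances) / np.sqrt(2.0 * np.pi * variances)
    resp = dens / dens.sum(axis=1, keepdims=True)   # responsibilities
    return resp.sum(axis=0), resp.T @ chunk, resp.T @ chunk**2

def em(chunks, weights, means, variances, n_iter=50):
    for _ in range(n_iter):
        # Each "rank" works on its own chunk; the sums mimic an all-reduce.
        parts = [local_stats(c, weights, means, variances) for c in chunks]
        n_k = sum(p[0] for p in parts)
        s1 = sum(p[1] for p in parts)
        s2 = sum(p[2] for p in parts)
        # M-step from the global statistics (identical on every rank).
        weights = n_k / n_k.sum()
        means = s1 / n_k
        variances = s2 / n_k - means**2
    return weights, means, variances

# Synthetic data (an assumption for the demo): two unit-variance Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
rng.shuffle(data)
init = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
w_par, mu_par, var_par = em(np.array_split(data, 4), *init)   # four "ranks"
w_ser, mu_ser, var_ser = em([data], *init)                    # serial reference
```

Because only the small per-component statistics are exchanged, the chunked run should reproduce the serial estimates up to floating-point rounding, which is the property that lets each process hold only its own part of the data.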
A new method based on Dempster–Shafer theory and fuzzy c-means for brain MRI segmentation
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitivity to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation, especially when only one brain image is available. The method takes spatial neighborhood information into account by fusing the information of each pixel with that of its neighbors using Dempster–Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function produced by applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Multiple hypotheses are then generated from the single hypotheses. Finally, we update the objective pixel’s BPA by fusing it with the BPAs of its neighbors to get the final result. Some examples in MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than the other methods in motion-blurred MRI segmentation.
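The fusion of a pixel's BPA with that of a neighbor can be illustrated with Dempster's rule of combination. The sketch below is a minimal illustration, not the authors' implementation: the two-tissue frame, the mass values, and the function name `dempster_combine` are assumptions chosen for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each BPA maps a frozenset of hypotheses to its mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b   # mass on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: the two BPAs cannot be combined")
    # Normalize by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical frame with two tissue classes and one compound hypothesis.
GM, WM = frozenset({"gm"}), frozenset({"wm"})
EITHER = GM | WM
pixel = {GM: 0.6, WM: 0.3, EITHER: 0.1}      # e.g. derived from FCM memberships
neighbor = {GM: 0.7, WM: 0.2, EITHER: 0.1}
fused = dempster_combine(pixel, neighbor)
```

In this toy case both sources lean toward the same class, so the fused mass on that class is larger than either input mass, which is the reinforcing effect that neighborhood fusion relies on.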
2004-01-01
This thesis examines two methods for speeding up MCNP KCODE calculations. The first approach is assembly of a low cost Beowulf Cluster for parallel computation. The first half describes the MIT Nuclear Engineering Department's ...
An algorithm for identifying clusters of functionally related genes in genomes
2009-05-15
(|E|) time (Cormen [18])). c = g1 g2 g3 g4 g5 g6 n = 6 c(w) = g1 g2 g3 k = 3 t(c(w)) = t3 t4 t5 cluster 1 [t4] g2 g3 n0 = 2 k0 = 2 cluster 2 [t3,t4] g1 g2 g3 n0 = 3 k0 = 3 cluster 3 [t3,t5] g1 g3 n0 = 3 k0 = 2 Fig. 2. Illustration of all clusters of fixed... are given in Figure 5. To compute c(v), first initialize its set of genes according to the function F. Then consider each vertex u in reversed topological order (which can be obtained by depth-first search in O(|E|) time (Cormen [18])), and update c...
Mass distributed clustering: a new algorithm for repeated measurements in gene expression data.
Matsumoto, Shinya; Aisaki, Ken-ichi; Kanno, Jun
2005-01-01
The availability of whole-genome sequence data and high-throughput techniques such as DNA microarray enable researchers to monitor the alteration of gene expression by a certain organ or tissue in a comprehensive manner. The quantity of gene expression data can be greater than 30,000 genes per one measurement, making data clustering methods for analysis essential. Biologists usually design experimental protocols so that statistical significance can be evaluated; often, they conduct experiments in triplicate to generate a mean and standard deviation. Existing clustering methods usually use these mean or median values, rather than the original data, and take significance into account by omitting data showing large standard deviations, which eliminates potentially useful information. We propose a clustering method that uses each of the triplicate data sets as a probability distribution function instead of pooling data points into a median or mean. This method permits truly unsupervised clustering of the data from DNA microarrays. PMID:16901101
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Identifying pathogenic factors and how they spread in the environment has recently become a global demand. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a parasitic disease that can be passed on to humans by Phlebotomus vectors. Studies show that economic situation, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is used to predict the CL prevalence rate and obtain a risk map, based on the environmental parameters that affect CL. A neuro-fuzzy system was used because it combines the learning capacity of neural networks with the reasoning power of fuzzy systems. To predict the CL prevalence rate, an adaptive neuro-fuzzy inference system was applied, with fuzzy C-means clustering used to determine the initial membership functions. Because of the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate in 2012 was predicted from maps of the relevant environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation, and elevation. Results indicate that the model with the fuzzy C-means clustering structure yields acceptable RMSE values for both training and checking data, which supports our analyses. Using the proposed data mining approach, the spatial distribution pattern of the disease and the vulnerable areas become identifiable, and the map can be used by public health experts and decision makers as a useful tool for management and optimal decision-making.
R. Linderman
2008-01-01
A 50 TeraFLOPS cluster of PlayStation 3® (PS3) consoles is being built at the Air Force Research Laboratory, Information Directorate (AFRL/RI) in Rome, NY as a High Performance Computing Modernization Program (HPCMP) effort to provide early access to the IBM Cell Broadband Engine chip technology included in the low priced commodity gaming consoles. A heterogeneous cluster with powerful subcluster headnodes is
Hazenberg, P.; Torfs, P. J. J. F.; Leijnse, H.; Uijlenhoet, R.
2012-04-01
Over the last decades the amount of spatial geographic data obtained from satellite and radar remote sensing, geographical and other types of spatial information has increased tremendously, making it impossible for a user to examine all of it in detail. Therefore, a considerable amount of research has focused on smart and efficient solutions to segment a spatial image into its dominant regions, extracting the most essential information. The current research presents a new spatial image cluster identification method. The delineation of clusters is performed in two separate steps. First, we identify a region's outer contour using the properties of a rotating carpenter square. Second, we define all inner pixels belonging to a cluster based on the same principle, excluding inner contour regions if necessary. As such, a cluster identification method will be presented which has considerable similarity to some of the tracing type and connected component image segmentation algorithms developed in the literature during the last decade. However, since the characteristic shape of a carpenter square can easily be extended, the algorithm presented here does not strictly label neighboring pixels to the same component only. On the contrary, our algorithm is able to connect non-neighboring pixels for varying pixel distances as well. In addition, since our algorithm takes a continuous grid as input, it is possible to define transition pixels that connect pixels belonging to a given cluster. Therefore, this newly developed algorithm presents a link between the traditional image segmentation methods implemented on binary grids and the partitional density and grid-based cluster identification methods that use continuous datasets. We will demonstrate the impact of this new cluster identification method for a number of typical geophysical cases ranging from global drought identification to weather radar based precipitation cell delineation.
Carreira-Perpiñán, Miguel Á.
[Figure residue: plots for the 2moons and cameraman datasets; axis tick values omitted]
Fujii, M; Funato, Y; Makino, J
2007-01-01
We developed a new direct-tree hybrid $N$-body algorithm for fully self-consistent $N$-body simulations of star clusters in their parent galaxies. In such simulations, star clusters need high accuracy, while galaxies need a fast scheme because of the large number of the particles required to model it. In our new algorithm, the internal motion of the star cluster is calculated accurately using the direct Hermite scheme with individual timesteps and all other motions are calculated using the tree code with second-order leapfrog integrator. The direct and tree schemes are combined using an extension of the mixed variable symplectic (MVS) scheme. Thus, the Hamiltonian corresponding to everything other than the internal motion of the star cluster is integrated with the leapfrog, which is symplectic. Using this algorithm, we performed fully self-consistent $N$-body simulations of star clusters in their parent galaxy. The internal and orbital evolutions of the star cluster agreed well with those obtained using the d...
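The second-order leapfrog integrator mentioned in this abstract can be sketched as a kick-drift-kick step. The example below is a minimal illustration under stated assumptions, not the authors' hybrid code: it integrates a single test particle around a unit point mass (G = M = 1), and the function names are hypothetical. The direct Hermite scheme and the tree force calculation are not shown.

```python
import numpy as np

def leapfrog_step(pos, vel, accel_fn, dt):
    """Advance one kick-drift-kick leapfrog step (second order, symplectic)."""
    vel_half = vel + 0.5 * dt * accel_fn(pos)           # half kick
    pos_new = pos + dt * vel_half                       # full drift
    vel_new = vel_half + 0.5 * dt * accel_fn(pos_new)   # half kick
    return pos_new, vel_new

def kepler_accel(pos):
    """Acceleration toward a unit point mass at the origin (G = M = 1)."""
    r = np.linalg.norm(pos)
    return -pos / r**3

def energy(pos, vel):
    """Specific orbital energy of the test particle."""
    return 0.5 * np.dot(vel, vel) - 1.0 / np.linalg.norm(pos)

pos = np.array([1.0, 0.0])
vel = np.array([0.0, 1.0])    # circular orbit of radius 1
e0 = energy(pos, vel)
for _ in range(1000):
    pos, vel = leapfrog_step(pos, vel, kepler_accel, dt=0.01)
energy_drift = abs(energy(pos, vel) - e0)
```

Because the scheme is symplectic, the energy error stays bounded over the run rather than growing steadily, which is why it suits the long galaxy-scale part of such a simulation.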
A clustering algorithm for asymmetrically related data with applications to text mining
K. Krishna; Raghu Krishnapuram
2001-01-01
Clustering techniques find a collection of subsets of a data set such that the collection satisfies a criterion that is dependent on a relation defined on the data set. The underlying relation is traditionally assumed to be symmetric. However, there exist many practical scenarios where the underlying relation is asymmetric. One example of an asymmetric relation in text analysis is
Spectral and meta-heuristic algorithms for software clustering
Mancoridis, Spiros
the spectral methods to the software clustering problem and make comparisons to Bunch. We conducted a case ... The Journal of Systems and Software (2004), elsevier.com/locate/jss
Myasnikov, Aleksey
is a decision tree whose nodes store a projection and threshold and whose leaves represent the clusters (classes). Experiments with various real and synthetic datasets show the effectiveness of the approach. 1 parameter and produces a compact description of the clusters found in the form of a binary tree. This tree
CLICKS: An Effective Algorithm for Mining Subspace Clusters in Categorical Datasets
Zaki, Mohammed Javeed
Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining. General Terms ... groups of points in a given dataset. Clustering of numeric (or real-valued) data has been widely studied ...
Ogata, H; Fujibuchi, W; Goto, S; Kanehisa, M
2000-10-15
The availability of computerized knowledge on biochemical pathways in the KEGG database opens new opportunities for developing computational methods to characterize and understand higher level functions of complete genomes. Our approach is based on the concept of graphs; for example, the genome is a graph with genes as nodes and the pathway is another graph with gene products as nodes. We have developed a simple method for graph comparison to identify local similarities, termed correlated clusters, between two graphs, which allows gaps and mismatches of nodes and edges and is especially suitable for detecting biological features. The method was applied to a comparison of the complete genomes of 10 microorganisms and the KEGG metabolic pathways, which revealed, not surprisingly, a tendency for formation of correlated clusters called FRECs (functionally related enzyme clusters). However, this tendency varied considerably depending on the organism. The relative number of enzymes in FRECs was close to 50% for Bacillus subtilis and Escherichia coli, but was <10% for Synechocystis and Saccharomyces cerevisiae. The FRECs collection is reorganized into a collection of ortholog group tables in KEGG, which represents conserved pathway motifs with the information about gene clusters in all the completely sequenced genomes. PMID:11024183
Scalable Model-based Clustering Algorithms for Large Databases and Their Applications
Jin, Huidong "Warren"
clustering algorithms for databases with large number of data items. It proposes a scalable model-based appropriate summary statistics. Besides the BIRCH's data summarization procedure, there exist many approaches. Our adaptive grid-based data summarization procedure simply partitions data space and sum up the data
Unsupervised Algorithms for Segmentation and Clustering Applied to Soccer Players Classification
Paolo Spagnolo; Pier Luigi Mazzeo; Marco Leo; Tiziana D'orazio
2007-01-01
In this work we consider the problem of soccer player detection and classification. The approach we propose starts from the monocular images acquired by a still camera. Firstly, players are detected by means of background subtraction. An algorithm based on pixels energy content has been implemented in order to detect moving objects. The use of energy information, combined with a
DWT-CEM: an algorithm for scale-temporal clustering in fMRI
João Ricardo Sato; André Fujita; Edson Amaro Jr.; Janaina Mourão Miranda; Pedro Alberto Morettin; Michal John Brammer
2007-01-01
The number of studies using functional magnetic resonance imaging (fMRI) has grown very rapidly since the first description of the technique in the early 1990s. Most published studies have utilized data analysis methods based on voxel-wise application of general linear models (GLM). On the other hand, temporal clustering analysis (TCA) focuses on the identification of relationships between cortical areas by
Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira
2013-01-01
Purpose To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. Materials and Methods We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. Results The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Conclusion Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis. PMID:24083208
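The three measures reported in this abstract follow directly from a confusion matrix. The snippet below shows the arithmetic only; the counts are hypothetical values chosen for illustration and are not the study's data, and the function name `diagnostic_metrics` is an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts for 100 patients (illustration only).
acc, sens, spec = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
```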
VIDEO OBJECT SEGMENTATION AND TRACKING USING PROBABILISTIC FUZZY C-MEANS
Zhang, Xiao-Ping
Jian Zhou, Xiao-Ping Zhang ... Canada, M5B 2K3. E-mail: {jzhou, xzhang}@ee.ryerson.ca. ABSTRACT: Automatic video object segmentation ... to associate the segmented regions to form video objects. Temporal tracking is achieved by projecting
Improving Semi-supervised Fuzzy C-Means Classification of Breast Cancer Data
Aickelin, Uwe
of breast cancer cases with genetic information based on 25 protein biomarkers. The six classes were derived ... six clinically novel and useful subgroups of breast cancer were identified using rules and clinicians
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
NASA Astrophysics Data System (ADS)
Cameron, Maria; Vanden-Eijnden, Eric
2014-08-01
A set of analytical and computational tools based on transition path theory (TPT) is proposed to analyze flows in complex networks. Specifically, TPT is used to study the statistical properties of the reactive trajectories by which transitions occur between specific groups of nodes on the network. Sampling tools are built upon the outputs of TPT that make it possible to generate these reactive trajectories directly, or even transition paths that travel from one group of nodes to the other without making any detour and carry the same probability current as the reactive trajectories. These objects make it possible to characterize the mechanism of the transitions, for example by quantifying the width of the tubes by which these transitions occur, the location and distribution of their dynamical bottlenecks, etc. These tools are applied to a network modeling the dynamics of the Lennard-Jones cluster with 38 atoms (LJ38) and used to understand the mechanism by which this cluster rearranges itself between its two most likely states at various temperatures.
A Self-stabilizing (k,r)-clustering Algorithm with Multiple Paths for Wireless Ad-hoc Networks
Tsigas, Philippas
recover from errors and temporarily broken assumptions. Clustering nodes within ad-hoc networks can help bones for efficient communication can be formed using cluster heads. Clusters can be used for routing
Production of light and intermediate mass fragments using various clusterization algorithms
Ekta,; Puri, Rajeev K
2010-01-01
For the present analysis we simulate the reaction $^{129}_{54}$Xe + $^{197}_{79}$Au at E = 50 MeV/nucleon [3]. This reaction is simulated at different impact parameters using a hard equation of state. The stored phase space is then analyzed using the MST, MSTP and MSTB algorithms. MSTB and MSTP identify the free nucleons as early as possible. In these two cases, a check in the form of binding energy and momentum cut helps to identify the fragments quite early. The normal MST takes quite a long time to identify the stable fragments, which are residues of excited fragments.
Production of light and intermediate mass fragments using various clusterization algorithms
Ekta; Suneel Kumar; Rajeev K. Puri
2010-09-27
A new time dependent density functional algorithm for large systems and plasmons in metal clusters
NASA Astrophysics Data System (ADS)
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-01
A new algorithm to solve the Time Dependent Density Functional Theory (TDDFT) equations in the space of the density fitting auxiliary basis set has been developed and implemented. The method extracts the spectrum from the imaginary part of the polarizability at any given photon energy, avoiding the bottleneck of Davidson diagonalization. The original idea which made the present scheme very efficient consists in the simplification of the double sum over occupied-virtual pairs in the definition of the dielectric susceptibility, allowing an easy calculation of such matrix as a linear combination of constant matrices with photon energy dependent coefficients. The method has been applied to very different systems in nature and size (from H2 to [Au147]-). In all cases, the maximum deviations found for the excitation energies with respect to the Amsterdam density functional code are below 0.2 eV. The new algorithm has the merit not only to calculate the spectrum at whichever photon energy but also to allow a deep analysis of the results, in terms of transition contribution maps, Jacob plasmon scaling factor, and induced density analysis, which have been all implemented.
A new time dependent density functional algorithm for large systems and plasmons in metal clusters.
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-14
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
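The fuzzy C-means equations for the centroids and membership values that this abstract refers to can be sketched as an alternating update. The code below shows standard batch FCM only, as an illustration; it is not the AFLC architecture, whose competitive and distance-comparison stages are not described in enough detail here to reproduce. The function name, the fuzzifier `m = 2`, and the synthetic two-blob data are assumptions.

```python
import numpy as np

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, seed=0):
    """Alternate the standard FCM centroid and membership updates."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(data), n_clusters))
    u /= u.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        w = u ** m
        # Centroids: membership-weighted means of the data points.
        centers = (w.T @ data) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)              # avoid division by zero
        # Memberships: inverse-distance weights, normalized per point.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Synthetic demo data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(1)
blob_a = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
blob_b = rng.normal([5.0, 5.0], 0.3, size=(50, 2))
centers, u = fuzzy_c_means(np.vstack([blob_a, blob_b]), n_clusters=2)
```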
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated on a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|.
When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 when compared to reader 2, while the DSC between the two readers’ manual segmentation is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ~5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ~55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds great potential to support clinical applications in the future, including breast cancer risk assessment.
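The Dice similarity coefficient used above to compare automated and manual segmentations is twice the overlap divided by the total size of the two masks. A minimal sketch follows; the tiny masks are made-up values for illustration, and the function name `dice_coefficient` is an assumption.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0   # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

# Made-up 2x3 masks: 3 foreground pixels each, 2 of them shared.
automated = [[1, 1, 0], [0, 1, 0]]
manual = [[1, 0, 0], [0, 1, 1]]
dsc = dice_coefficient(automated, manual)
```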
Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy of sensor nodes, it is critical to implement WSNs with energy-efficient designs. Sensing coverage in networks, on the other hand, may degrade gradually over time after WSNs are activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Usually, coverage-controlling strategies present some challenging problems: (1) resolving the conflicts while determining which nodes should be turned off to conserve energy; (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve these problems. CoCMA contains two optimization strategies: an MA-based schedule for sensor nodes and a wake-up scheme, which are responsible for prolonging the network lifetime while preserving coverage. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by redundant nodes. During network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage holes caused by dead nodes. The performance evaluation of the proposed CoCMA was conducted on a cluster-based WSN (CWSN) under either a random or a uniform deployment of sensor nodes. Simulation results show that the performance yielded by the combination of MA and wake-up scheme is better than that of some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area. PMID:22408561
An improved FCM medical image segmentation algorithm based on MMTD.
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) is one of the popular clustering algorithms for medical image segmentation. However, FCM is highly vulnerable to noise because it does not consider spatial information in image segmentation. This paper introduces the medium mathematics system, which is employed to process fuzzy information for image segmentation. It establishes the medium similarity measure based on the measure of medium truth degree (MMTD) and uses the correlation of the pixel and its neighbors to define the medium membership function. An improved FCM medical image segmentation algorithm based on MMTD, which takes some spatial features into account, is proposed in this paper. The experimental results show that the proposed algorithm is more robust to noise than the standard FCM, with more certainty and less fuzziness. This makes it practical and effective for medical image segmentation. PMID:24648852
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining
The increasing use of fast and efficient data mining algorithms on huge collections of personal data, facilitated by the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed. They fall into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept as low as possible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
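The two anonymity notions combined in this abstract can be checked on a generalized table as follows. This is a minimal sketch, not the authors' SOM-based method: it measures k as the smallest group size over the quasi-identifier columns and ℓ as the smallest number of distinct sensitive values per group (the "distinct" variant of ℓ-diversity). The records, column names, and function name are hypothetical.

```python
from collections import defaultdict

def anonymity_levels(records, quasi_identifiers, sensitive):
    """Return (k, l): minimum group size and minimum number of distinct
    sensitive values over groups sharing the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec[sensitive])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l

# Hypothetical generalized table (illustration only).
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
]
k, l = anonymity_levels(records, ["age", "zip"], "disease")
```

A table that is k-anonymous can still leak a sensitive value when every record in a group shares it, which is the gap the diversity requirement closes.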
A binned clustering algorithm to detect high-Z material using cosmic muons
We present a novel approach to the detection of special nuclear material using cosmic rays. Muon Scattering Tomography (MST) is a method for using cosmic muons to scan cargo containers and vehicles for special nuclear material. Cosmic muons are abundant, highly penetrating, not harmful for organic tissue, cannot be screened against, and can easily be detected, which makes them highly suited to the use of cargo scanning. Muons undergo multiple Coulomb scattering when passing through material, and the amount of scattering is roughly proportional to the square of the atomic number Z of the material. By reconstructing incoming and outgoing tracks, we can obtain variables to identify high-Z material. In a real life application, this has to happen on a timescale of 1 min and thus with small numbers of muons. We have built a detector system using resistive plate chambers (RPCs): 12 layers of RPCs allow for the readout of 6 x and 6 y positions, by which we can reconstruct incoming and outgoing tracks. In this work we detail the performance of an algorithm by which we separate high-Z targets from low-Z background, both for real data from our prototype setup and for MC simulation of a cargo container-sized setup. (c) British Crown Owned Copyright 2013/AWE
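The basic quantity derived from a reconstructed incoming and outgoing track is the scattering angle between them. The sketch below computes only that angle from two direction vectors; it does not reproduce the binned clustering algorithm, which this abstract does not specify. The function name and the example directions are assumptions.

```python
import numpy as np

def scattering_angle(direction_in, direction_out):
    """Angle in radians between incoming and outgoing track directions."""
    u = np.asarray(direction_in, dtype=float)
    v = np.asarray(direction_out, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical tracks: a downward-going muon deflected by 45 degrees.
theta = scattering_angle([0.0, 0.0, -1.0], [1.0, 0.0, -1.0])
unscattered = scattering_angle([0.0, 0.0, -1.0], [0.0, 0.0, -1.0])
```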
We investigate remapping multi-dimensional arrays on cluster of SMP architectures under OpenMP, MPI, and hybrid paradigms. Traditional method of array transpose needs an auxiliary array of the same size and a copy back stage. We recently developed an in-place method using vacancy tracking cycles. The vacancy tracking algorithm outperforms the traditional 2-array method as demonstrated by extensive comparisons. The independence
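The in-place idea mentioned above can be illustrated with a small cycle-following transpose. This is only a sketch under stated assumptions: the matrix is stored row-major in a flat Python list, the function name `transpose_in_place` is hypothetical, and the leader test used to visit each cycle once is a simple textbook choice rather than the authors' actual bookkeeping. No OpenMP or MPI parallelism is shown.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols row-major matrix held in the flat list `a`.

    Only one element is held aside at a time (the "vacancy"); no second
    array of the same size is allocated.
    """
    n = rows * cols
    if len(a) != n:
        raise ValueError("length of a must equal rows * cols")
    last = n - 1  # first and last elements never move

    def source(j):
        # Index of the element that must end up at position j.
        return (j * cols) % last

    for start in range(1, last):
        # Process a cycle only from its smallest index (the cycle leader).
        j = source(start)
        while j > start:
            j = source(j)
        if j < start:
            continue  # this cycle was already handled from a smaller index

        # Lift out one element, then keep filling the vacancy it leaves.
        saved = a[start]
        vacancy = start
        while True:
            src = source(vacancy)
            if src == start:
                a[vacancy] = saved
                break
            a[vacancy] = a[src]
            vacancy = src


matrix = [1, 2, 3,
          4, 5, 6]          # 2 x 3, row-major
transpose_in_place(matrix, 2, 3)
print(matrix)               # now a 3 x 2 matrix, row-major
```

The leader test trades extra index arithmetic for zero auxiliary storage; a bitmap of visited positions would be faster but would no longer be strictly in place.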
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt several base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes fuzzy c-means combined with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting resource demands. PMID:25691896
IGroup: web image search results clustering
In this paper, we propose IGroup, an efficient and effective algorithm that organizes Web image search results into clusters. IGroup is different from all existing Web image search results clustering algorithms that only cluster the top few images using visual or textual features. Our proposed algorithm first identifies several query-related semantic clusters based on a key phrases extraction algorithm originally
Information Clustering Based on Fuzzy Multisets.
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
A stable and unsupervised Fuzzy C-Means for data classification
In this paper a stable and unsupervised version of FCM algorithm named FCMO is presented. The originality of the proposed FCMO algorithm relies: i) on the usage of an adaptive incremental technique to initialize the class centres that calls into question the intermediate initializations; this technique renders the algorithm stable and deterministic, and the classification results do not vary from a run to another, and ii) on the unsupervised evaluation criteria of the intermediate classification result to estimate the optimal number of classes; this makes the algorithm unsupervised. The efficiency of this optimized version of FCM is shown through some experimental results for its stability and its correct class number estimation.
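For orientation, the standard FCM iteration that such variants build on can be sketched as follows. This is the textbook algorithm, not FCMO: the evenly spaced choice of starting centres is an illustrative assumption made only so the run is repeatable, and the function name `fcm` and its parameters are hypothetical. The fuzzifier `m` must be greater than 1.

```python
import math


def fcm(points, c, m=2.0, iters=100, tol=1e-6):
    """Plain fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, u), where u[j][i] is the membership of point j in
    cluster i as computed in the final iteration.
    """
    n, dim = len(points), len(points[0])
    # Assumed deterministic start: c points spread evenly through the input.
    centers = [list(points[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: inverse-distance weights normalised per point.
        u = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if min(d) == 0.0:
                # Point sits exactly on one or more centres.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                s = sum(hits)
                u.append([h / s for h in hits])
            else:
                inv = [dk ** -power for dk in d]
                s = sum(inv)
                u.append([w / s for w in inv])
        # Centre update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[j] * points[j][k] for j in range(n)) / total
                 for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
centers, memberships = fcm(data, c=2)
print(centers)
```

Because the starting centres are fixed by the input order, repeated runs give identical results, which is the property the abstract calls stability; the incremental initialization described there is a different and more elaborate mechanism.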
Quantum Annealing for Clustering
This paper studies quantum annealing (QA) for clustering, which can be seen as an extension of simulated annealing (SA). We derive a QA algorithm for clustering and propose an annealing schedule, which is crucial in practice. Experiments show the proposed QA algorithm finds better clustering assignments than SA. Furthermore, QA is as easy as SA to implement.
compression technique, when applied to a set of medical images.
During a typical hurricane season in the Atlantic, there are usually more than 100 cloud clusters
Logic-oriented fuzzy clustering
The paper is concerned with a logic-based expansion of the standard FCM clustering. The proposed algorithm captures the logic fabric of the structure in a dataset by describing it in the form of a union of the clusters (that is fuzzy relations) determined by the clustering algorithm. In contrast to the standard FCM, the elements (clusters) are combined together as
On the analysis of BIS stage epochs via fuzzy clustering.
Among various types of clustering methods, partition-based methods such as k-means and FCM are widely used in the analysis of such data. However, when duration between stimuli is different, such methods are not able to provide satisfactory results because they find equal size clusters according to the fundamental running principle of these methods. In such cases, neighborhood-based clustering methods can give more satisfactory results because measurement series are separated from one another according to dramatic breaking points. In recent years, bispectral index (BIS) monitoring, which is used for monitoring the level of anesthesia, has been used in sleep studies. Sleep stages are classically scored according to the Rechtschaffen and Kales (R&K) scoring system. BIS has been shown to have a strong correlation with the R&K scoring system. In this study, fuzzy neighborhood/density-based spatial clustering of applications with noise (FN-DBSCAN) that combines speed of the DBSCAN algorithm and robustness of the NRFJP algorithm is applied to BIS measurement series. As a result of experiments, we can conclude that, by using BIS data, the FN-DBSCAN method estimates sleep stages better than the fuzzy c-means method. PMID:20156029
Segmentation of Spin-Echo MRI brain images: a comparison study of Crisp and Fuzzy algorithms
This thesis presents a scheme for segmenting Spin-Echo MRI brain images based on Fuzzy C-Mean (FCM) clustering techniques. This scheme consists of feature extraction, feature conditioning or evaluation, and thresholded FCM clustering. Feature...
On the basis of an analysis of the clustering algorithms that have been proposed for MANET, a novel clustering strategy is proposed in this paper. With trust defined by statistical hypothesis testing in probability theory and the cluster head selected by node trust and node mobility, this strategy can detect malicious nodes, a function neglected by other clustering algorithms, and can overcome a deficiency of the MOBIC algorithm, which cannot implement its relative mobility metric for the corresponding nodes when the receiving power of two consecutive HELLO packets cannot be measured. It is an effective solution for clustering MANET securely.
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-05-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. PMID:24677621
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r^4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246
Accelerated clustering through locality-sensitive hashing
We obtain improved running times for two algorithms for clustering data: the expectation-maximization (EM) algorithm and Lloyd's algorithm. The EM algorithm is a heuristic for finding a mixture of k normal distributions ...
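As a point of reference, plain Lloyd's algorithm alternates a nearest-centre assignment with a centroid update. The sketch below is the unaccelerated baseline only; it does not implement locality-sensitive hashing, and the function name `lloyd` and the caller-supplied starting centres are assumptions made for illustration.

```python
import math


def lloyd(points, centers, iters=100):
    """Lloyd's algorithm: returns (centers, labels) once labels stop changing."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centre
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(lloyd(pts, centers=[(0, 0), (10, 0)]))
```

The assignment step compares every point with every centre, so it is the natural place for any nearest-neighbour speed-up; how the paper applies hashing there is not specified in the excerpt above.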
Vilalta, Ricardo
THEMATIC MAPS OF MARTIAN TOPOGRAPHY GENERATED BY A CLUSTERING ALGORITHM.
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, x_i ∈ R^D. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P → R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X × X → R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c_1, . . . , c_k} such that the clusters are nonoverlapping (c_i ∩ c_j = ∅ for i ≠ j) subsets of the data set (∪_i c_i = X). Hierarchical algorithms produce a series of partitions P = {p_1, . . . , p_n}. For a complete hierarchy, the number of partitions n' = n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j, such as the cluster center or a Gaussian distribution.
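The two conditions that define a partition (clusters do not overlap, and together they cover the data set) can be checked directly. A minimal sketch, with the hypothetical helper name `is_partition` and items assumed to be hashable:

```python
def is_partition(clusters, items):
    """Return True if `clusters` are pairwise disjoint and their union is `items`."""
    seen = set()
    for cluster in clusters:
        cluster = set(cluster)
        if seen & cluster:
            return False  # some item appears in two clusters
        seen |= cluster
    return seen == set(items)  # every item is covered, nothing extra


X = ["a", "b", "c", "d"]
print(is_partition([{"a", "b"}, {"c"}, {"d"}], X))
print(is_partition([{"a", "b"}, {"b", "c"}, {"d"}], X))
```

The bottom level of a complete hierarchy, with one single-item cluster per item, and the top level, with one cluster holding everything, both pass this check.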
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind
We have studied the Ostwald ripening of three-dimensional islands on a homogeneous surface with an original off-lattice kinetic Monte Carlo algorithm. In this algorithm, adatom trajectories are highly simplified, while still ensuring that the adatom fluxes between islands are exactly described. From the simulations, we obtained the evolution of the island size distribution over a large time range. The simulations obtained are compared with the results of numerical integration of rate equations derived from a mean-field approximation. Both results indicate that the equilibrium radius of the islands follows a power-law behavior in the limit of a very dilute phase, with an exponent close to 1/4. A general, excellent agreement is obtained, showing the validity of our approach, whereas the validity of the mean-field approximation is discussed for a very small mean island size, or for a large fraction of the surface covered by the islands.
Gel electrophoresis (GE) is one of the most widely used methods to separate DNA, RNA, and protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry, and microbiology. The main way to separate each molecule is to find the borders of each molecule fragment. This paper presents a software application that shows the column edges of DNA fragments in 3 steps. In the first step, the application obtains lane histograms of agarose gel electrophoresis images by projection along the x-axis. In the second step, it utilizes the k-means clustering algorithm to classify point values of the lane histogram as left side values, right side values and undesired values. In the third step, the column edges of DNA fragments are shown by using a mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition, the application presents the locations of DNA fragments and how many DNA fragments exist in images captured by a scientific camera.
Cluster analysis for pattern recognition in solar butterfly diagrams
We investigate to what extent the wings of solar butterfly diagrams can be separated without an explicit usage of Hale's polarity law as well as the location of the solar equator. We apply two algorithms of cluster analysis for this purpose, namely DBSCAN and C-means, and demonstrate their ability to separate the wings of contemporary butterfly diagrams based on the sunspot group density in the diagram only. Then we apply the method to historical data concerning the solar activity in the 18th century (Staudacher data). The method separates the two wings for Cycle 2, but fails to separate them for Cycle 1. In our opinion, this finding supports the interpretation of the Staudacher data as an indication of the unusual nature of the solar cycle in the 18th century.
Cluster-Based Cumulative Ensembles
In this paper, we propose a cluster-based cumulative representation for cluster ensembles. Cluster labels are mapped to incrementally accumulated clusters, and a matching criterion based on maximum similarity is used. The ensemble method is investigated with bootstrap re-sampling, where the k-means algorithm is used to generate high granularity clusterings. For combining, group average hierarchical meta-clustering
Model-based overlapping clustering
While the vast majority of clustering algorithms are partitional, many real world datasets have inherently overlapping clusters. Several approaches to finding overlapping clusters have come from work on analysis of biological datasets. In this paper, we interpret an overlapping clustering model proposed by Segal et al. [23] as a generalization of Gaussian mixture models, and we extend it to an
Combining Multiple Clusterings Using Evidence Accumulation
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble - a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: (1) applying different clustering algorithms, and (2) applying the same clustering algorithm with different
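One common way to accumulate evidence from an ensemble is a co-association matrix, whose entry (i, j) is the fraction of partitions that place objects i and j in the same cluster. The excerpt above is truncated before it describes the combination step, so the sketch below is an assumption about how such evidence could be pooled, with the hypothetical function name `co_association`, rather than a statement of the paper's exact procedure.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster.

    `partitions` is a list of label lists, one per clustering, all of the
    same length n. Label values need not agree between partitions.
    """
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    k = len(partitions)
    return [[v / k for v in row] for row in counts]


ensemble = [
    [0, 0, 1],   # clustering A groups objects 0 and 1
    [0, 1, 1],   # clustering B groups objects 1 and 2
]
for row in co_association(ensemble):
    print(row)
```

The resulting matrix can be treated as a similarity measure and handed to any hierarchical or graph-based method to extract a consensus partition.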
Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering.
This paper presents an unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm. The image fusion technique is introduced to generate a difference image by using complementary information from a mean-ratio image and a log-ratio image. In order to restrain the background information and enhance the information of changed regions in the fused difference image, wavelet fusion rules based on an average operator and minimum local area energy are chosen to fuse the wavelet coefficients for a low-frequency band and a high-frequency band, respectively. A reformulated fuzzy local-information C-means clustering algorithm is proposed for classifying changed and unchanged regions in the fused difference image. It incorporates the information about spatial context in a novel fuzzy way for the purpose of enhancing the changed information and of reducing the effect of speckle noise. Experiments on real SAR images show that the image fusion strategy integrates the advantages of the log-ratio operator and the mean-ratio operator and gains a better performance. The change detection results obtained by the improved fuzzy clustering algorithm exhibited lower error than those of its predecessors. PMID:21984509
Terrestrial Laser Scanners (TLS) are used frequently in three dimensional documentation studies and present an alternative method for three dimensional modeling without any deformation of scale. In this study, point cloud data segmentation is used for photogrammetric image data production from laser scanner data. Segmentation studies suggest several methods for automating curved surface determination for digital terrain modeling. In this study, a fuzzy logic approach has been used for the automatic segmentation of regular curved surfaces which differ in their depths to the instrument. This type of shape is usually observed in dome surfaces in close range architectural documentation. The C-means integrated fuzzy logic model has been developed with MATLAB 7.0 software. The Gauss2mf membership function algorithm has been tested with the original data set. These results were used in the photogrammetric 3D modeling process. As the result of the study, the test results for the point cloud data set are discussed and interpreted, with all of their advantages and disadvantages, in Section 5.
A robust elicitation algorithm for discovering DNA motifs using fuzzy self-organizing maps.
It is important to identify DNA motifs in promoter regions to understand the mechanism of gene regulation. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Self-organizing maps (SOMs), as a powerful clustering tool, have demonstrated good potential for problem solving. However, the current SOM-based motif discovery algorithms unfairly treat data samples lying around the cluster boundaries by assigning them to one of the nodes, which may result in unreliable system performance. This paper aims to develop a robust framework for discovering DNA motifs, where fuzzy SOMs, with an integration of fuzzy c-means membership functions and a standard batch-learning scheme, are employed to extract putative motifs with varying length in a recursive manner. Experimental results on eight real datasets show that our proposed algorithm outperforms the other searching tools such as SOMBRERO, SOMEA, MEME, AlignACE, and WEEDER in terms of the F-measure and algorithm reliability. It is observed that a remarkable 24.6% improvement can be achieved compared to the state-of-the-art SOMBRERO. Furthermore, our algorithm can produce a 20% and 6.6% improvement over SOMBRERO and SOMEA, respectively, in finding multiple motifs on five artificial datasets. PMID:24808603
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method. PMID:25248211
Metamodel-based global optimization using fuzzy clustering for design space reduction
High-fidelity analyses are utilized in modern engineering design optimization problems which involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. The uniformly distributed initial sample points are generated by Latin hypercube design to construct the radial basis function metamodel, whose accuracy is improved gradually as the number of sample points increases. The fuzzy c-means method and the Gath-Geva clustering method are applied to divide the design space into several small interesting cluster spaces for low and high dimensional problems, respectively. Modeling efficiency and accuracy are directly related to the design space, so unconcerned spaces are eliminated by the proposed reduction principle and two pseudo reduction algorithms. The reduction principle is developed to determine whether the current design space should be reduced and which space is eliminated. The first pseudo reduction algorithm improves the speed of clustering, while the second pseudo reduction algorithm ensures that the design space is reduced. Through several numerical benchmark functions, comparative studies with the adaptive response surface method, the approximated unimodal region elimination method and mode-pursuing sampling are carried out. The optimization results reveal that this method captures the real global optimum for all the numerical benchmark functions. The number of function evaluations shows that the efficiency of this method is favorable, especially for high dimensional problems. Based on this global design optimization method, a design optimization of a lifting surface in high speed flow is carried out, and this method saves about 10 h compared with genetic algorithms. 
This method possesses favorable performance on efficiency, robustness and capability of global convergence and gives a new optimization strategy for engineering design optimization problems involving expensive black box models.
1 Advances in Fuzzy Clustering and its Applications Chapter 20 Novel Developments in
Aickelin, Uwe
to analyse the FTIR spectroscopic data from tissue samples, multivariate clustering techniques have often types of cells can be separated within biological tissue. Among existing clustering techniques, it has been shown that fuzzy clustering techniques such as fuzzy c-means can have clear advantages over crisp
Formosat-2 imagery is high-spatial-resolution (2 m GSD) remote sensing satellite data comprising one panchromatic band and four multispectral bands (blue, green, red, near-infrared). An essential step in the daily processing of received Formosat-2 images is to estimate the cloud statistic of each image using an Automatic Cloud Coverage Assessment (ACCA) algorithm. The cloud statistic is subsequently recorded as important metadata for the image product catalog. In this paper, we propose an ACCA method with two consecutive stages: pre-processing and post-processing analysis. For pre-processing analysis, unsupervised K-means classification, Sobel's method, a thresholding method, non-cloudy pixel reexamination, and a cross-band filter method are applied in sequence to determine the cloud statistic. For post-processing analysis, the box-counting fractal method is applied. In other words, the cloud statistic is first determined via pre-processing analysis, and the correctness of the cloud statistic for the different spectral bands is then cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is critical to the result of the ACCA method. Therefore, in this work, we first conduct a series of experiments on clustering-based and spatial thresholding methods, including Otsu's, Local Entropy (LE), Joint Entropy (JE), Global Entropy (GE), and Global Relative Entropy (GRE) methods, for performance comparison. The results show that Otsu's and GE methods both perform better than the others for Formosat-2 images. Additionally, our proposed ACCA method, with Otsu's method selected as the thresholding method, successfully extracted the cloudy pixels of Formosat-2 images for accurate cloud statistic estimation.
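Otsu's method, which the abstract singles out, picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The sketch below works on a precomputed histogram; the function name `otsu_threshold` and the toy histogram are assumptions for illustration and are not taken from the ACCA pipeline.

```python
def otsu_threshold(hist):
    """Return the level t that maximises between-class variance.

    hist[g] is the number of pixels with gray level g. Pixels with level
    <= t form one class and the remaining pixels form the other.
    """
    total = sum(hist)
    sum_all = sum(g * h for g, h in enumerate(hist))
    w0 = 0        # pixel count of the lower class so far
    sum0 = 0      # sum of gray levels in the lower class so far
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy bimodal histogram over gray levels 0..9: dark pixels at 0-1, bright at 8-9.
print(otsu_threshold([5, 5, 0, 0, 0, 0, 0, 0, 5, 5]))
```

When several levels tie, as they do across the empty middle of this toy histogram, the strict comparison keeps the lowest of them.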
Literature Clustering using Citation Semantics
Clustering is a common and powerful technique for statistical data analysis, document categorization and topic discovery. The majority of traditional clustering methods, especially for document clustering, are based on the vector space model for distance measure, where the vector is the word profile of a document in the context of the entire corpus. However, algorithms using this measure achieve limited
Online Clustering of Moving Hyperplanes Rene Vidal
simple and temporally coherent online algorithm for clustering point trajectories lying in a variable ... René Vidal, Center for Imaging Science, Department, USA, rvidal@cis.jhu.edu. We propose a recursive algorithm for clustering trajectories lying
A Modified Clustering Method with Fuzzy Ants
Ant-based clustering, owing to its flexibility, stigmergy and self-organization, has been applied in a variety of areas, from problems arising in commerce to circuit design and text mining. A modified clustering method with fuzzy ants is presented in this paper. First, fuzzy ants and their behavior are defined; second, the new clustering algorithm is constructed based on fuzzy ants. In this algorithm, we consider multiple ants based on Schockaert's algorithm. The algorithm can be accelerated by the use of parallel ants, global memory banks and a density-based "look ahead" method. Experimental results show that this algorithm is more efficient than other ant clustering methods.
Dynamic clustering via asymptotics of the dependent Dirichlet process mixture
This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via ...
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions, first the Davies-Bouldin (DB) index and second the line symmetry distance based objective functions, are used. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. The multiple randomized K dimensional (Kd) trees based nearest neighbor search is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to existing SBKM and MOCK clustering algorithms.
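The Davies-Bouldin index named as the first objective can be computed from a candidate clustering as follows. This sketch uses the usual Euclidean form (mean distance to the centroid as scatter, centroid distance as separation); the function name `davies_bouldin` is hypothetical, and the line symmetry distance used by MOLGC is not implemented here. It needs at least two clusters with distinct centroids.

```python
import math


def davies_bouldin(clusters):
    """Davies-Bouldin index of a list of clusters (lower is better).

    Each cluster is a non-empty list of equal-length numeric vectors.
    """
    centroids = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    # Scatter: average distance from each member to its cluster centroid.
    scatter = [
        sum(math.dist(p, v) for p in c) / len(c)
        for c, v in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k


tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (0, 8)], [(10, 0), (10, 8)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

Compact, well separated clusters give a smaller value, which is why a genetic search would minimize this quantity.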
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7]^4-. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm. PMID:25296780
Roads extracted from satellite imagery have been used for many purposes, e.g. military applications, map publishing, transportation, and car navigation. Many methods, such as neural networks, knowledge-based methods, optimal search, snake models, semantic models, and road operator models, have been investigated for identifying roads in satellite images. However, because of the complicated characteristics of roads and of the images themselves, automated road network extraction remains a challenging problem, and no existing software performs the task reliably. This paper presents a hybrid method that combines fuzzy c-means with a back-propagation neural network and knowledge processing techniques to detect roads in SPOT images. The basic idea is an "easiest first" principle: the method first extracts the local salient road segments that can be found most easily and reliably, then uses contextual knowledge and a supervised back-propagation neural network model to extract fuzzy road segments among the salient road segments. The extracted pixels are then grouped into seed points, candidate points, and non-road points, and appropriate knowledge rules are applied to traverse and join them, guiding further road linking across the whole image. Finally, some post-processing steps refine the result. The resulting image shows that this hybrid identification method performs better than knowledge-based methods or neural network techniques used alone.
Brightest Cluster Galaxy Identification
Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in a cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others. We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work was supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.
Cluster Seeking Techniques in Pattern Recognition
A cluster seeking technique is defined as a method of dividing data into subsets, called clusters. These clusters contain data points that are similar to each other and different from the elements of other clusters. Various cluster seeking techniques were broken down into seven categories: (1) probabilistic, (2) signal detection, (3) clustering, (4) clumping, (5) eigenvalue, (6) minimal mode seeking, and (7) miscellaneous. Each category is described and one or more algorithms of that type are presented.
Data Clustering Using Evidence Accumulation
We explore the idea of evidence accumulation for combining the results of multiple clusterings. Initially, n d-dimensional data is decomposed into a large number of compact clusters; the K-means algorithm performs this decomposition, with several clusterings obtained by N random initializations of the K-means. Taking the co-occurrences of pairs of patterns in the same cluster as votes for
Time series clustering analysis of health-promoting behavior
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The results of the data analysis can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
Bayesian Decision Theoretical Framework for Clustering
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Document Clustering with Committees Patrick Pantel and Dekang Lin
are low. We present a clustering algorithm called CBC (Clustering By Committee) that is shown to produce). This evaluation measure is more intuitive and easier to interpret than previous evaluation measures. Categories larger. In this paper, we propose a clustering algorithm, CBC (Clustering By Committee), which produces
functional groups of chemical compounds absorb infrared radiation (IR) at characteristic frequencies and the intensities of IR bands depend on their concentration. This technology can detect changes in cellular inspection.
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm
2015-01-01
DNA microarray is a powerful approach for simultaneously studying the expression of thousands of genes in a single experiment. The average fluorescent intensity can be calculated in a microarray experiment, and the calculated intensity values closely track the expression levels of a particular gene. However, determining the appropriate position of every spot in microarray images is a major challenge, and it underlies the accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step first eliminates the noise and artifacts in microarray cells using nonlinear anisotropic diffusion filtering. Then, the center coordinates of each spot are located using mathematical morphology operations. Finally, the position of each spot is determined precisely by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images available in the Stanford Microarray Database. Results show that the accuracy of microarray cell segmentation with the proposed algorithm reaches 100% and 98% for noiseless and noisy cells, respectively.
DNA clustering and genome complexity.
Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located the clusters in the human genome, then quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters.
The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering. PMID:25182383
Color segmentation using MDL clustering
This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other "magic numbers." This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image, our clustering program detects color clusters corresponding to the hair, skin, and background regions in the image. Then a maximum likelihood classifier assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils, mouth, and eyes together, but they can be separated from the larger classes through connected components analysis.
DNA Microarray Data Clustering Based on Temporal Variation: FCV with TSD Preclustering
in the clustering environment (i.e., data-space), which is not achieved by other conventional clustering algorithms of clustering algorithms. Hierarchical clustering (Eisen et al. 1998), self-organizing maps (Tamayo et al. 1999.e., time course vs. comparative study, single or replicated experiment) determines the choice of algorithm
Document Clustering using Particle Swarm Optimization Xiaohui Cui, Thomas E. Potok, Paul Palathingal
a Particle Swarm Optimization (PSO) document clustering algorithm. Contrary to the localized searching of the K-means algorithm, the PSO clustering algorithm performs a globalized search in the entire solution space. In the experiments we conducted, we applied the PSO, K-means and hybrid PSO clustering algorithm
Efficiently Clustering Documents with Committees Patrick Pantel and Dekang Lin
are high and the inter-group similarities are low. We present a clustering algorithm called CBC (Clustering classes (the answer key). This evaluation measure is more intuitive and easier to interpret than previous, CBC (Clustering By Committee), which produces higher quality clusters in document clustering tasks
Swarm Intelligence in Text Document Clustering
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge facing users in today's information society is being overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming amount of information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food foraging.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
The properties of pixel clusters in dense environments are studied with $\sqrt{s}$ = 13 TeV proton-proton collisions from the LHC, recorded by ATLAS from June to July 2015. A novel method to evaluate the performance of the artificial neural network used for identifying pixel clusters created by multiple particles is presented. Using this method, the results in data and Monte Carlo simulation are compared. The neural network, as part of the track reconstruction, shows the expected response when used on collimated tracks.
Efficient clustering aggregation based on data fragments.
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy. PMID:22334025
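The fragment definition above can be illustrated with a short sketch. It computes the maximal fragments, i.e. the largest groups of points that no input clustering splits, by giving each point a signature made of its labels across all clusterings. The function name `data_fragments` and the toy label vectors are hypothetical; the paper's own aggregation algorithms are not reproduced here.

```python
from collections import defaultdict

def data_fragments(clusterings):
    """Return maximal groups of point indices kept together by every clustering.

    `clusterings` is a list of label lists, one label per data point.
    """
    n = len(clusterings[0])
    groups = defaultdict(list)
    for i in range(n):
        # Two points share a fragment iff they share a label in every clustering.
        signature = tuple(labels[i] for labels in clusterings)
        groups[signature].append(i)
    return list(groups.values())

# Two clusterings of six points (indices 0..5).
fragments = data_fragments([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
])
# fragments == [[0, 1], [2], [3], [4, 5]]
```

Aggregation can then operate on four fragments rather than six points, which is where the efficiency gain comes from when the number of points is large.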
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals). PMID:23526258
Enhancing Hand Gesture Recognition using Fuzzy Clustering-based Mixture-of-Experts Model
Cho, Sung-Bae
Time series inference from clustering
2001-06-01
This paper presents a toolbox for analyzing inferences drawn from clustering. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. These classes represent different random vectors. Each random vector is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random vectors. Clustering algorithms are evaluated based on class variance and performance improvement with respect to increasing numbers of experimental replications. The study is presented on a website, which includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. There, the toolbox is applied to gene-expression clustering based on cDNA microarrays using real data.
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet process (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces unnecessary clustering fragments. PMID:23587447
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique to galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
Hierarchical clustering in minimum spanning trees
The identification of clusters or communities in complex networks is a recurring problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that the clustering of the MST gives insight into the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulty defining and identifying clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network.
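To make the idea of clusters living inside an MST concrete, here is a minimal sketch that builds an MST with Kruskal's algorithm and then removes the heaviest tree edges to obtain clusters. This is the classical edge-cutting heuristic, swapped in for illustration; it is not the TAHC method of the abstract. Edge weights are assumed to behave like distances, and all names and the toy graph are hypothetical.

```python
def find(parent, x):
    """Return the root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm; edges are (weight, u, v) with nodes 0..n-1."""
    parent = list(range(n))
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # edge joins two components, so it belongs to the MST
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def cut_into_clusters(n, tree, k):
    """Drop the k-1 heaviest MST edges and return the connected components."""
    keep = sorted(tree)[: max(len(tree) - (k - 1), 0)]
    parent = list(range(n))
    for _, u, v in keep:
        parent[find(parent, u)] = find(parent, v)
    groups = {}
    for node in range(n):
        groups.setdefault(find(parent, node), []).append(node)
    return sorted(groups.values())

# Two tight triangles joined by one long edge (weight 9).
edges = [(1, 0, 1), (1, 1, 2), (2, 0, 2), (9, 2, 3),
         (1, 3, 4), (1, 4, 5), (2, 3, 5)]
mst = minimum_spanning_tree(6, edges)
clusters = cut_into_clusters(6, mst, k=2)
```

On this toy graph the long bridging edge is the one removed, leaving the two triangles as separate clusters.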
2013-01-01
Background Potentially inappropriate prescribing in older people is common in primary care and can result in increased morbidity, adverse drug events, hospitalizations and mortality. In Ireland, 36% of those aged 70 years or over received at least one potentially inappropriate medication, with an associated expenditure of over €45 million. The main objective of this study is to determine the effectiveness and acceptability of a complex, multifaceted intervention in reducing the level of potentially inappropriate prescribing in primary care. Methods/design This study is a pragmatic cluster randomized controlled trial, conducted in primary care (OPTI-SCRIPT trial), involving 22 practices (clusters) and 220 patients. Practices will be allocated to intervention or control arms using minimization, with intervention participants receiving a complex multifaceted intervention incorporating academic detailing, medicines review with web-based pharmaceutical treatment algorithms that provide recommended alternative treatment options, and tailored patient information leaflets. Control practices will deliver usual care and receive simple patient-level feedback on potentially inappropriate prescribing. Routinely collected national prescribing data will also be analyzed for nonparticipating practices, acting as a contemporary national control. The primary outcomes are the proportion of participant patients with potentially inappropriate prescribing and the mean number of potentially inappropriate prescriptions per patient. In addition, economic and qualitative evaluations will be conducted. Discussion This study will establish the effectiveness of a multifaceted intervention in reducing potentially inappropriate prescribing in older people in Irish primary care that is generalizable to countries with similar prescribing challenges. Trial registration Current controlled trials ISRCTN41694007 PMID:23497575
Mesh Clustering by Approximating Centroidal Voronoi Tessellation
Cheng, Fuhua "Frank"
Mesh Clustering by Approximating Centroidal Voronoi Tessellation Fengtao Fan Depart. of Computer Science Virginia State University slai@vsu.edu ABSTRACT An elegant and efficient mesh clustering algorithm is presented. The faces of a polygonal mesh are divided into different clusters for mesh coarsening
Iterative Projected Clustering by Subspace Mining
Iterative Projected Clustering by Subspace Mining Man Lung Yiu and Nikos Mamoulis Abstract. Recently, several algorithms that discover projected clusters and their associated subspaces have been projected clusters around random points. Based on this, we propose a technique that improves the efficiency
Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters
A family of graph-theoretical algorithms based on the minimal spanning tree are capable of detecting several kinds of cluster structure in arbitrary point sets; description of the detected clusters is possible in some cases by extensions of the method. Development of these clustering algorithms was based on examples from two-dimensional space because we wanted to copy the human perception of
CORECLUSTER: A Degeneracy Based Graph Clustering Framework
, biological networks, recommender systems and image segmentation. Due to its importance clustering algorithm. Our approach capitalizes on processing the graph in a hierarchical manner provided that preserves its clustering structure, while making the execution of the chosen clustering algorithm much
Distance to second cluster as a measure of classification confidence
Scott W. Mitchell; Tarmo K. Remmel; Ferenc Csillag; Michael A. Wulder
Most image classification algorithms rely on computing the distance between the unique spectral signature of a given pixel and a set of possible clusters within an n-dimensional feature space that represents discrete land cover categories. Each scrutinized pixel will ultimately be closest to one of the predefined clusters; different classification algorithms differ in the details of which cluster is considered
Using Greedy algorithm: DBSCAN revisited II.
The density-based clustering algorithm presented here differs from the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) and has the following advantages. First, a greedy algorithm replaces the R*-tree (Beckmann et al., 1990) used in DBSCAN to index the clustering space, so that the clustering time cost is greatly decreased and the I/O memory load is reduced as well. Second, the merging condition for approximating arbitrarily shaped clusters is designed carefully, so that a single threshold can correctly distinguish all clusters in a large spatial dataset even when it contains some density-skewed clusters. Finally, the authors investigate a robotic navigation task and test the proposed algorithm on two artificial datasets to verify its effectiveness and efficiency. PMID:15495334
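As background for the comparison above, the following is a minimal sketch of classical DBSCAN with a brute-force neighborhood search. It does not implement the greedy indexing or the merging condition of the paper; it only shows the baseline being revisited. The parameter names `eps` and `min_pts` follow common usage, and the toy points are made up.

```python
import math

def dbscan(points, eps, min_pts):
    """Classical DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force O(n) range query; an index structure would replace this.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reach)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

The `neighbors` helper is the step that an index (R*-tree in the original DBSCAN, a greedy scheme in the abstract above) is meant to accelerate.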
When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in clinic. The ability to monitor the foot angle during daily life outside of clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares the aptitude of several low-complexity algorithms for foot-angle measurement: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot: 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved almost as good accuracy with less computational complexity. These two algorithms seem to have good aptitude for the foot-angle measurement problem, and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952
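The IIR complementary filter mentioned above can be sketched in a few lines. This is the generic first-order form, assuming a gyroscope rate in degrees per second and an accelerometer-derived tilt angle in degrees; the blend factor `alpha`, the sample period `dt`, and the synthetic signals are assumptions for illustration, not values from the study.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    """First-order IIR complementary filter for a single tilt angle.

    The integrated gyro rate is trusted at high frequency and the
    accelerometer angle at low frequency; alpha sets the crossover.
    """
    angle = accel_angles[0]  # start from the accelerometer estimate
    estimates = []
    for rate, acc in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc
        estimates.append(angle)
    return estimates

# A gyro with a constant 1 deg/s bias while the foot is actually at 0 degrees.
biased = complementary_filter([1.0] * 1000, [0.0] * 1000, dt=0.01)
# Pure integration would drift to 10 degrees; the filter settles near 0.49.
```

Each update costs two multiplications and two additions, which is consistent with the abstract's point that this filter is attractive on a low-power microcontroller.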
Image segmentation using fuzzy LVQ clustering networks
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
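For reference, the plain Fuzzy c-Means alternation that the network above builds on can be sketched as follows. This is the standard FCM update, not the fuzzy LVQ network itself (whose learning-rate schedule is not given here). The fuzzifier `m`, the iteration count, and the one-dimensional toy "intensity" data are assumptions.

```python
import math

def memberships(points, centers, m):
    """u[j][i] = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of passes."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]  # fuzzified weights for cluster i
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[dim] for wj, p in zip(w, points)) / total
                for dim in range(len(points[0]))))
        centers = new_centers
    return centers, memberships(points, centers, m)

# Four 1-D feature vectors, e.g. pixel intensities, with two initial centers.
pixels = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, u = fuzzy_c_means(pixels, [(0.0,), (10.0,)])
```

For segmentation, each pixel would typically be assigned to the cluster with its largest membership in `u`.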
for Data Stream Clustering Mingzhou (Joe) Song and Lin Zhang Department of Computer Science New Mexico Introduction The richness of cluster representations of unavailable historical data constitutes the repertoire, and a cluster size. The STREAM algorithm clusters each chunk of the data using the LocalSearch algorithm [11
Grid-based DBSCAN Algorithm with Referential Parameters
A new algorithm, GRPDBSCAN (Grid-based DBSCAN Algorithm with Referential Parameters), is proposed in this paper. GRPDBSCAN combines the grid partition technique with a multi-density based clustering algorithm, which improves its efficiency. In addition, because the Eps and MinPts parameters of the DBSCAN algorithm are generated automatically, they are more objective. Experimental results show that the new algorithm not only better distinguishes noise and discovers clusters of arbitrary shapes but is also more robust.
A close neighbour algorithm for designing cellular manufacturing systems
The first step in creating a cellular manufacturing system is to identify machine groups and form part families. Clustering and data organization (CDR) algorithms (such as the bond energy algorithm) and array sorting (ARS) methods (such as the rank order clustering algorithm) have been proposed to solve the machine and part grouping problem. However, these methods do not always produce
Absolute classification with unsupervised clustering
An absolute classification algorithm is proposed in which the class definition through training samples or otherwise is required only for a particular class of interest. The absolute classification is considered as a problem of unsupervised clustering when one cluster is known initially. The definitions and statistics of the other classes are automatically developed through the weighted unsupervised clustering procedure, which is developed to keep the cluster corresponding to the class of interest from losing its identity as the class of interest. Once all the classes are developed, a conventional relative classifier such as the maximum-likelihood classifier is used in the classification.
clustering uncertain categorical data, based on the COBWEB conceptual clustering algorithm. Experimental of inaccurate data has continuously been a challenge for many data mining applications. We suggest incorporating Conceptual Clustering Categorical Data with Uncertainty Yuni Xia Indiana University Purdue
In the present work, Acid Mine Drainage (AMD) processes in the Chorrito Stream, which flows into the Cobica River (Iberian Pyrite Belt, Southwest Spain) are characterized by means of clustering techniques based on fuzzy logic. Also, pH behavior in contrast to precipitation is clearly explained, proving that the influence of rainfall inputs on the acidity and, as a result, on the metal load of a riverbed undergoing AMD processes highly depends on the moment when it occurs. In general, the riverbed dynamic behavior is the response to the sum of instant stimuli produced by isolated rainfall, the seasonal memory depending on the moment of the target hydrological year and, finally, the own inertia of the river basin, as a result of an accumulation process caused by age-long mining activity. PMID:15798799
An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis
Clustering is a popular data analysis and data mining technique. A popular technique for clustering is based on k-means, in which the data is partitioned into K clusters. However, the k-means algorithm depends heavily on the initial state and converges to a local optimum. This paper presents a new hybrid evolutionary algorithm to solve the nonlinear partitional clustering problem. The proposed
Automatic subspace clustering of high dimensional data for data mining applications
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces
Partially supervised speaker clustering.
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance. PMID:21844626
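The difference between the two metrics compared above can be seen in a small sketch. This is only an illustration with toy vectors, not the paper's GMM mean supervectors or its LSDA algorithm; the helper names are assumptions.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Ordinary straight-line distance between u and v."""
    return math.dist(u, v)


if __name__ == "__main__":
    a = (1.0, 2.0, 3.0)
    b = (2.0, 4.0, 6.0)  # same direction as a, twice the length
    print(cosine_distance(a, b), euclidean_distance(a, b))
```

Two vectors that point in the same direction but differ in length are far apart in Euclidean terms and essentially identical in cosine terms, which is why a representation whose information lies mainly in direction favors the cosine metric.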
Du, Cheng-Jin; Sun, Da-Wen; Jackman, Patrick; Allen, Paul
2008-12-01
An automatic method for estimating the content of intramuscular fat (IMF) in beef M. longissimus dorsi (LD) was developed using a sequence of image processing algorithms. To extract IMF particles within the LD muscle from structural features of intermuscular fat surrounding the muscle, three image processing steps were developed: a bilateral filter for noise removal, kernel fuzzy c-means clustering (KFCM) for segmentation, and vector confidence connected and flood fill for IMF extraction. Bilateral filtering was first applied to reduce the noise and enhance the contrast of the beef image. KFCM was then used to segment the filtered beef image into lean, fat, and background. The IMF was finally extracted from the original beef image by using the techniques of vector confidence connected and flood filling. The performance of the algorithm developed was verified by correlation analysis between the IMF characteristics and the percentage of chemically extractable IMF content (P<0.05). Five IMF features are very significantly correlated with the fat content (P<0.001), including count densities of middle (CDMiddle) and large (CDLarge) fat particles, area densities of middle and large fat particles, and total fat area per unit LD area. The highest coefficient is 0.852 for CDLarge. PMID:22063863
Exploring functional connectivity in fMRI via clustering
In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the k-means and spectral clustering algorithms as alternatives to the commonly ...
Discriminative clustering via extreme learning machine.
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036
Segmentation of color images using genetic algorithm with image histogram
This paper proposes a family of color image segmentation algorithms using a genetic approach and a color similarity threshold in terms of just noticeable difference. Instead of segmenting and then optimizing, the proposed technique directly uses a genetic algorithm (GA) for optimized segmentation of color images. Applying a GA to larger color images is computationally heavy, so the algorithms are applied to a 4D-color image histogram table. The performance of the proposed algorithms is benchmarked on the BSD dataset against color histogram based segmentation and the Fuzzy C-means algorithm using the Probabilistic Rand Index (PRI). The proposed algorithms yield better analytical and visual results.
The last decade has witnessed a tremendous growth in the area of randomized algorithms. During this period, randomized algorithms went from being a tool in computational number theory to finding widespread application in many types of algorithms. Two benefits of randomization have spearheaded this growth: simplicity and speed. For many applications, a randomized algorithm is the simplest algorithm available, or the
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
Performance Comparisons of PSO based Clustering
In this paper we investigate the performance of Particle Swarm Optimization (PSO) based clustering on a few real-world data sets and one artificial data set. Performance is measured by two metrics, namely quantization error and inter-cluster distance. The K-means clustering algorithm is first implemented for all data sets, and its results form the basis of comparison for the PSO based approaches. We explore different variants of PSO, such as gbest, lbest ring, lbest von Neumann and hybrid PSO, for comparison purposes. The results reveal that PSO based clustering algorithms perform better than K-means on all data sets.
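The abstract does not define its two metrics, so the sketch below uses one common formulation as an assumption: quantization error as the mean, over clusters, of the average distance from each member to its centroid, and inter-cluster distance as the smallest distance between any two centroids. The function names and the `(centroid, members)` input layout are hypothetical.

```python
import math


def quantization_error(clusters):
    """Mean over clusters of the average member-to-centroid distance.

    `clusters` is a list of (centroid, members) pairs; every cluster is
    assumed to be non-empty.
    """
    per_cluster = [
        sum(math.dist(p, centroid) for p in members) / len(members)
        for centroid, members in clusters
    ]
    return sum(per_cluster) / len(per_cluster)


def min_inter_cluster_distance(centroids):
    """Smallest Euclidean distance between any two distinct centroids."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )


if __name__ == "__main__":
    clusters = [
        ((0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0)]),
        ((10.0, 0.0), [(10.0, 2.0), (10.0, -2.0)]),
    ]
    print(quantization_error(clusters))
    print(min_inter_cluster_distance([c for c, _ in clusters]))
```

Under these definitions a good clustering has a low quantization error (compact clusters) and a high inter-cluster distance (well-separated clusters).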
Structural transitions in clusters.
If one adds more particles to a cluster, the energetically optimal structure is neither preserved nor does it change in a continuous fashion. Instead, one finds several cluster size regions where one structural principle dominates almost without exception, and rather narrow boundary regions in-between. The structure of the solid is usually reached only at relatively large sizes, after more than one structural transition. The occurrence of this general phenomenon of size-dependent structural transitions does not seem to depend on the nature of the particles, it is found for atomic, molecular, homogeneous, and heterogeneous clusters alike. Clearly, it is a collective many-body phenomenon which can in principle be calculated but not understood in a fully reductionistic manner. Actual calculations with sufficient accuracy are not feasible today, because of the enormous computational expense, even when unconventional evolutionary algorithms are employed for global geometry optimization. Therefore, simple rules for cluster structures are highly desirable. In fact, we are dealing here not just with the academic quest for linkages between cluster structure and features of the potential energy surface, but structural transitions in clusters are also of immediate relevance for many natural and industrial processes, ranging from crystal growth all the way to nanotechnology. This article provides an exemplary overview of research on this topic, from simple model systems where first qualitative explanations start to be successful, up to more realistic complex systems which are still beyond our understanding. PMID:19750647
Clustering of financial time series
This paper addresses the topic of classifying financial time series in a fuzzy framework proposing two fuzzy clustering models both based on GARCH models. In general clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. At this aim, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance, based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp version.
DBRS: A Density-Based Spatial Clustering Method with Random Sampling
An energy-efficient unequal clustering mechanism for wireless sensor networks
Clustering provides an effective way for prolonging the lifetime of a wireless sensor network. Current clustering algorithms usually utilize two techniques, selecting cluster heads with more residual energy and rotating cluster heads periodically, to distribute the energy consumption among nodes in each cluster and extend the network lifetime. However, they rarely consider the hot spots problem in multihop wireless sensor
An unequal cluster-based routing protocol in wireless sensor networks
Clustering provides an effective method for prolonging the lifetime of a wireless sensor network. Current clustering algorithms usually utilize two techniques: selecting cluster heads with more residual energy, and rotating cluster heads periodically to distribute the energy consumption among nodes in each cluster and extend the network lifetime. However, they rarely consider the hot spot problem
Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.
Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While quite a number of research works perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and further improve its performance by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms. PMID:26357330
SMART: Unique Splitting-While-Merging Framework for Gene Clustering
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. PMID:24714159
Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache ... A cluster headache begins as a severe, sudden headache. The headache commonly strikes 2 to 3 hours after you fall asleep. ...
Clustering with Transitive Distance and K-Means Duality
Chunjing Xu; Jianzhuang Liu; Xiaoou Tang
2007-01-01
On evaluating clustering procedures for use in classification
The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.
AMIC@: All MIcroarray Clusterings @ once
Geraci, Filippo; Pellegrini, Marco; Renda, M. Elena
The AMIC@ Web Server offers a light-weight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers, there are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al. (iii) intuitive visual inspection of the data via heatmaps and (iv) measurements of the clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost), specifically designed for this purpose. In case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard that runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL http://bioalgo.iit.cnr.it/amica. PMID:18477631
Feature Clustering for Accelerating Parallel Coordinate Descent
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
This article surveys the state of the art in quantum computer algorithms, including both black-box and non-black-box results. It is infeasible to detail all the known quantum algorithms, so a representative sample is given. This includes a summary of the early quantum algorithms, a description of the Abelian Hidden Subgroup algorithms (including Shor's factoring and discrete logarithm algorithms), quantum searching and amplitude amplification, quantum algorithms for simulating quantum mechanical systems, several non-trivial generalizations of the Abelian Hidden Subgroup Problem (and related techniques), the quantum walk paradigm for quantum algorithms, the paradigm of adiabatic algorithms, a family of "topological" algorithms, and algorithms for quantum tasks which cannot be done by a classical computer, followed by a discussion.
A Fast Implementation of the ISOCLUS Algorithm
Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
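The average distortion mentioned in this abstract, the mean squared distance from each point to its nearest center, can be sketched directly. This brute-force version is only an illustration of the objective; it does not use the kd-tree filtering algorithm, and the function name is an assumption.

```python
def average_distortion(points, centers):
    """Mean squared Euclidean distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared distance to every center; keep the smallest one.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total / len(points)


if __name__ == "__main__":
    points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    centers = [(1.0, 0.0), (10.0, 0.0)]
    print(average_distortion(points, centers))
```

This direct computation costs on the order of n times k distance evaluations per pass, which is the cost that a kd-tree based approach aims to reduce.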
Comparison of 2D and 3D Clustering on Short-Axis Magnetic Resonance Images of the Left Ventricle
Whelan, Paul F.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
ASteCA - Automated Stellar Cluster Analysis
Perren, Gabriel I; Piatti, Andrés E
We present ASteCA (Automated Stellar Cluster Analysis), a suite of tools designed to fully automate the standard tests applied to stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its unce...
Binary Rule Generation via Hamming Clustering
The generation of a set of rules underlying a classification problem is performed by applying a new algorithm, called Hamming Clustering (HC). It reconstructs the and-or expression associated with any Boolean function from a training set of samples. The basic kernel of the method is the generation of clusters of input patterns that belong to the same class and are
On clusterings: Good, bad and spectral
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have poly-logarithmic worst-case guarantees under the new measure. The main result of the article is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have
On Clusterings - Good, Bad and Spectral
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have poly-logarithmic worst-case guarantees under the new measure. The main result of the article is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to
Quantum Annealing for Clustering Kenichi Kurihara
Quantum Annealing for Clustering Kenichi Kurihara Google, Tokyo, Japan Shu Tanaka Institute of Tokyo, Tokyo, Japan CREST, Saitama, Japan Abstract This paper studies quantum annealing (QA) for clustering, which can be seen as an extension of simulated annealing (SA). We derive a QA algorithm
Clustering Binary Data in the Presence of Masking Variables
2004-01-01
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the…
2015-01-01
Clustering is one of the main methods for identifying functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data that combines the firefly algorithm (FA) with the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but the hierarchical search has difficulty finding the optimal neighborhood radius of synchronization and is not efficient. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Efficient and Accurate Clustering for Large-Scale Genetic Mapping
Clustering II Hierarchical Clustering
[Slide figure: Intermediate State. After some merging steps, we have some clusters (C1, C2, C3, C4, C5) and a Distance/Proximity Matrix over the points p1, p2, p3, p4, ..., p9, p10, p11, p12.]
Wittkop, Tobias; Baumbach, Jan; Lobo, Francisco P; Rahmann, Sven
Background Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that has to be efficiently clustered with constant or increased accuracy, at increased speed. Results We advocate that the model of weighted cluster editing, also known as transitive graph projection is well-suited to protein clustering. We present the FORCE heuristic that is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192 187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. Conclusion FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets are available at . PMID:17941985
Temporal event clustering for digital photo collections
Matthew L. Cooper; Jonathan Foote; Andreas Girgensohn; Lynn Wilcox
2003-01-01
We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization.
Constrained spectral clustering under a local proximity structure assumption
2005-01-01
This work focuses on incorporating pairwise constraints into a spectral clustering algorithm. A new constrained spectral clustering method is proposed, as well as an active constraint acquisition technique and a heuristic for parameter selection. We demonstrate that our constrained spectral clustering method, CSC, works well when the data exhibits what we term local proximity structure.
Stereotyping: improving particle swarm performance with cluster analysis
Individuals in the particle swarm population were “stereotyped” by cluster analysis of their previous best positions. The cluster centers then were substituted for the individuals' and neighbors' best previous positions in the algorithm. The experiments, which were inspired by the social-psychological metaphor of social stereotyping, found that performance could be generally improved by substituting individuals', but not neighbors', cluster centers
Training Support Vector Machines via Adaptive Clustering Daniel Boley
Training Support Vector Machines via Adaptive Clustering Daniel Boley Dongwei Cao Abstract Training support vector machines involves a huge optimization problem and many specially designed algorithms have- joint clusters. Then, the representatives of these clusters are used to train an initial support vector
Textual Article Clustering in Newspaper Pages
Textual Article Clustering in Newspaper Pages Marco Aiello & Andrea Pegoretti Dep. of Information@dit.unitn.it andpego@supereva.it Abstract In the analysis of a newspaper page an important step is the clustering processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms
Clustering Very Large Data Sets with Principal Direction Divisive Partitioning
Boley, Daniel
CLUSTERING BREAST CANCER DATA BY CONSENSUS OF DIFFERENT VALIDITY INDICES
Aickelin, Uwe
CLUSTERING BREAST CANCER DATA BY CONSENSUS OF DIFFERENT VALIDITY INDICES D. Soria , J.M. Garibaldi and Mathematical Sciences, Liverpool John Moores University, UK Keywords: Clustering algorithms, Breast cancer, Va several different clustering techniques and apply them to a particular data set of breast cancer data
DEDICATED FILTER FOR DEFECTS CLUSTERING IN RADIOGRAPHIC IMAGE
Defect clusters such as linear or clustered porosity are in some cases even more important than single flaws. This paper presents two methods of defect clustering and algorithm for calculation of distances between flaws in digital radiographic image. Dedicated lookup table based filter is used for calculation of distances between objects in the specified range. For defect clustering two functions were developed. First one is based on MMD (Minimum Mean Distance) algorithm. Second one uses hierarchical procedures for clustering defects of various types, shapes and size.
Low-Power Clustering with Minimum Logic Replication for Coarse-grained, Antifuse based FPGAs
Pedram, Massoud
NSDL National Science Digital Library
CSC 325. (MAT 325) Numerical Algorithms (3) Prerequisite: CSC 112 or 121, MAT 162. An introduction to the numerical algorithms fundamental to scientific computer work. Includes elementary discussion of error, polynomial interpolation, quadrature, linear systems of equations, solution of nonlinear equations and numerical solution of ordinary differential equations. The algorithmic approach and the efficient use of the computer are emphasized.
On Discovery of Extremely Low-Dimensional Clusters using Semi-Supervised Projected Clustering
Cheung, David Wai-lok
2010-02-01
Maximum margin clustering (MMC) is a newly proposed clustering method which has shown promising performance in recent studies. It extends the computational techniques of support vector machine (SVM) to the unsupervised scenario. Traditionally, MMC is formulated as a nonconvex integer programming problem, which makes it difficult to solve. Several methods have been proposed in the literature to solve the MMC problem based on either semidefinite programming (SDP) or alternating optimization. However, these methods are still time-demanding when handling large-scale data sets, which limits their application in real-world problems. In this paper, we propose a cutting plane maximum margin clustering (CPMMC) algorithm. It first decomposes the nonconvex MMC problem into a series of convex subproblems by making use of the constrained concave-convex procedure (CCCP); then, for each subproblem, our algorithm adopts the cutting plane algorithm to solve it. Moreover, we show that the CPMMC algorithm takes O(sn) time to converge with guaranteed accuracy, where n is the number of samples in the data set and s is the sparsity of the data set, i.e., the average number of nonzero features of the data samples. We also derive the multiclass version of our CPMMC algorithm. Experimental evaluations on several real-world data sets show that CPMMC performs better than existing MMC methods, both in efficiency and accuracy. PMID:20083456
Paul Gray
Content prepared for the Supercomputing 2002 session on "Using Clustering Technologies in the Classroom". Contains a series of exercises for teaching parallel computing concepts through kinesthetic activities.
Landscape of Web Search Results Clustering Algorithms
Searching for information on the Web has attracted great attention in many research communities. Due to the enormous size of the Web and low precision of user queries, results returned from present web search engines can reach hundreds or even hundreds of thousands of documents. Therefore, finding the right information can be difficult if not impossible. One approach that tries to solve
Clustering Algorithm for Mutually Constraining Heterogeneous Features
Parallel squared error clustering on hypercube arrays
Rivera, F.F.; Zapata, E.L.; Ismail, M.A.
Though new parallel algorithms are continually appearing in the literature, it is still quite rare for them to be capable of dealing with problems of sizes that do not conveniently fit the machine for which they are designed. This article presents the algorithm PSEC, a parallel squared error clustering algorithm for hypercube SIMD computers of arbitrary cube dimension with local memory. PSEC owes its flexibility to the association of each of the three dimensions of the problem (numbers of data points, features, and clusters) with a distinct subset of the dimensions of the hypercube.
A global-local approach to cluster analysis
A global to local philosophy is presented as a methodology for cluster analysis. The global algorithm is presented as an estimator of initial cluster centers, while the local algorithm is presented as a refinement procedure of the global algorithm's estimate. Global-local techniques are discussed and experimental results are presented. This research was sponsored in part by NASA Grant NAS9-12931 and
Image texture classification using a manifold-distance-based evolutionary clustering method
, and a genetic-algorithm-based clustering technique in partitioning most of the test problems. © 2008 Society; texture features; evolutionary algorithms; genetic algorithms; clustering; dissimilarity measure. Paper class labels to the feature vectors. These approaches can be categorized into two groups:4
2013-02-01
Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. In this paper, we present a parallel algorithm for taxonomic clustering of large metagenomic samples with support for overlapping clusters. We develop sketching techniques, akin to those created for web document clustering, to deduce significant similarities between pairs of sequences without resorting to expensive all vs. all comparison. We formulate the metagenomic classification problem as that of maximal quasi-clique enumeration in the resulting similarity graph, at multiple levels of the hierarchy as prescribed by different similarity thresholds. We cast execution of the underlying algorithmic steps as applications of the map-reduce framework to achieve a cloud ready implementation. We show that the resulting framework can produce high quality clustering of metagenomic samples consisting of millions of reads, in reasonable time limits, when executed on a modest size cluster. PMID:23427983
Bipartite graph partitioning and data clustering
Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
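The SVD-based partitioning described in this abstract can be illustrated with a minimal sketch. The degree normalization, the use of the second singular vector pair, and the zero threshold below follow the common spectral co-clustering recipe and are assumptions, not details confirmed by the abstract; the function name `bipartition` and the toy matrix `W` are hypothetical.

```python
import numpy as np

def bipartition(W):
    """Split rows and columns of a non-negative edge weight matrix in two.

    Assumes every row and column has a positive weight sum.
    """
    W = np.asarray(W, dtype=float)
    d1 = W.sum(axis=1)                      # row (e.g. document) degrees
    d2 = W.sum(axis=0)                      # column (e.g. term) degrees
    Wn = W / np.sqrt(np.outer(d1, d2))      # D1^{-1/2} W D2^{-1/2}
    U, s, Vt = np.linalg.svd(Wn, full_matrices=False)
    # The leading pair is trivial (proportional to sqrt of the degrees);
    # the second pair carries the partition (assumed recipe).
    x = U[:, 1] / np.sqrt(d1)
    y = Vt[1, :] / np.sqrt(d2)
    return x >= 0, y >= 0

# Toy matrix: two strong blocks plus weak background weight so the
# bipartite graph stays connected.
W = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 4],
              [0, 0, 4, 3]], dtype=float) + 0.1
rows, cols = bipartition(W)
print(rows, cols)
```

Rows and columns that receive the same boolean value are placed in the same cluster; which side is labelled `True` depends on the arbitrary sign of the singular vectors.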
MIP Reconstruction Techniques and Minimum Spanning Tree Clustering
The development of a tracking algorithm for minimum ionizing particles in the calorimeter and of a clustering algorithm based on the Minimum Spanning Tree approach are described. They do not depend on information from the central tracking system. Both are important components of a particle flow algorithm currently under development.
Top 10 Algorithms in Data Mining Xindong Wu ( )
: Xindong Wu and Vipin Kumar 18 Candidates (2) Clustering #11. K-Means: MacQueen, J. B., Some methods1 Top 10 Algorithms in Data Mining Xindong Wu ( ) Department of Computer Science University of Vermont, USA; 2 Top 10 Algorithms in Data Mining: Xindong Wu and Vipin Kumar "Top 10 Algorithms
A New Image Representation Algorithm Inspired by Image Submodality
Cluster Analysis of Data from the Particle Analysis by Laser Mass Spectrometry (PALMS) Instrument
We describe the use of a hierarchical clustering algorithm on mass spectra of single particles. In this method, the first cluster is found by searching a data set for the two most similar mass spectra. Then the next most similar spectra or clusters are combined sequentially until a stopping condition is met. Chemically reasonable clusters were obtained for several sets
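The merge loop described in this abstract (find the two most similar items, combine them, repeat until a stopping condition) might be sketched as follows. The cosine similarity, the mean-spectrum cluster representative, and the `min_similarity` stopping threshold are illustrative assumptions; the abstract does not state which choices the authors made.

```python
import numpy as np

def agglomerate(spectra, min_similarity=0.9):
    """Greedy agglomerative clustering of non-zero, non-negative spectra."""
    X = np.asarray(spectra, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length spectra
    # Each cluster is (unit-length representative, member indices).
    clusters = [(Xn[i], [i]) for i in range(len(Xn))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = float(clusters[a][0] @ clusters[b][0])
                if sim > best:
                    best, pair = sim, (a, b)
        if best < min_similarity:           # assumed stopping condition
            break
        a, b = pair
        members = clusters[a][1] + clusters[b][1]
        centre = Xn[members].mean(axis=0)
        centre /= np.linalg.norm(centre)
        del clusters[b], clusters[a]        # b > a, so delete b first
        clusters.append((centre, members))
    return sorted(sorted(m) for _, m in clusters)

demo_spectra = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.95, 0.05], [0, 0, 1]]
groups = agglomerate(demo_spectra)
print(groups)
```

The pairwise search makes each merge step quadratic in the number of clusters, which is acceptable for a sketch but not for large data sets.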
Scalable Parallel Density-based Clustering and Applications
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications that require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and an unbalanced workload, resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
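The link between DBSCAN and connected components mentioned in this abstract can be shown with a small sequential sketch: core points within `eps` of each other are merged with a union-find structure, and border points then join a neighbouring core point's cluster. This is not the authors' parallel algorithm; the function name `dbscan_union_find`, the brute-force distance matrix, and the border-point tie-breaking are assumptions made for illustration.

```python
import numpy as np

def dbscan_union_find(points, eps, min_pts):
    """Label points with cluster ids (0, 1, ...) or -1 for noise."""
    X = np.asarray(points, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = dist <= eps                      # includes the point itself
    core = neighbours.sum(axis=1) >= min_pts

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i

    # Connected components over core points that are eps-neighbours.
    for i in range(n):
        if not core[i]:
            continue
        for j in range(i + 1, n):
            if core[j] and neighbours[i, j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    labels = np.full(n, -1)
    roots = {}
    for i in range(n):
        if core[i]:
            labels[i] = roots.setdefault(find(i), len(roots))
    # Border points take the label of the first core neighbour found.
    for i in range(n):
        if not core[i]:
            near_core = np.flatnonzero(neighbours[i] & core)
            if near_core.size:
                labels[i] = labels[near_core[0]]
    return labels

demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (50, 50)]
labels = dbscan_union_find(demo_points, eps=1.5, min_pts=3)
print(labels.tolist())
```

Because union operations on disjoint pairs are independent, this formulation is the kind of structure that lends itself to parallel execution, which is presumably why the abstract highlights it.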
Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. Having the equivalent mass greater than 10,000 stars like our sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core. But in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from earth it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with a large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrate the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups, combined with the scalability potential and accelerator-based parallelization, are unique in the domain of document-based data mining, to the best of our knowledge.
Self-stabilizing (k,r)-clustering in Clock Rate-limited Systems Andreas Larsson
and temporarily broken assumptions. Clustering nodes within ad-hoc networks can help forming backbones. An algorithm for clustering nodes together in an ad-hoc network serves an important role. Back bones
MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...
Clausen, Michael
protein analysis and from linguistics. The algorithm for pairwise data clustering is used to segment textured images. "Pairwise Data Clustering ...", Hofmann & Buhmann, (March 14, 1996) 2 1 Introduction
2014-01-01
Biological networks obtained by high-throughput profiling or human curation are typically noisy. For functional module identification, single network clustering algorithms may not yield accurate and robust results. In order to borrow information across multiple sources to alleviate such problems due to data quality, we propose a new joint network clustering algorithm, ASModel, in this paper. We construct an integrated network to combine network topological information based on protein-protein interaction (PPI) datasets and homological information introduced by constituent similarity between proteins across networks. A novel random walk strategy on the integrated network is developed for joint network clustering, and an optimization problem is formulated by searching for low conductance sets defined on the derived transition matrix of the random walk, which fuses both topology and homology information. The optimization problem of joint clustering is solved by a derived spectral clustering algorithm. Network clustering using several state-of-the-art algorithms has been applied both to PPI networks within the same species (two yeast PPI networks and two human PPI networks) and to those from different species (a yeast PPI network and a human PPI network). Experimental results demonstrate that ASModel outperforms the existing single network clustering algorithms as well as another recent joint clustering algorithm in terms of complex prediction and Gene Ontology (GO) enrichment analysis. PMID:24565376
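For readers unfamiliar with the "low conductance set" objective named in this abstract, the standard graph definition can be sketched as below: the weight of edges leaving a node set divided by the smaller of the two side volumes. This uses a plain symmetric adjacency matrix as an assumption; the integrated network and random-walk transition matrix of ASModel are not reproduced here, and the function name `conductance` and the toy graph are hypothetical.

```python
import numpy as np

def conductance(A, subset):
    """Conductance of `subset` in an undirected weighted graph.

    A is a symmetric adjacency matrix; subset must be a non-empty
    proper subset of the nodes with positive volume on both sides.
    """
    A = np.asarray(A, dtype=float)
    inside = np.zeros(len(A), dtype=bool)
    inside[list(subset)] = True
    cut = A[inside][:, ~inside].sum()       # weight crossing the boundary
    vol_in = A[inside].sum()                # sum of degrees inside
    vol_out = A[~inside].sum()              # sum of degrees outside
    return cut / min(vol_in, vol_out)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

print(conductance(A, [0, 1, 2]))   # one triangle: few edges leave the set
print(conductance(A, [0]))         # a single node: every edge leaves the set
```

A tightly knit module scores low because almost all of its edge weight stays inside the set, which is the property a module-finding objective of this kind rewards.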
Explorations in Multiwavelength Cluster Detection Using Chandra
We report the detection of several previously unknown serendipitous X-ray clusters, and present a new method for detecting clusters in multiband optical data. We first search 62 Chandra observations retrieved from archival data in ChaMP, the Chandra Multiwavelength Project, for extended sources, and use optical data to confirm new detections by the identification of red sequences. Five previously known clusters (three from earlier ChaMP searches) were detected, confirming the utility of our technique. We detect two new clusters: (1) a low-luminosity system merging with a more massive, previously known cluster, and (2) a high luminosity cluster at z=0.48. We also describe a new method for detecting clusters in optical survey data using the Voronoi Tessellation and Percolation (VTP) algorithm in combination with red sequence-based color filtration. In tests of the algorithm on fields containing known clusters, our color-filtered VTP successfully detects a significant fraction of optically faint, X-ray selected clusters. Results of running filtered VTP on the full ChaMP optical database are forthcoming. This project was supported by the NSF REU program under grant AST-9731923 and the Smithsonian Astrophysical Observatory. We gratefully acknowledge support for the ChaMP from NASA under CXC archival research grant AR2-3009X.
This proposal is a specific response to the strategic goal of NASA's research program to "discover how the universe works and explore how the universe evolved into its present form." Towards this goal, we propose to mine the Spitzer archive for all observations of galaxy groups and clusters for the purpose of studying galaxy evolution in clusters, contamination rates for Sunyaev Zeldovich cluster surveys, and to provide a database of Spitzer observed clusters to the broader community. Funding from this proposal will go towards two years of support for a Postdoc to do this work. After searching the Spitzer Heritage Archive, we have found 194 unique galaxy groups and clusters that have data from both the Infrared array camera (IRAC; Fazio et al. 2004) at 3.6 - 8 microns and the multiband imaging photometer for Spitzer (MIPS; Rieke et al. 2004) at 24 microns. This large sample will add value beyond the individual datasets because it will be a larger sample of IR clusters than ever before and will have sufficient diversity in mass, redshift, and dynamical state to allow us to differentiate amongst the effects of these cluster properties. An infrared sample is important because it is unaffected by dust extinction while at the same time is an excellent measure of both stellar mass (IRAC wavelengths) and star formation rate (MIPS wavelengths). Additionally, IRAC can be used to differentiate star forming galaxies (SFG) from active galactic nuclei (AGN), due to their different spectral shapes in this wavelength regime. Specifically, we intend to identify SFG and AGN in galaxy groups and clusters. Groups and clusters differ from the field because the galaxy densities are higher, there is a large potential well due mainly to the mass of the dark matter, and there is hot X-ray gas (the intracluster medium; ICM). We will examine the impact of these differences in environment on galaxy formation by comparing cluster properties of AGN and SFG to those in the field.
Also, we will examine the effect that evolutions of cluster redshift and dynamical state have on SFG and AGN in groups and clusters. In addition to environment, we will study the timescales of chemical enrichment of the ICM, using the SFG and AGN as tracers of processes that can transport metals outside of galaxies. Cosmological parameters can be measured based on observing galaxy clusters as signposts of the growth of structure in the universe. The best way to select a redshift independent sample is to use the SZ effect with mm observations to detect a shift in the cosmic microwave background spectrum as those photons scatter off hot gas in clusters. However, such mm observations are contaminated by the emission of SFG and AGN. We intend to characterize the magnitude of this effect on SZ surveys by understanding the frequency, radial distribution, and redshift distribution of these galaxies in clusters. Lastly, a compiled cluster catalog of all Spitzer observed clusters would be useful to the broader astronomical community. We plan to incorporate ancillary multi-wavelength data, where available, and to both publish our catalog in journals, and work with NED to make the catalog easily accessible in an efficient manner by the community.
2007-04-11
To use galaxy clusters as a cosmological probe, it is important to account for their triaxiality. Assuming that the triaxial shapes of galaxy clusters are induced by the tidal interaction with the surrounding matter, Lee and Kang recently developed a reconstruction algorithm for the measurement of the axial ratio of a triaxial cluster. We examine the validity of this reconstruction algorithm by performing an observational test of it with the Virgo cluster as a target. We first modify the LK06 algorithm by incorporating the two dimensional projection effect. Then, we analyze the 1275 member galaxies from the Virgo Cluster Catalogue and find the projected direction of the Virgo cluster major axis by measuring the anisotropy in the spatial distribution of the member galaxies in the two dimensional projected plane. Applying the modified reconstruction algorithm to the analyzed data, we find that the axial ratio of the triaxial Virgo cluster is (1: 0.54 : 0.73). This result is consistent with the recent observational report from the Virgo Cluster Survey, proving the robustness of the reconstruction algorithm. It is also found that at the inner radii the shape tends to be more like prolate. We discuss the possible effect of the Virgo cluster triaxiality on the mass estimation.
2012-03-01
There are many examples of clustering in astronomy. Stars in our own galaxy are often seen as being gravitationally bound into tight globular or open clusters. The Solar System's Trojan asteroids cluster at the gravitational Lagrangian point in front of Jupiter’s orbit. On the largest of scales, we find gravitationally bound clusters of galaxies, the Virgo cluster (in the constellation of Virgo at a distance of ~50 million light years) being a prime nearby example. The Virgo cluster subtends an angle of nearly 8° on the sky and is known to contain over a thousand member galaxies. Galaxy clusters play an important role in our understanding of the Universe. Clusters exist at peaks in the three-dimensional large-scale matter density field. Their sky (2D) locations are easy to detect in astronomical imaging data and their mean galaxy redshifts (redshift is related to the third spatial dimension: distance) are often better (spectroscopically) and cheaper (photometrically) when compared with the entire galaxy population in large sky surveys. Photometric redshift (z) [Photometric techniques use the broad band filter magnitudes of a galaxy to estimate the redshift. Spectroscopic techniques use the galaxy spectra and emission/absorption line features to measure the redshift] determinations of galaxies within clusters are accurate to better than delta_z = 0.05 [7] and when studied as a cluster population, the central galaxies form a line in color-magnitude space (called the E/S0 ridgeline and visible in Figure 16.3) that contains galaxies with similar stellar populations [15]. The shape of this E/S0 ridgeline enables astronomers to measure the cluster redshift to within delta_z = 0.01 [23]. The most accurate cluster redshift determinations come from spectroscopy of the member galaxies, where only a fraction of the members need to be spectroscopically observed [25,42] to get an accurate redshift to the whole system.
If light traces mass in the Universe, then the locations of galaxy clusters will be at locations of the peaks in the true underlying (mostly) dark matter density field. Kaiser (1984) [19] called this the high-peak model, which we demonstrate in Figure 16.1. We show a two-dimensional representation of a density field created by summing plane-waves with a predetermined power and with random wave-vector directions. In the left panel, we plot only the largest modes, where we see the density peaks (black) and valleys (white) in the combined field. In the right panel, we allow for smaller modes. You can see that the highest density peaks in the left panel contain smaller-scale, but still high-density peaks. These are the locations of future galaxy clusters. The bottom panel shows just these cluster-scale peaks. As you can see, the peaks themselves are clustered, and instead of just one large high-density peak in the original density field (see the left panel), the smaller modes show that six peaks are "born" within the broader, underlying large-scale density modes. This exemplifies the "bias" or amplified structure that is traced by galaxy clusters [19]. Clusters are rare, easy to find, and their member galaxies provide good distance estimates. In combination with their amplified clustering signal described above, galaxy clusters are considered an efficient and precise tracer of the large-scale matter density field in the Universe. Galaxy clusters can also be used to measure the baryon content of the Universe [43]. They can be used to identify gravitational lenses [38] and map the distribution of matter in clusters. The number and spatial distribution of galaxy clusters can be used to constrain cosmological parameters, like the fraction of the energy density in the Universe due to matter (Omega_matter) or the variation in the density field on fixed physical scales (sigma_8) [26,33]. 
The individual clusters act as “Island Universes” and as such are laboratories where we can study the evolution of the properties of the cluster, like the hot, gaseous intra-cluster medium or shapes, colors, and star-
Multiple Manifold Clustering Using Curvature Constrained Path
The problem of multiple surface clustering is a challenging task, particularly when the surfaces intersect. Available methods such as Isomap fail to capture the true shape of the surface near the intersection and result in incorrect clustering. The Isomap algorithm uses the shortest path between points. The main drawback of the shortest path algorithm is the lack of a curvature constraint, which allows a path to connect points on different surfaces. In this paper we tackle this problem by imposing a curvature constraint on the shortest path algorithm used in Isomap. The algorithm chooses several landmark nodes at random and then checks whether there is a curvature constrained path between each landmark node and every other node in the neighborhood graph. We build a binary feature vector for each point where each entry represents the connectivity of that point to a particular landmark. The binary feature vectors can then be used as input to a conventional clustering algorithm such as hierarchical clustering. We apply our method to simulated and some real datasets and show that it performs comparably to the best methods such as K-manifold and spectral multi-manifold clustering. PMID:26375819
ERIC Educational Resources Information Center
The 15 occupational clusters (transportation, fine arts and humanities, communications and media, personal service occupations, construction, hospitality and recreation, health occupations, marine science occupations, consumer and homemaking-related occupations, agribusiness and natural resources, environment, public service, business and office…
Clustering memes in social media streams
The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them o...