Note: This page contains sample records for the topic c-means clustering algorithm from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results. Last update: November 12, 2013.

This paper reports the results of a numerical comparison of two versions of the fuzzy c-means (FCM) clustering algorithms. In particular, we propose and exemplify an approximate fuzzy c-means (AFCM) implementation based upon replacing the necessary "exact" variates in the FCM equation with integer-valued or real-valued estimates. This approximation enables AFCM to exploit a lookup table approach for computing Euclidean
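
The exact FCM iteration that AFCM approximates alternates two updates, for the centers and for the memberships. A minimal NumPy sketch (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=50, seed=0):
    """Minimal fuzzy c-means: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m
        V = W.T @ X / W.sum(axis=0)[:, None]       # membership-weighted centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        e = d ** (-2.0 / (m - 1.0))
        U = e / e.sum(axis=1, keepdims=True)       # standard FCM membership update
    return U, V
```

The lookup-table idea in the paper amounts to replacing the exact distance computation `d` above with precomputed integer- or real-valued estimates.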

Robert L. Cannon; Jitendra V. Dave; James C. Bezdek

The clustering approach is widely used in biomedical applications, particularly for brain tumor detection in abnormal magnetic resonance (MR) images. Fuzzy clustering using the fuzzy C-means (FCM) algorithm has proved superior to other clustering approaches in terms of segmentation efficiency. However, the major drawback of the FCM algorithm is the huge computational time required for convergence. The effectiveness of the

Image segmentation remains one of the major challenges in image analysis and computer vision. Fuzzy clustering, as a soft segmentation method, has been widely studied and successfully applied in image clustering and segmentation. The fuzzy c-means (FCM) algorithm is the most popular method used in image segmentation. However, most clustering algorithms, such as k-means and FCM, search for the final cluster values based on predetermined initial centers, and the FCM clustering algorithm does not consider the spatial information of pixels and is sensitive to noise. This paper presents a new fuzzy c-means (FCM) algorithm with adaptive evolutionary programming for image clustering. The features of this algorithm are: 1) it does not require predetermined initial centers; evolutionary programming helps FCM search for better centers and escape bad centers at local minima. 2) Both the spatial distance and the Euclidean distance are considered in the FCM clustering, so the algorithm is more robust to noise. 3) An adaptive evolutionary programming scheme is proposed, in which the mutation rule changes adaptively by learning useful knowledge during the evolutionary process. Experimental results show that the new image segmentation algorithm is effective and robust to noisy images.
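
A minimal sketch of the combined-distance idea from the abstract above (my own illustrative formulation, not the authors' exact one): a pixel's dissimilarity to a cluster mixes an intensity term with a spatial term weighted by a hypothetical parameter `alpha`:

```python
import numpy as np

def combined_distance(intensities, coords, center_val, center_pos, alpha=0.5):
    """Mix intensity distance with spatial (pixel-coordinate) distance.

    alpha weights the spatial term; alpha = 0 recovers the plain
    intensity-only FCM distance. The additive mixing is an assumption.
    """
    d_int = np.abs(intensities - center_val)                      # intensity term
    d_spat = np.linalg.norm(coords - np.asarray(center_pos), axis=1)  # spatial term
    return d_int + alpha * d_spat
```

Feeding such a combined distance into the FCM membership update is what makes pixels cluster by both gray level and location, damping the effect of isolated noisy pixels.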

In this paper, we propose an improved fuzzy c-means (FCM) algorithm based on cluster height information to deal with FCM's sensitivity to unbalanced cluster sizes. Cluster size sensitivity is a major drawback of FCM: the algorithm tends to balance cluster sizes during iteration, so the center of a smaller cluster may be drawn toward an adjacent larger one, leading to misclassification. To overcome this problem, cluster height information is introduced into the distance function to adjust the conventional Euclidean distance, thereby controlling the effect of cluster size differences on classification. Experimental results demonstrate that our algorithm obtains good clustering results despite large size differences, whereas traditional FCM does not work well in such cases. The improved FCM has shown its potential for extracting small clusters, especially in medical image segmentation.
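
One plausible reading of the height adjustment (the exact formula is not given in this excerpt, so the scaling rule below is an assumption): divide each cluster's distances by a per-cluster "height", so that a small but dense cluster keeps its points from drifting toward a large neighbour:

```python
import numpy as np

def height_adjusted_distances(d, heights):
    """Scale the n-by-c distance matrix d column-wise by cluster heights.

    A larger height shrinks the effective distances to that cluster; this
    is an illustrative stand-in for the paper's adjusted distance function.
    """
    return d / np.asarray(heights, dtype=float)[None, :]
```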

The Fuzzy C-Means (FCM) algorithm is commonly used for clustering. The weighting exponent m and the cluster number c are the important parameters in the FCM algorithm. Conventional fuzzy clustering methods must use both of these prespecified parameters, so in this paper we analyze the original algorithm and study optimal selection methods for m and c by

In a recent paper, Krinidis and Chatzis proposed a variation of the fuzzy c-means algorithm for image clustering. The local spatial and gray-level information are incorporated in a fuzzy way through an energy function, and local minimizers of the designed energy function are proposed to obtain the fuzzy membership of each pixel and the cluster centers. In this paper, it is shown that the local minimizers of Krinidis and Chatzis, which obtain the fuzzy memberships and cluster centers in an iterative manner, are not exclusively solutions for true local minimizers of their designed energy function. Thus, the iterates of Krinidis and Chatzis fail to converge to the correct local minima of the designed energy function not because of being trapped in local minima, but because of the design of the energy function itself. PMID:23144036

In this paper we consider the use of fuzzy modelling in the context of econometric analysis of both time-series and cross-section data. We discuss and demonstrate a semi-parametric methodology for model identification and estimation that is based on the Fuzzy c-Means algorithm, which is widely used in the context of pattern recognition, and the Takagi-Sugeno approach to modelling fuzzy systems.

Dynamic Single Photon Emission Computed Tomography (SPECT) has the potential to quantitatively estimate physiological parameters by fitting compartment models to the tracer kinetics. The generalized linear least squares method (GLLS) is an efficient method to estimate unbiased kinetic parameters and parametric images. However, due to the low sensitivity of SPECT, noisy data can cause voxel-wise parameter estimation by GLLS to fail. Fuzzy C-Means (FCM) clustering and a modified FCM, which also utilizes information from the immediate neighboring voxels, are proposed to improve the voxel-wise parameter estimation of GLLS. Monte Carlo simulations were performed to generate dynamic SPECT data with different noise levels, which were processed by general and modified FCM clustering. Parametric images were estimated by Logan and Yokoi graphical analysis and GLLS. The influx rate (K1) and volume of distribution (Vd) were estimated for the cerebellum, thalamus and frontal cortex. Our results show that (1) FCM reduces the bias and improves the reliability of parameter estimates for noisy data; (2) GLLS provides estimates of micro parameters (K1-k4) as well as macro parameters, such as volume of distribution (Vd) and binding potential (BP1 & BP2); and (3) FCM clustering incorporating neighboring voxel information does not improve the parameter estimates, but reduces noise in the parametric images. These findings indicate that pre-segmentation with traditional FCM clustering is desirable for generating voxel-wise parametric images with GLLS from dynamic SPECT data.

Clustering algorithms for streaming data sets are gaining importance due to the availability of large data streams from different sources. Recently a number of streaming algorithms have been proposed using crisp algorithms such as hard c-means or its variants. The crisp cases may not be easily generalized to fuzzy cases as these two groups of algorithms try to optimize

This study proposes an inverse solution algorithm through which both the aquifer parameters and the zone structure of these parameters can be determined based on a given set of observations on piezometric heads. In the zone structure identification problem, the fuzzy c-means (FCM) clustering method is used. The association of the zone structure with the transmissivity distribution is accomplished through an optimization model. The meta-heuristic harmony search (HS) algorithm, which is conceptualized using the musical process of searching for a perfect state of harmony, is used as the optimization technique. The optimum parameter zone structure is identified based on three criteria: residual error, parameter uncertainty, and structure discrimination. A numerical example from the literature is solved to demonstrate the performance of the proposed algorithm. A sensitivity analysis is also performed to test the performance of the HS algorithm for different sets of solution parameters. Results indicate that the proposed solution algorithm is effective for the simultaneous identification of aquifer parameters and their corresponding zone structures.

The fuzzy c-means algorithm has proved to be well suited for remote sensing image segmentation, but it exhibits sensitivity to the initial guess with regard to both speed and stability, and it is also sensitive to noise. This paper proposes a fully automatic technique to obtain image clusters. A modified fuzzy c-means classification algorithm is used to provide a fuzzy partition.

In this paper, we investigate three types of c-means clustering algorithms with a conditionally positive definite kernel. One is based on hard c-means, and the others are based on standard and entropy-regularized fuzzy c-means. First, based on a conditionally positive definite kernel describing a squared Euclidean distance between data in the feature space, these algorithms are derived from revised

Clustering is the process of organizing data objects into a set of disjoint classes called clusters. The objective of this paper is to develop an enhanced k-means and kernelized fuzzy c-means for segmentation of brain magnetic resonance images. The performance of iterative clustering algorithms, which converge to numerous local minima, depends highly on the initial cluster centers. In general the clustering

In this paper, we propose a method for segmentation of Pap (Papanicolaou) smear images using the Fuzzy C-Means algorithm (FCM) and analysis of the segmented images based on shape and size criteria. The traditional FCM algorithm used in the present study is modified by replacing the Euclidean distance metric with the Mahalanobis distance metric. Further, the computation of the cluster center

This paper describes a new clustering algorithm for color image segmentation. We combine the classical fuzzy c-means algorithm (FCM) with a genetic algorithm (GA), and we modify the objective function of the FCM to take into account the spatial information of image data and the intensity inhomogeneities. An application to medical images is presented. Experiments show that the proposed algorithm

Lucia Ballerini; Leonardo Bocchi; Carina B. Johansson

This paper deals with the implementation of a simple algorithm for detection of the range and shape of tumors in brain MR images. A tumor is an uncontrolled growth of tissue in any part of the body. Tumors are of different types, with different characteristics and different treatments. As is known, brain tumors are inherently serious and life-threatening because of

In this paper, we study a kernel extension of the classic possibilistic c-means. In the proposed extension, we implicitly map input patterns into a possibly high-dimensional space by means of positive semidefinite kernels. In this new space, we model the mapped data by means of the possibilistic clustering algorithm. We study in more detail the special case where we model

Maurizio Filippone; Francesco Masulli; Stefano Rovetta

This paper describes the design and implementation of an authentication system developed for preventing password sharing using fuzzy techniques applied to biometric data. The c-Means clustering algorithm is implemented to create personal per-keyboard profiles based on the typing patterns of users. These profiles are accessed at logon time to further validate the identity of

Classification of hyperspectral data using supervised approaches in general, and statistical algorithms in particular, requires a large quantity of high-quality training data. These limitations, together with the high dimensionality of the data, are the most important problems in using supervised algorithms. As a solution, unsupervised or clustering algorithms can be considered to overcome these problems. One of the emerging clustering algorithms that can be used for this purpose is the kernel-based fuzzy c-means (KFCM), which has been developed by kernelizing the FCM algorithm. Nevertheless, there are some parameters that affect the efficiency of KFCM clustering of hyperspectral data. These parameters include the kernel parameters, the initial cluster centers, and the number of spectral bands. To address these problems, two new algorithms are developed in which the particle swarm optimization method is employed to optimize the KFCM with respect to these parameters. The first algorithm optimizes the KFCM with respect to the kernel parameters and initial cluster centers, while the second also selects the optimum discriminative subset of bands together with the former parameters. Evaluation of the experimental results shows that the proposed algorithms are more efficient than the standard k-means and FCM algorithms for clustering hyperspectral remotely sensed data.
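
The kernel trick behind KFCM can be made concrete for the Gaussian kernel: since K(x,x) = K(v,v) = 1, the squared distance between mapped points reduces to 2(1 − K(x,v)). A small sketch, where `sigma` stands in for the kernel parameter that the abstract says is optimized with PSO:

```python
import numpy as np

def kernel_distance_sq(x, v, sigma=1.0):
    """||phi(x) - phi(v)||^2 under a Gaussian kernel: 2 * (1 - K(x, v)),
    using K(x,x) = K(v,v) = 1. This is the distance a kernelized FCM plugs
    into the membership updates in place of plain squared Euclidean distance."""
    k = np.exp(-np.sum((np.asarray(x) - np.asarray(v)) ** 2) / sigma ** 2)
    return 2.0 * (1.0 - k)
```

Note how the distance saturates at 2 for far-apart points, which is one reason kernelized variants are less dominated by outlying spectra.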

In this paper, two new clustering algorithms based on fuzzy c-means for data with tolerance using kernel functions are proposed. Kernel functions, which map the data from the original space into a higher-dimensional feature space, are introduced into the proposed algorithms so that non-linear cluster boundaries can be easily found. First, two clustering algorithms for data with tolerance are introduced: one based on the standard method and the other on the entropy-based one. Second, tolerance in the feature space is discussed, taking into account the soft-margin algorithm in Support Vector Machines. Third, two objective functions in the feature space are shown, corresponding to the two methods. Fourth, the Karush-Kuhn-Tucker conditions of the two objective functions are considered and re-expressed with kernel functions as the representation of an inner product for the mapping from the original pattern space into the higher-dimensional feature space. Fifth, two iterative algorithms are proposed for the objective functions. The proposed algorithms are discussed through some numerical experiments.

This paper proposes a novel algorithm for simultaneous estimation of the bias field and segmentation of tissues for magnetic resonance images. The algorithm is formulated by modifying the objective function of the fuzzy C-means algorithm to include a bias field, which is modeled as a linear combination of a set of basis functions. Bias field estimation and image segmentation are simultaneously

Image segmentation plays a crucial role in many medical imaging applications. In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data. The algorithm is realized by modifying the objective function in the conventional fuzzy C-means (FCM) algorithm using a kernel-induced distance metric and a spatial penalty on the membership functions. Firstly, the original

One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm, adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation shows that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation.

Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A. B.

Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot can be represented by a key frame, and then we used video processing techniques to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to the initializations, here we initialized the cluster center by the shots of query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and the video in database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.

Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

A new approach to pulse shape classification for neutron detectors of type BC501A has been investigated. The method is based on the fuzzy c-means (FCM) algorithm, which allows finding clusters of similar shapes in a set of digitized detector pulses. The aim of the method is to provide principal pulse shapes, which can then be used for pulse-shape-based particle identification. Since the method is not specifically adapted to liquid scintillator signals, it is of general use and can also be applied to signals of other detector types. A detailed study of the quality of the FCM method for the search of principal pulse shapes in BC501A liquid scintillators is presented, using a 500 Msample/s 12 bit digitizer and a 252Cf neutron source. A comparison to principal pulse shapes extracted using time-of-flight information proves the applicability of the method. Finally, an example of a pulse shape discrimination method based on the extracted principal pulse shapes is presented and compared to the well-known integration method.

How to determine an appropriate number of clusters is very important when implementing a specific clustering algorithm, like c-means or fuzzy c-means (FCM). In the literature, most cluster validity indices originate from the partition or geometrical properties of the data set. In this paper, the authors develop a novel cluster validity index for FCM, based on the optimality test of FCM.

This paper presents a new approach for robust segmentation of magnetic resonance images that have been corrupted by intensity inhomogeneities and noise. The algorithm is formulated by modifying the objective function of the standard fuzzy C-means (FCM) method to compensate for intensity inhomogeneities. An additional term is injected into the objective function to constrain the behavior of membership

Clustering aims to identify groups of similar objects. To evaluate the results of clustering algorithms, an investigator uses cluster-validity indices. While the theory of cluster validity is well established for vector object data, little effort has been made to extend it to relationship-based data. As such, this paper proposes a theory of reformulation for object-data validity indices so that they

Isaac J. Sledge; Timothy C. Havens; James C. Bezdek; James M. Keller

Weighting exponent m is an important parameter in the fuzzy c-means (FCM) clustering algorithm, which directly affects the performance of the algorithm and the validity of fuzzy cluster analysis. However, so far the optimal choice of m is still an open problem. A method of selecting the optimal m is proposed in this paper, which is based on the fuzzy decision

Segmentation of an image entails the division or separation of the image into regions of similar attribute. The most basic attribute for segmentation of an image is its luminance amplitude for a monochrome image and color components for a color image. Clustering is one of the methods used for segmentation. The objective of this paper is to compare the performance

Infrared detection systems are frequently employed in surveillance and reconnaissance missions to detect particular targets of interest in both civilian and military communities. By incorporating the polarization of light as supplementary information, target discrimination performance can be enhanced. This paper therefore proposes an infrared target identification method based on fuzzy theory and a neural network with the polarization properties of targets. The paper utilizes polarization degree and light intensity to improve the unsupervised KFCM (kernel fuzzy C-means) clustering method, and establishes databases of the polarization properties of different materials. In the built network, the system can output the corresponding probability distribution over material types for any input polarization degree, such as 10°, 15°, 20°, 25°, or 30°. KFCM, which has stronger robustness and accuracy than FCM, introduces the kernel idea and gives noise points and invalid values different but intuitively reasonable weights. Because of differences in the characterization of material properties, there will be some conflicts in the classification results, so D-S evidence theory is used to combine the polarization and intensity information. Related results show that KFCM clustering precision and operation rate are higher than those of the FCM clustering method. The artificial neural network method realizes material identification, reasonably solving the problems of the complexity of infrared polarization environmental information and the inadequacy of background knowledge and inference rules. This polarization identification method is fast, self-adaptive, and high in resolution.

We present an effective method for brain tissue classification based on diffusion tensor imaging (DTI) data. The method accounts for two main DTI segmentation obstacles: random noise and magnetic field inhomogeneities. In the proposed method, DTI parametric maps were used to resolve intensity inhomogeneities in brain tissue segmentation because they provide complementary information for tissues and define accurate tissue maps. An improved fuzzy c-means with spatial constraints was proposed to enhance the noise and artifact robustness of DTI segmentation. Fuzzy c-means clustering with spatial constraints (FCM_S) can effectively segment images corrupted by noise, outliers, and other imaging artifacts. Its effectiveness is due not only to the introduction of fuzziness in the belongingness of each pixel but also to the exploitation of spatial contextual information. We propose an improved FCM_S applied to DTI parametric maps, which explores the mean and covariance of the spatial feature information for automated segmentation of DTI. Experiments on synthetic images and real-world datasets showed that our proposed algorithms, especially with the new spatial constraints, were more effective. PMID:23891435
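
The spatial-constraint idea can be sketched as an augmented distance: each pixel's squared distance to a center is penalized by the mean squared distance of its 4-neighbours (an Ahmed-style FCM_S penalty; the exact form the authors use on DTI maps may differ, and `alpha` is an illustrative weight):

```python
import numpy as np

def fcm_s_distance_sq(img, center, alpha=1.0):
    """Effective squared distance of FCM with spatial constraints (FCM_S):
    d2(pixel) + alpha * mean of d2 over the pixel's 4-neighbours,
    sketched on a 2-D intensity image."""
    d2 = (img - center) ** 2
    nb = np.zeros_like(d2)
    nb[1:, :] += d2[:-1, :]; nb[:-1, :] += d2[1:, :]   # north/south neighbours
    nb[:, 1:] += d2[:, :-1]; nb[:, :-1] += d2[:, 1:]   # west/east neighbours
    cnt = np.full(img.shape, 4.0)                      # neighbour counts,
    cnt[0, :] -= 1; cnt[-1, :] -= 1                    # corrected at the
    cnt[:, 0] -= 1; cnt[:, -1] -= 1                    # image borders
    return d2 + alpha * nb / cnt
```

An isolated noisy pixel keeps a large distance to the clean class (its neighbours agree with the class), so the membership update pulls it toward the label of its neighbourhood.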

Wen, Ying; He, Lianghua; von Deneen, Karen M; Lu, Yue

This paper suggests a novel image compression scheme, using the discrete wavelet transformation (DWT) and the fuzzy c-means clustering technique. The goal is to achieve higher compression rates by applying different compression thresholds for the wavelet coefficients of each DWT band, in terms of how they are clustered according to their absolute values. This methodology is compared to another one

Dimitris A. Karras; S. A. Karkanis; Dimitrios E. Maroulis

A method that uses fuzzy clustering algorithms to achieve particle identification based on pulse shape analysis is presented. The fuzzy c-means clustering algorithm is used to compute mean (principal) pulse shapes induced by different particle species in an automatic and unsupervised fashion from a mixed set of data. A discrimination amplitude is proposed using these principal pulse shapes to identify the originating particle species of a detector pulse. Since this method does not make any assumptions about the specific features of the pulse shapes, it is very generic and suitable for multiple types of detectors. The method is applied to discriminate between photon- and proton-induced signals in CsI(Tl) scintillator detectors and the results are compared to the well-known integration method.
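
The two steps described above, extracting principal shapes as membership-weighted means and then discriminating new pulses against them, can be sketched as follows (the nearest-shape rule is a simplified stand-in for the paper's discrimination amplitude):

```python
import numpy as np

def principal_shapes(pulses, U, m=2.0):
    """Membership-weighted mean pulse shapes (the FCM cluster centers):
    one 'principal' trace per cluster, computed from digitized pulses."""
    W = U ** m
    return (W.T @ pulses) / W.sum(axis=0)[:, None]

def discriminate(pulse, shapes):
    """Assign a pulse to the nearest principal shape (illustrative rule)."""
    d = np.linalg.norm(shapes - pulse, axis=1)
    return int(np.argmin(d))
```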

Wirth, R.; Fiori, E.; Löher, B.; Savran, D.; Silva, J.; Álvarez Pol, H.; Cortina Gil, D.; Pietras, B.; Bloch, T.; Kröll, T.; Nácher, E.; Perea, Á.; Tengblad, O.; Bendel, M.; Dierigl, M.; Gernhäuser, R.; Le Bleis, T.; Winkel, M.

The segmentation of magnetic resonance images (MRI) is a challenging problem that has received an enormous amount of attention lately. Many researchers have applied various techniques; however, fuzzy c-means (FCM) based algorithms have produced better results compared to other methods. In this paper, we present a modified FCM algorithm for bias (also called intensity inhomogeneity) estimation and segmentation of MRI.

Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics,

To address the large number of iterations, long convergence time, and low accuracy of the Fuzzy C-Means (FCM) clustering algorithm, an improved FCM (IFCM) clustering algorithm is put forward. After preprocessing of the web log data, fuzzy clustering of the web log data is performed with the IFCM clustering algorithm. This provides a foundation for subsequent web page optimization and personalization services.

Magnetic resonance images are often corrupted by intensity inhomogeneity, which manifests itself as slow intensity variations of the same tissue over the image domain. Such shading artifacts must be corrected before doing computerized analysis such as intensity-based segmentation and quantitative analysis. In this paper, we present a fuzzy c-means (FCM) based algorithm that simultaneously estimates the shading effect while segmenting

The efficient markets hypothesis (EMH) maintains that market prices fully reflect all available information, whether public or private. Therefore, analyzing the reaction of stock prices to information has attracted more and more attention. Here we propose an effective measuring method using fuzzy c-means (FCM) for describing the reaction to public information, and further analyze

Li Chao-Chao; Chi Kai; Fu Fang-Ping; Che Wen-Gang; Zhao Qing-Jiang

This paper describes fuzzy c-means (FCM) applied to the automation of large data reduction and review tasks. A data processor has been developed which determines the number of distinct structural mode responses of airfoils in a turbomachine and groups all similar responses in order to facilitate the analysis of test results. Successful implementation of the processor has demonstrated a reduction

In this paper, a generalized multiple-kernel fuzzy C-means (FCM) (MKFCM) methodology is introduced as a framework for image-segmentation problems. In the framework, aside from the fact that composite kernels are used in the kernel FCM (KFCM), a linear combination of multiple kernels is proposed, and the updating rules for the linear coefficients of the composite kernel are

A new method is proposed in this paper in which the number of clusters is self-adapted, using upper and lower cut-offs of FCM combined with PSO in place of the common FCM-PSO combination. Experimental results show that, compared with combining particle swarm optimization (PSO) with common FCM, the method yields better image segmentation, optimizes the number of clusters, and converges quickly.

In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that

Mohamed N. Ahmed; Sameh M. Yamany; Nevin Mohamed; Aly A. Farag; Thomas Moriarty

The S/N of an underwater image is low and its edges are fuzzy; if traditional methods are used to process it directly, the result is not satisfying. Though the traditional fuzzy C-means algorithm can sometimes divide the image into object and background, its time-consuming computation is often an obstacle. The mission of the vision system of an autonomous underwater vehicle (AUV) is to rapidly and exactly process information about objects in a complex environment, so that the AUV can use the result to execute its next task. So, by using the statistical characteristics of the gray image histogram, a fast and effective fuzzy C-means underwater image segmentation algorithm is presented. With the weighted histogram modifying the fuzzy membership, the algorithm not only cuts down on a large amount of data processing and storage during computation compared with the traditional algorithm, speeding up segmentation, but also improves the quality of underwater image segmentation. Finally, particle swarm optimization (PSO) described by the sine function is introduced into the algorithm, making up for the FCM algorithm's inability to reach the globally optimal solution. Thus, on the one hand, the global impact is considered while the local optimal solution is achieved, and on the other hand, the computing speed is further greatly increased. Experimental results indicate that the novel algorithm reaches better segmentation quality and reduces the processing time of each image, enhancing efficiency and satisfying the requirements of a highly effective, real-time AUV.
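
The speed-up from working on the gray-level histogram instead of every pixel can be sketched directly: cluster the L gray values, weighting each by its pixel count, so the inner loop runs over at most 256 values regardless of image size (parameter names are illustrative):

```python
import numpy as np

def histogram_fcm(hist, c=2, m=2.0, n_iter=100):
    """FCM on the gray-level histogram: cluster the gray values 0..L-1,
    weighting each by its pixel count, instead of iterating over every
    pixel. This is what makes histogram-based FCM variants fast."""
    levels = np.arange(len(hist), dtype=float)
    V = np.linspace(levels.min(), levels.max(), c)           # initial centers
    for _ in range(n_iter):
        d = np.abs(levels[:, None] - V[None, :]) + 1e-9
        e = d ** (-2.0 / (m - 1.0))
        U = e / e.sum(axis=1, keepdims=True)                 # memberships per level
        w = (U ** m) * hist[:, None]                         # histogram weighting
        V = (w * levels[:, None]).sum(axis=0) / w.sum(axis=0)
    return U, V
```

For a bimodal underwater image, the two centers settle on the object and background gray levels, after which every pixel is labeled by a single table lookup on its gray value.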

In this paper, we present a novel algorithm for adaptive fuzzy segmentation of MRI data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the RF coils or some problems associated with the acquisition sequences. The result is a slowly-varying shading artifact over the image that can produce errors with conventional intensity-based

Mohamed N. Ahmed; Sameh M. Yamany; N. A. Mohamed; Aly A. Farag

Clustering is an important research topic that has practical applications in many fields. It has been demonstrated that fuzzy clustering, using algorithms such as the fuzzy C-means (FCM), has clear advantages over crisp and probabilistic clustering methods. Like most clustering algorithms, however, FCM and its derivatives need the number of clusters in the given data set as one of

In a clustering problem, the aim is to partition a given set of n points in d-dimensional space into k groups, called clusters, so that points within each cluster are near each other. Two objective functions frequently used to measure the performance of a clustering algorithm are, for any L4 metric, (a) the maximum distance between pairs of points in

Clustering aims at discovering groups and identifying interesting distributions and patterns in data sets. Researchers have extensively studied clustering since it arises in many application domains in engineering and the social sciences. In recent years, the availability of huge transactional and experimental data sets and the arising requirements of data mining have created the need for clustering algorithms that scale and can

Maria Halkidi; Yannis Batistakis; Michalis Vazirgiannis

This paper presents an algorithm, called the modified suppressed fuzzy c-means (MS-FCM), that simultaneously performs clustering and parameter selection for the suppressed fuzzy c-means (S-FCM) algorithm proposed by (Fan, J.L., Zhen, W.Z., Xie, W.X., 2003. Suppressed fuzzy c-means clustering algorithm. Pattern Recognition Lett. 24, 1607-1612). The proposed algorithm is computationally simple, and is able to select the parameter a in

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, an SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (r = 0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.

Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
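The "optimal number of clusters derived from image properties" step above can be illustrated with a common stand-in: run FCM for several candidate cluster counts and keep the count that maximizes Bezdek's partition coefficient, PC = (1/n) Σ u². This is only a sketch of the general idea; the paper's actual image-derived criterion is not specified in the abstract, and the 1-D data and parameter choices here are assumptions:

```python
import numpy as np

def fcm_1d(x, c, m=2.0, n_iter=60):
    """Plain 1-D fuzzy c-means with deterministic quantile initialization."""
    x = np.asarray(x, dtype=float)
    qs = (2 * np.arange(c) + 1) / (2 * c)
    centers = np.quantile(x, qs)
    for _ in range(n_iter):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9
        u = 1.0 / d ** (2.0 / (m - 1))
        u /= u.sum(axis=1, keepdims=True)
        um = u ** m
        centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
    return centers, u

def pick_cluster_count(x, candidates=(2, 3, 4, 5)):
    """Choose c by maximizing Bezdek's partition coefficient (1/n)*sum(u^2):
    crisp memberships (PC near 1) indicate a well-matched cluster count."""
    best_c, best_pc = None, -1.0
    for c in candidates:
        _, u = fcm_1d(x, c)
        pc = float((u ** 2).sum() / len(x))
        if pc > best_pc:
            best_c, best_pc = c, pc
    return best_c
```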

The high dimensionality of image-based datasets can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices, and the notion of a joint-feature-clustering matrix to find redundancies among image features. The introduced matrix indicates how frequently features are grouped into a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, to reduce data storage and computation times, and to improve classification accuracy.

The fuzzy c-means (FCM) and possibilistic c-means (PCM) algorithms have been utilized in a wide variety of fields and applications. Although many methods are derived from the FCM and PCM for clustering various types of spatial data, relational clustering has received much less attention. Most fuzzy clustering methods can only process spatial data (e.g., in Euclidean space) instead of

The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space, to derive new objective functions, and thus to cluster non-Euclidean structures in the data, enhancing the robustness of the original clustering algorithms against noise and outliers. The new objective functions of the proposed algorithms are obtained by incorporating the noise-clustering concept into the entropy-based fuzzy C-means algorithm, with a suitable noise distance employed to account for noisy data during the clustering process. The paper also presents a prototype-initialization method for the initial cluster prototypes, so that the final result is obtained in fewer iterations. To evaluate the performance of the proposed methods in reducing the noise level, experiments were carried out on a synthetic image corrupted by Gaussian noise. The superiority of the proposed methods was further examined through an experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The classification accuracy of the proposed fuzzy C-means segmentation method is assessed using the silhouette validity index. PMID:23219569

Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
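The noise-clustering concept mentioned above can be sketched as follows: augment the distance matrix with a virtual "noise" class that sits at a constant distance delta from every point, so outliers take their membership there instead of dragging the real prototypes. A minimal 1-D sketch; the value of delta and the quantile initialization are illustrative assumptions, and the paper's entropy term is omitted:

```python
import numpy as np

def noise_fcm(x, c, delta=2.0, m=2.0, n_iter=60):
    """FCM with a noise cluster: one extra virtual class at constant
    distance `delta` from every point. Outliers end up assigned to it,
    so they no longer distort the real cluster centers."""
    x = np.asarray(x, dtype=float)
    qs = (2 * np.arange(c) + 1) / (2 * c)
    centers = np.quantile(x, qs)
    for _ in range(n_iter):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9
        # append the constant noise distance as a (c+1)-th column
        d_aug = np.hstack([d, np.full((len(x), 1), delta)])
        u = 1.0 / d_aug ** (2.0 / (m - 1))
        u /= u.sum(axis=1, keepdims=True)
        um = u[:, :c] ** m            # noise memberships are discarded
        centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
    return np.sort(centers), u
```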

Many real-world systems can be modeled as networks or graphs. Clustering algorithms that help us to organize and understand these networks are usually referred to as graph-based clustering algorithms. Many algorithms exist in the literature for clustering network data. Evaluating the quality of these clustering algorithms is an important task addressed by different researchers. An important ingredient of

This interactive module helps students to understand the definition of and uses for clustering algorithms. Students will learn to categorize the types of clustering algorithms, to use the minimal spanning tree and the k-means clustering algorithm, and to solve exercise problems using clustering algorithms. Each component has a detailed explanation along with quiz questions. A series of questions is presented at the end to test students' understanding of the lesson's entire concept.
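The minimal-spanning-tree approach the module teaches can be shown in a few lines: build the MST with Kruskal's algorithm, then delete the k-1 heaviest MST edges; the surviving connected components are the clusters. A small self-contained sketch (the data and parameter names are illustrative):

```python
import numpy as np

def mst_clusters(points, k):
    """Single-link clustering via the minimum spanning tree: build the MST
    with Kruskal's algorithm, then delete the k-1 longest edges so the
    remaining forest has exactly k connected components."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    edges = sorted(
        (float(np.linalg.norm(pts[i] - pts[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    # Kruskal: accept each edge that joins two different components
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # keep all but the k-1 heaviest of the n-1 MST edges
    mst.sort()
    parent = list(range(n))
    for w, i, j in mst[: n - k]:
        parent[find(i)] = find(j)
    roots, labels = {}, []
    for v in range(n):
        labels.append(roots.setdefault(find(v), len(roots)))
    return labels
```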

The problem of clustering N objects into M classes may be viewed as a combinatorial optimization problem. In the literature on clustering, iterative hill-climbing techniques are used to find a locally optimum classification. In this paper, we develop a clustering algorithm based on the branch and bound method of combinatorial optimization. This algorithm determines the globally optimum classification and is

Warren L. G. Koontz; Patrenahalli M. Narendra; Keinosuke Fukunaga

This paper is concerned with the computational efficiency of clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. Two relational versions of an evolutionary algorithm for clustering are derived and compared against two systematic (repetitive) approaches that can also

In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely

Clustering is one of the image-processing methods used in non-destructive testing (NDT). Most clustering algorithms, like fuzzy C-means (FCM), iterative self-organizing data analysis (ISODATA), K-means, and their derivatives, require the number of clusters as an initializing parameter. This paper proposes an algorithm for clustering the pixels in C-scan images without any initializing parameters. In this state-of-the-art method, an image is sampled on a rosette pattern; according to the pattern characteristics, the extracted samples are clustered and the number of clusters is determined. The centroids of the classes are computed by a method based on the distribution function. On different data sets, the results show that the algorithm improves the clustering capability by 92.93% and 91.93% in comparison with the FCM and K-means algorithms, respectively. Moreover, when dealing with high-resolution data sets, the efficiency of the algorithm in terms of cluster detection and run time improves considerably.

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, clustering algorithms that build meaningful hierarchies out of large document collections are ideal tools for their interactive visualization and exploration as they provide data-views that are consistent, predictable,

The problem this paper focuses on is the classical problem of unsupervised clustering of a data-set. In particular, the bisecting divisive clustering approach is here considered. This approach consists in recursively splitting a cluster into two sub-clusters, starting from the main data-set. This is one of the more basic and common problems in fields like pattern analysis, data mining, document

Sergio M. Savaresi; Daniel Boley; Sergio Bittanti; Giovanna Gazzaniga

This paper describes a new task scheduling algorithm based on clustering. In this new approach, clustering of the tasks is achieved by applying a force model to the task graph. From an initial configuration of the task graph, forces act upon the nodes to manoeuvre them into a low energy or equilibrium state. Clusters are created from the equilibrium state

In this paper, we propose a new ant-based clustering algorithm. The algorithm takes inspiration from the sound-communication properties of real ants. Artificial ants communicate directly with each other in order to merge similar groups of objects. The proposed algorithm was tested and evaluated, and the obtained results are very encouraging in comparison with the famous k-means and some ant

Akil Elkamel; Mariem Gzara; Salma Jamoussi; H. Ben-Abdallah

This paper describes a hierarchical parallel implementation of two clustering algorithms applied to the segmentation of multidimensional images and range images. The proposed hierarchical parallel implementation results in a fast, robust segmentation algorithm that can be applied to a number of practical computer vision problems. The clustering process is divided into two basic steps. First, a fast sequential clustering algorithm performs a simple analysis of the image data, which results in a suboptimal classification of the image features. Second, the resulting clusters are analyzed using the minimum volume ellipsoid estimator, and similar clusters are merged based on the number and shape of the ellipsoidal clusters that best represent the data. Both algorithms are implemented on a parallel computer architecture that speeds up the classification task. The hierarchical clustering algorithm is compared against the fuzzy k-means clustering algorithm, showing that both approaches give comparable segmentation results. The hierarchical parallel implementation is tested on synthetic multidimensional images and real range images.

Background We present the algorithm PFClust (Parameter Free Clustering), which is able to cluster data automatically and identify a suitable number of clusters to group them into, without requiring any parameters to be specified by the user. The algorithm partitions a dataset into a number of clusters that share some common attributes, such as their minimum expectation value and variance of intra-cluster similarity. A set of n objects can be clustered into any number of clusters from one to n, and there are many different hierarchical and partitional, agglomerative and divisive, clustering methodologies available that can be used to do this. Nonetheless, automatically determining the number of clusters present in a dataset constitutes a significant challenge for clustering algorithms. Identifying a putative optimum number of clusters to group the objects into involves computing and evaluating a range of clusterings with different numbers of clusters. However, there is no agreed or unique definition of optimum in this context. Thus, we test PFClust on datasets for which an external gold standard of ‘correct’ cluster definitions exists, noting that this division into clusters may be suboptimal according to other reasonable criteria. PFClust is heuristic in the sense that it cannot be described in terms of optimising any single simply-expressed metric over the space of possible clusterings. Results We validate PFClust firstly with reference to a number of synthetic datasets consisting of 2D vectors, showing that its clustering performance is at least equal to that of six other leading methodologies – even though five of the other methods are told in advance how many clusters to use. We also demonstrate the ability of PFClust to classify the three dimensional structures of protein domains, using a set of folds taken from the structural bioinformatics database CATH. 
Conclusions We show that PFClust is able to cluster the test datasets a little better, on average, than any of the other algorithms, and furthermore is able to do this without the need to specify any external parameters. Results on the synthetic datasets demonstrate that PFClust generates meaningful clusters, while our algorithm also shows excellent agreement with the correct assignments for a dataset extracted from the CATH part-manually curated classification of protein domain structures.

Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real world problems. Wavelet neural networks (WNNs), which were developed based on the wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized, including the type of wavelet activation functions, translation vectors, and dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors, both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to obtaining the estimation of the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real world problem of epileptic seizure detection was presented. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.

Multiobjective evolutionary clustering approaches have been successfully utilized in data clustering. In this paper, we propose a novel unsupervised machine learning algorithm, namely the multiobjective evolutionary clustering ensemble algorithm (MECEA), to perform texture image segmentation. MECEA comprises two main phases. In the first phase, MECEA uses a multiobjective evolutionary clustering algorithm to optimize two complementary clustering objectives: one based on

Xiaoxue Qian; Xiangrong Zhang; Licheng Jiao; Wenping Ma

Unsupervised classification is a popular tool for unlabeled datasets in data mining and exploratory data analysis; examples include K-means and fuzzy C-means. Although these unsupervised techniques have demonstrated substantial success for satellite imagery, they have some limitations. The number of clusters in a K-means clustering application is often needed in advance as an input parameter to the algorithm. Our previous

Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clusteringalgorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability. PMID:16119259

Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

The fuzzy C-means clustering (FCM) algorithm is an effective image segmentation algorithm that combines unsupervised clustering with the idea of fuzzy aggregation, and it is widely applied to image segmentation. However, it has many problems: a great amount of calculation, sensitivity to initial values and to noise in images, and vulnerability to falling into local optima. To overcome these problems of FCM, a fuzzy clustering algorithm based on particle swarm optimization (PSO) is proposed. This article first uses the powerful global search capability of the PSO algorithm to optimize the FCM centers, and then uses these centers to partition the images, boosting the speed of image segmentation and improving segmentation accuracy. The results of the experiments show that the PSO-FCM algorithm can effectively avoid the disadvantages of FCM, boost the speed, and obtain a better image segmentation result.
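The PSO-for-centers idea above can be sketched: treat a candidate set of c centers as a particle position, score it with the FCM objective J_m (memberships follow in closed form from the centers), and let a standard PSO search globally for a low-cost configuration. This is a 1-D sketch, and the PSO coefficients (inertia 0.7, acceleration 1.5) are conventional defaults, not values from the paper:

```python
import numpy as np

def fcm_objective(x, centers, m=2.0):
    """FCM cost J_m for 1-D data; memberships are computed in closed
    form from the candidate centers."""
    d = np.abs(x[:, None] - centers[None, :]) + 1e-9
    u = 1.0 / d ** (2.0 / (m - 1))
    u /= u.sum(axis=1, keepdims=True)
    return float(((u ** m) * d ** 2).sum())

def pso_fcm(x, c=2, n_particles=20, n_iter=80, seed=1):
    """Global search for FCM centers with a basic PSO: each particle is a
    full set of c centers, scored by the FCM objective."""
    rng = np.random.default_rng(seed)
    lo, hi = float(x.min()), float(x.max())
    pos = rng.uniform(lo, hi, size=(n_particles, c))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fcm_objective(x, p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5       # inertia and acceleration coefficients
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fcm_objective(x, p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return np.sort(g)
```

The returned centers can then seed (or replace) the usual FCM alternating updates.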

An artificial ant sleeping model (ASM) and an adaptive artificial ants clustering algorithm (A4C) are presented to solve the clustering problem in data mining by simulating the behaviors of gregarious ant colonies. In the ASM model, each data item is represented by an agent. The agents' environment is a two-dimensional grid. In A4C, the agents can be formed into high-quality

We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach, improving the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation that further reduces communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior on our test problems. Performance is evaluated on a number of large-scale parallel computer systems, including a 16K-processor BlueGene/Light system.

Image segmentation is a crucial stage in the analysis of dermoscopic images, as the extraction of exact boundaries of skin lesions is essential for accurate diagnosis. One approach to image segmentation is based on the idea of clustering pixels with similar characteristics. Fuzzy c-means is a popular clustering-based algorithm that is often employed in medical image segmentation, however due

Huiyu Zhou; Gerald Schaefer; Abdul H. Sadka; M. Emre Celebi

Image segmentation is an important task in analysing dermoscopy images as the extraction of the borders of skin lesions provides important cues for accurate diagnosis. One family of segmentation algorithms is based on the idea of clustering pixels with similar characteristics. Fuzzy c-means has been shown to work well for clustering based segmentation, however due to its iterative nature this

Huiyu Zhou; Gerald Schaefer; Abdul H. Sadka; M. Emre Celebi

Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning. PMID:23137615
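The mechanism behind this noise benefit can be sketched for k-means: perturb each centroid update with zero-mean noise whose scale decays over iterations, so early kicks help the configuration escape poor basins while late iterations converge normally. A 1-D sketch; the inverse-square annealing schedule is an illustrative assumption, not the paper's analysis:

```python
import numpy as np

def noisy_kmeans(x, k, n_iter=30, noise0=1.0, seed=0):
    """k-means where each centroid update is perturbed by zero-mean
    Gaussian noise whose scale decays every iteration (annealing), the
    mechanism behind the clustering noise benefit."""
    rng = np.random.default_rng(seed)
    # deterministic quantile initialization
    centers = np.quantile(x, (2 * np.arange(k) + 1) / (2 * k))
    for t in range(n_iter):
        # assignment step: nearest centroid wins
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # update step: move each centroid to its cluster mean
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centers[j] = pts.mean()
        # annealed noise: strong early kicks, vanishing later
        centers += rng.normal(0.0, noise0 / (t + 1) ** 2, size=k)
    return np.sort(centers)
```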

The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. This is a problem where local minima are easy to locate, but barriers between the many minima are large, and the number of minima prohibits a systematic search. They use a novel mating algorithm that preserves some of the geometrical relationship between atoms, in order to ensure that the resultant structures are likely to inherit the best features of the parent clusters. Using this approach, they have been able to find lower energy structures than had been previously obtained. Most recently, they have been able to turn around the building block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and (conversely) that such information can be introduced back into the algorithm to assist in the search process.

Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E. [Ames Lab., IA (United States)]|[Iowa State Univ., Ames, IA (United States). Dept. of Physics

A drastic improvement in the analysis of gene expression has led to new discoveries in bioinformatics research. To analyse gene expression data, fuzzy clustering algorithms are widely used. However, the resulting analyses from these specific types of algorithms may cause confusion in hypotheses regarding the dominant function of genes of interest. Besides that, the current fuzzy clustering algorithms do not conduct a thorough analysis of genes with low membership values. Therefore, we present a novel computational framework called the "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA) for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), achieving dominant clusters (msf-CluFA-1), improving confidence level (msf-CluFA-2), and a combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and apriori algorithms in msf-CluFA-2, our new framework is capable of determining the dominant clusters and improving the confidence level of genes with lower membership values, by means of which unknown genes can be predicted. PMID:23930805

Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering. PMID:24051734

This paper addresses the issue of the classification of accident scenarios generated in dynamic safety and reliability analyses of a Nuclear Power Plant (NPP) equipped with a Digital Instrumentation and Control (I&C) system. More specifically, the classification of the final state reached by the system at the end of an accident scenario is performed by Fuzzy C-Means clustering

Francesco Di Maio; Piercesare Secchi; Simone Vantini; Enrico Zio

Abstract We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower

In clustering routing protocols for wireless sensor networks, the cluster size is generally fixed, which can easily lead to the "hot spot" problem. Furthermore, the majority of routing algorithms barely consider the high energy consumption caused by long-distance communication between adjacent cluster heads. Therefore, this paper proposes a new cross unequal clustering routing algorithm based on the EEUC algorithm. To remedy the defects of EEUC, the calculation of the competition radius takes both the node's position and its remaining energy into account, making the load of the cluster heads more balanced. At the same time, a node adjacent to the cluster is applied to relay data, reducing the energy loss of the cluster heads. Simulation experiments show that, compared with LEACH and EEUC, the proposed algorithm can effectively reduce the energy loss of cluster heads, balance the energy consumption among all nodes in the network, and improve the network lifetime.

Tong, Wang; Jiyi, Wu; He, Xu; Jinghua, Zhu; Munyabugingo, Charles

A high resolution detector is being developed for our small animal positron emission tomography system (MicroPET). The detector unit consists of 8x8 crystal blocks coupled to four photomultiplier tubes (PMTs). Each scintillation event is mapped to a two-dimensional (2-D) position through the relative ratios of the PMT output signals. The crystal look-up table (CLT) used in the THU MicroPET scanner defines

Xiaowen Kang; Xishan Sun; Shi Wang; Yaqiang Liu; Yan Xia; Rong Zhou; Zhaoxia Wu; Yongjie Jin

Magnetic Resonance Imaging (MRI) is one of the best technologies currently used for diagnosing brain tumors. Brain tumors are diagnosed at advanced stages with the help of MRI images. Segmentation is an important process for extracting suspicious regions from complex medical images. Automatic detection of brain tumors through MRI can provide valuable insight and accuracy of earlier

The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initializatio...

For pt. I see ibid., p.775-85. In part I, an equivalence between the concepts of fuzzy clustering and soft competitive learning in clustering algorithms is proposed on the basis of the existing literature. Moreover, a set of functional attributes is selected for use as dictionary entries in the comparison of clustering algorithms. In this paper, five clustering algorithms taken from the

Clustering algorithms have successfully been applied as a digital image segmentation technique in various fields and applications. However, those clustering algorithms are only applicable to specific images such as medical images, microscopic images, etc. In this paper, we present a new clustering algorithm called Adaptive Fuzzy-K-means (AFKM) clustering for image segmentation which can be applied to general and/or specific

The Southern shelf of the Korean Peninsula has three potential sediment sources: China, Japan and Korea. However, the dominance of clay- and/or silt-grained sediments causes ambiguity in the allocation of sediment sources and sedimentation features by conventional sedimentological approaches. As an alternative analytic method, mineral magnetic techniques, which are highly sensitive to micron-scale mineral fractions, have been widely used to trace sediment provenance. In this study, various mineral magnetic parameters representing grain size, magnetic concentration, and magnetic mineralogy were obtained from 480 sediment samples collected from 98 regularly spaced spots on the southern shelf of Korea. For the multivariate fuzzy cluster analysis, six parameters independent of magnetic mineral concentration were used. The interpretive routine suggests that the best statistical solution for the samples is four clusters. In addition, the solution shows a prominent spatial discrimination based on the magnetic data set. Samples characterized by relatively coarse and high-coercivity magnetic minerals are found in the west of the study area (close to China; cluster 1). Clusters 2 and 3 include the samples showing a dominance of fine and low-coercivity ferrimagnetic minerals (center of the study area). Samples with close affinity to cluster 4 are found in the northeast, which is dominated by fine and high-coercivity minerals. Such longitudinal dichotomy in the fuzzy clustering implies two possible sediment provenances: west in China and northeast in Korea. It is likely that Kuroshio Current flows are responsible for the spread of finer grains in the center of the study area (clusters 2 and 3).

In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message-passing library and a master/slave scheme. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results conducted on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
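
The row-wise data decomposition described above can be illustrated without MPI: each "worker" runs the same sequential Sobel filter on its strip of rows plus a one-row halo, and the strips are stitched back together. This is a hedged single-process sketch of the decomposition, not the paper's MPI code.

```python
import numpy as np

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel(img):
    # Sequential Sobel gradient magnitude with zero-padded borders.
    p = np.pad(img, 1)
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * KX).sum()
            gy[i, j] = (win * KY).sum()
    return np.hypot(gx, gy)

def sobel_strips(img, parts=3):
    """Row decomposition: each 'worker' runs the same sequential Sobel
    on its strip plus a one-row halo; results are stitched together."""
    out = []
    for rows in np.array_split(np.arange(img.shape[0]), parts):
        lo, hi = rows[0], rows[-1] + 1
        start = max(lo - 1, 0)
        halo = img[start:min(hi + 1, img.shape[0])]
        res = sobel(halo)
        out.append(res[lo - start:lo - start + (hi - lo)])
    return np.vstack(out)

# Demo: a vertical step edge; strip processing matches the full image.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
full = sobel(img)
split = sobel_strips(img, parts=3)
```

Because each strip carries its halo rows, the stitched result equals the single-pass result exactly, which is what makes the master/slave split correct.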

Traditional analytical methods for traffic information cannot meet the needs of intelligent traffic systems. Mining value-added information can address more traffic problems. This paper develops a new clustering optimization algorithm to extract useful spatial clustered patterns for predicting long-term traffic flow from a macroscopic view. Considering the FCM algorithm's sensitivity to initial parameters and its tendency to fall into local extrema, the new algorithm applies the Particle Swarm Optimization (PSO) method, which can discover the global optimum, to the FCM algorithm. The algorithm uses the combination of a clustering validity index and the objective function of the FCM algorithm as the fitness function of the PSO algorithm. The experimental results indicate that it is effective and efficient. For fuzzy clustering of road traffic data, it can produce useful spatial clustered patterns, and the cluster centers represent the locations with heavy traffic flow. Moreover, the parameters of the patterns can provide intelligent traffic systems with decision support.
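
The core FCM iteration that such PSO hybrids wrap around can be sketched in a few lines. This is plain textbook fuzzy c-means, not the paper's PSO-FCM; the fuzzifier `m` and iteration count are illustrative choices.

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: alternate the membership and center updates.
    X is (n, d); returns (centers, memberships U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                   # memberships sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(-1) + 1e-12
        U = d2 ** (-1.0 / (m - 1.0))                     # inverse-distance weights
        U /= U.sum(axis=0)
    return centers, U

# Demo: two well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
centers, U = fcm(X)
labels = U.argmax(axis=0)
```

The random initial memberships here are exactly the sensitivity the abstract targets; the PSO layer it describes would search over such initializations using a validity-index fitness.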

Hu, Chunchun; Shi, Wenzhong; Meng, Lingkui; Liu, Min

Segmentation is a difficult and challenging problem in brain medical images for diagnosing cancerous regions and other brain-related diseases. Many researchers have introduced various segmentation techniques for brain medical images; however, fuzzy-clustering-based fuzzy c-means image segmentation is more effective than other segmentation techniques. This paper introduces three new proposed algorithms, namely Weighted Bias

Modern biological research increasingly recognises the importance of genome-wide gene regulatory network inference; however, a range of statistical, technological and biological factors makes it a difficult and intractable problem. One approach used in some research is to cluster the data and then infer a structural model of the clusters. When using this kind of approach it is very important to choose the clustering algorithm carefully. In this paper we explicitly analyse the attributes that make a clustering algorithm appropriate, and we also consider how to measure the quality of the identified clusters. Our analysis leads us to develop three novel cluster quality measures that are based on regulatory overlap. Using these measures we evaluate two modern candidate algorithms: FLAME and KMART. Although FLAME was specifically developed for clustering gene expression profile data, we find that KMART is probably a better algorithm to use if the goal is to infer a structural model of the clusters.

In raisin production there are a variety of color impurities, which need to be removed effectively. A new, efficient raisin color-sorting algorithm is presented here. First, threshold-based image processing was applied for pre-processing, and the gray-scale distribution characteristics of the raisin image were found. To obtain the chromatic aberration image and reduce disturbance, we subtracted the background image data from the target image data. Second, a Haar wavelet filter was used to obtain a smoothed image of the raisins. According to the different colors and external features such as mildew and spots, image characteristics were computed so as to fully reflect the quality differences between raisins of different types. After the processing above, the images were analyzed by the K-means clustering method, which achieves adaptive extraction of the statistical features; the image data were accordingly divided into different categories, making the categories of abnormal colors distinct. Using this algorithm, raisins of abnormal colors and those with mottles were eliminated. The sorting rate was up to 98.6%, and the ratio of normal raisins to sorted-out grains was less than one eighth.

Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel flocking-based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks and fish schools. Unlike other partitioning clustering algorithms such as K-means, the Flocking-based algorithm does not require initial partition seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithm for real document clustering.

Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

Routing protocols in sensor networks usually require message flooding. This process is very costly in terms of energy consumption. For unattended sensor networks (often composed of battery-powered nodes), energy consumption is an important metric since it maps directly to network operational lifetime. In this paper, we examine an on-demand clustering algorithm called the passive clustering algorithm. This

Traditional clustering methods have been usually developed in a centralized fashion. One reason for this is that this form of clustering relies on data structures that must be accessed and modified at each step of the clustering process. Another issue with classical clustering methods is that they need some hints about the target clustering. These hints include for example the

This paper presents a multi-ant-colony approach for clustering data that consists of several parallel and independent ant colonies and a queen ant agent. Each ant colony process uses a different ant moving speed and a different version of the probability conversion function to generate various clustering results with an ant-based clustering algorithm. These results are sent to the queen

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering solutions provide a view of the data at different levels of granularity, making them ideal for people to visualize and interactively explore large document collections. In this

On the basis of analyzing the advantages and weaknesses of current clustering algorithms for data streams, this paper introduces a new density-based clustering algorithm for evolving data streams, SDStream, which can analyze and process large-scale evolving data streams. Its performance is tested using both real and synthetic datasets. Experimental results show that this algorithm has better performance in applicability,

The traditional k-means algorithm cannot achieve a high clustering precision rate and is easily affected by randomly initialized cluster centers and isolated points, but the algorithm is simple, has low time complexity, and can process big data sets quickly. This paper proposes an improved k-means algorithm named PKM. PKM is based on the similarity degree among data points made by cumulated K-means,

BACKGROUND: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and

Jeremy J Jay; John D Eblen; Yun Zhang; Mikael Benson; Andy D Perkins; Arnold M Saxton; Brynn H Voy; Elissa J Chesler; Michael A Langston

A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad array of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation.

Jeremy J. Jay; John D. Eblen; Yun Zhang; Mikael Benson; Andy D. Perkins; Arnold M. Saxton; Brynn H. Voy; Elissa J. Chesler; Michael A. Langston

A novel, nature-inspired, unsupervised classification method, based on a recent metaheuristic algorithm inspired by the breeding strategy of the parasitic cuckoo bird, is introduced in this paper. The proposed Cuckoo Search Clustering Algorithm (CSCA) yields good results on benchmark datasets. Encouraged by these results, the proposed algorithm is validated on two real remote sensing satellite image

Cluster analysis aims at finding subsets (clusters) of a given set of entities, which are homogeneous and/or well separated. Starting from the 1990s, cluster analysis has been applied to several domains with numerous applications. It has emerged as one of the most exciting interdisciplinary fields, having benefited from concepts and theoretical results obtained by different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms, which are able to solve larger sized and real-world instances. We will give an overview of the main types of clustering and criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed to cluster biological data. PMID:23896381

In this paper we discuss various methods for clustering in order to determine estimates of the cluster mass, focusing on the cluster A 3581. Using virtual observatory (VO) tools, possible galaxy cluster candidates are selected. Using the Kaye's Mixture Model (KMM) algorithm and the Gaussian Mixture Model (GMM), we determine the most likely cluster member candidates. We then compare the results obtained to SIMBAD's method of hierarchy. The mass of A 3581 was calculated and checked against literature values. We discuss the sensitivity of the mass determination and show that the GMM provides a very robust method to determine member candidates for cluster A 3581.

Nano-sized clusters of various materials are recent experimental targets, since they exhibit size-dependent physico-chemical properties. A vast amount of literature is available on the study of molecular clusters but general methods for systematic evolution of their growth are rather scarce. The present work reports a molecular cluster building algorithm based on the electrostatic guidelines, followed by ab initio investigations, enabled by the application of molecular tailoring approach. Applications of the algorithm for generating geometries and interaction energies of large molecular clusters of zinc sulfide, benzene, and water are presented. PMID:21361531

We consider the problem of partitioning the nodes of a complete edge-weighted graph into κ clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a solution that has no more than 10κ clusters such that the total diameter of these clusters is within a factor O(log(n/κ)) of the optimal value for κ clusters, where n is the number of nodes in the complete graph. For any fixed κ, we present an approximation algorithm that produces κ clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any ρ ≥ 1 there is no polynomial-time approximation algorithm that can provide a performance guarantee of ρ even when the number of clusters is fixed at 3. Other results obtained include a polynomial-time algorithm for the problem when the underlying graph is a tree with edge weights.
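
The objective being approximated is easy to state in code: given a partition, sum the diameter (largest intra-cluster pairwise distance) of each cluster. A small sketch of evaluating that objective for a candidate labeling:

```python
import itertools
import math

def diameter(points):
    # Largest pairwise Euclidean distance within one cluster.
    return max((math.dist(p, q)
                for p, q in itertools.combinations(points, 2)), default=0.0)

def sum_of_diameters(points, labels):
    # The objective: total diameter of the clusters induced by `labels`.
    return sum(diameter([p for p, lab in zip(points, labels) if lab == k])
               for k in set(labels))
```

For example, splitting four corner points into two tight pairs gives a much smaller objective than pairing across the gap, which is the quantity the approximation algorithms above bound.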

Image segmentation is the prerequisite step for further image analysis. Segmentation algorithms based on clustering are attracting more and more attention. In this paper, an image-domain-based clustering method for segmentation, called CSA-CA, is proposed. In this method, a scale parameter is introduced instead of an a priori known number of clusters. Considering that adjacent pixels are generally not independent of each other, the spatial local context is taken into account in our method. A spatial information term is added so that nearby pixels have a higher probability of merging into one cluster. Additionally, a clonal selection clustering operator is used so that a cluster is capable of exploring others that are not spatially adjacent but are similar in feature space. In the experiments we show the effectiveness of the proposed method and compare it to other segmentation algorithms.

Abstract: In this paper, we propose an on-demand distributed clustering algorithm for multi-hop packet radio networks. These types of networks, also known as ad hoc networks, are dynamic in nature due to the mobility of nodes. The association and dissociation of nodes to and from clusters perturb the stability of the network topology, and hence a reconfiguration of the system is often

To avoid the dependence of clustering validity on the space distribution of high-dimensional samples in Fuzzy C-Means, a dynamic fuzzy clustering method based on the artificial fish swarm algorithm was proposed. By introducing a fuzzy equivalence matrix to express the degree of similarity among samples, the high-dimensional samples were mapped to a two-dimensional plane. Then the Euclidean distance of

Feature selection is an important process in data analysis for information-preserving data reduction. Clustering is inherently a difficult task and is made even more difficult when the selection of relevant features is also an issue. In this paper, we propose an approach for clustering and feature selection simultaneously using a harmony search algorithm. Our approach makes feature selection an integral

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of

In this paper we are interested in autonomous systems that can automatically develop terrain classifiers without human interaction or feedback. A key issue is the clustering of sensor data from the same terrain. In this context, we present a novel off-line windowless clustering algorithm exploiting the time-dependency between samples. In terrain coverage, sets of sensory measurements are returned that are

With the size and state of the Internet today, a good quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one approach, but relies heavily on good feature extraction and document representation as well as a good clustering approach and algorithm. Due to the changing nature of the

The main focus of this paper is to propose integration of dynamic and multiobjective algorithms for graph clustering in dynamic environments under multiple objectives. The primary application is to multiobjective clustering in social networks which change over time. Social networks, typically represented by graphs, contain information about the relations (or interactions) among online materials (or people). A typical social network

Motivation: Efficient clustering is important for handling the large amount of available EST sequences. Most contemporary methods are based on some kind of all-against-all comparison, resulting in quadratic time complexity. A different approach is needed to keep up with the rapid growth of EST data. Results: A new, fast EST clustering algorithm is presented. Sub-quadratic

A novel and fast fingerprint identification technique using an ANN is presented in this paper. The proposed method uses a novel clustering algorithm to detect similar feature groups from multiple template images generated from the same finger and creates the cluster core set. The proposed feature extraction scheme is based on the reduction of information content to the required minimum

This paper describes an evolutionary clustering algorithm, which can automatically partition a given dataset into the optimal number of groups through one shot of optimization. The proposed method is based on an evolutionary computing technique known as the Bacterial Evolutionary Algorithm (BEA). The BEA draws inspiration from the biological phenomenon of microbial evolution. Unlike the conventional mutation, crossover and selection

The Artificial Bee Colony (ABC) algorithm, one of the most recently introduced optimization algorithms, simulates the intelligent foraging behavior of a honey bee swarm. Clustering analysis, used in many disciplines and applications, is an important tool and a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. In this work, ABC is

The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.

This paper focuses on image segmentation, one of the key problems in medical image processing. A new medical image segmentation method is proposed based on the fuzzy c-means algorithm and spatial information. First, we classify the image into the region of interest and the background using the fuzzy c-means algorithm. Then we use information about the tissues' gradients and the intensity inhomogeneities of regions to improve the quality of the segmentation. The sum of the mean variance within a region and the reciprocal of the mean gradient along the edge of the region is chosen as the objective function; its minimum gives the optimal result. The results show that the clustering segmentation algorithm is effective.

In this paper, multiple kernel fuzzy c-means is introduced as a general framework for the image segmentation problem. Multiple kernel fuzzy c-means provides a new approach to combining different information about image pixels in segmentation algorithms: different kinds of pixel information are combined in the kernel space by combining different kernel functions defined on specific information domains. Two

Clustering, or the grouping of similar objects, is one of the most widely used procedures in data mining; it has received enormous attention and many methods have been proposed in recent decades. However, these traditional clustering algorithms require all the data objects to be located at one single site where they are analyzed. Such a limitation cannot meet today's challenge, as massive data sets are now often stored on different independently working computers connected via local or wide area networks rather than at one single site. Therefore, in this paper we propose a fully distributed clustering algorithm, called fully distributed clustering based on fractal dimension (FDCFD), which enables each site to collaborate in forming a global clustering model with low communication cost. The main idea behind FDCFD is to calculate the fractal dimension so as to group points into a cluster in such a way that none of the points in the cluster changes the cluster's fractal dimension radically. In our theoretical analysis, we demonstrate that our approach works very well for clustering data that is inherently distributed, collecting information spread over several local sites to form a global clustering without communication costs and delays for transmitting data.

This paper introduces a multilayer traffic network model and a traffic network clustering method for solving the route selection problem (RSP) in car navigation systems (CNS). The purpose of the proposed method is to substantially reduce the computation time of route selection, with an acceptable loss of accuracy, by preprocessing the large traffic network into a new network form. The proposed approach preprocesses the traffic network further than the traditional hierarchical network method by means of clustering. The traffic network clustering considers two criteria. We specify a genetic clustering algorithm for traffic network clustering and use NSGA-II to compute the multi-objective Pareto optimal set. The proposed method can overcome size limitations when solving route selection in CNS. Solutions provided by the proposed algorithm are compared with the optimal solutions to analyze and quantify the loss of accuracy.

Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study, both on more varied test datasets and on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Such studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes; we could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. For our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
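
The sampling idea can be sketched as: run k-means on a random subset, then assign every point to its nearest fitted center. This is an illustrative reconstruction, not the authors' code; the sample fraction and the farthest-point initialization are assumptions made for the sketch.

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Lloyd's algorithm with deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(1, k):
        d = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def sampled_kmeans(X, k, sample_frac=0.1, seed=0):
    """Fit centers on a random subset of the data, then assign all
    points to the nearest fitted center."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), max(k, int(sample_frac * len(X))), replace=False)
    centers = kmeans(X[idx], k)
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    return centers, labels

# Demo: two well-separated blobs; a 10% sample recovers the structure.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(8.0, 0.5, (100, 2))])
centers, labels = sampled_kmeans(X, k=2)
```

Only the 20-point sample is clustered iteratively; the remaining 180 points get a single nearest-center pass, which is where the runtime saving comes from.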

Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL

Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
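
Plugging regression coefficients into a clusterer starts from a design matrix shared by all curves and a per-curve least-squares fit. A minimal sketch, with the polynomial basis chosen purely for illustration:

```python
import numpy as np

def coef_features(curves, t, degree=2):
    """Fit each curve y(t) by least squares on a shared polynomial design
    matrix and return the coefficient vectors as clustering features."""
    B = np.vander(t, degree + 1)              # columns t^degree, ..., t, 1
    coefs, *_ = np.linalg.lstsq(B, np.array(curves).T, rcond=None)
    return coefs.T                            # one coefficient row per curve

# Demo: rising vs. falling noisy lines separate by the sign of the
# fitted slope coefficient (column 0 when degree=1).
t = np.linspace(0.0, 1.0, 11)
curves = ([2.0 * t + 0.1 * np.sin(5.0 * t + i) for i in range(3)] +
          [-2.0 * t + 1.0 + 0.1 * np.sin(5.0 * t + i) for i in range(3)])
F = coef_features(curves, t, degree=1)
```

The rows of `F` would then be fed to k-means; as the abstract notes, whether this matches clustering the raw curves depends on the linear transformation the basis induces, which is why an orthogonal design matrix is the distinguished choice.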

Energy efficiency is very important in mobile ad hoc networks (MANETs). Most clustering algorithms save energy by limiting the maximum size of the cluster, by reducing the control overhead for maintaining clusters, or by reducing the power consumption of intra-cluster communication. These algorithms usually update the intra-cluster routing information and maintain clusters by periodically sending out beacons. This paper proposes to save

Dali Wei; H Anthony Chan; Engelbert Linus Chuwa; Batebe Laura Majugo

Methods for spatially distributed physical modeling of snowmelt are presented. Estimation of snow-water equivalence using regression trees and distributed snowmelt modeling using iterative clustering are described; these methods are applied to the Emerald Lake basin, a 120-ha seasonally snow-covered basin in California. Modeled snowmelt and basin discharge substantially agree, yet modeled and observed snow-covered area agree only over ~

Based on clustering technology from data mining, we aimed to establish a new schoolwork identification mechanism. In order to let the standard answer better adapt to actual situations, we first generalized the standard answer, and then calculated the similarity between every sample and the standard answer, as well as the degree of similarity between schoolwork samples. Based on the similarity, we

Graph clustering has been an essential part of many methods, and thus its accuracy has a significant effect on many applications. In addition, the exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC), which employs the inherent cluster-exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm, so it can reveal the inherent clusters of a graph more accurately than other nearly-linear approaches, which are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has linear time and space complexity and has been superior to most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.
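
PPC's building block, the personalized PageRank vector of a seed node, can be sketched with plain power iteration. The restart probability and the toy graph below are illustrative, and this omits the paper's modularity-based cluster extraction.

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.15, iters=200):
    """Power iteration for the PPR vector of one seed node:
    p <- alpha * e_seed + (1 - alpha) * P^T p, with P row-stochastic."""
    P = A / A.sum(axis=1, keepdims=True)
    p = np.zeros(len(A))
    p[seed] = 1.0
    e = p.copy()
    for _ in range(iters):
        p = alpha * e + (1.0 - alpha) * (P.T @ p)
    return p

# Demo: two triangles joined by a single bridge edge. Seeding in the
# first triangle concentrates PPR mass inside that cluster.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
p = personalized_pagerank(A, seed=0)
```

The concentration of restart-biased random-walk mass inside the seed's dense neighborhood is the "cluster exploratory property" the abstract relies on.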

A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali

We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.

Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.

In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads, without CPU intervention, and adapt to their data. This effectively eliminates the superfluous back-and-forth communication between the GPU and CPU through nested kernel computations. The change in performance is measured using two well-known clustering algorithms that exhibit data dependencies: k-means clustering and hierarchical clustering. K-means has a sequential data dependence wherein iterations occur in a linear fashion, while hierarchical clustering has a tree-like dependence that produces split tasks. Analyzing the performance of these data-dependent algorithms gives us a better understanding of the benefits or potential drawbacks of CUDA 5's new dynamic parallelism feature.

A method for clustering incomplete longitudinal data, and gene expression time course data in particular, is presented. Specifically, an existing method that utilizes mixtures of multivariate Gaussian distributions with modified Cholesky-decomposed covariance structure is extended to accommodate incomplete data. Parameter estimation is carried out in a fashion that is similar to an expectation-maximization algorithm. We focus on the particular application of clustering incomplete gene expression time course data. In this application, our approach gives good clustering performance, comparable to the results obtained when no data are missing. Possible extensions of this work are also suggested. PMID:21969969

Shaikh, Mateen; McNicholas, Paul D; Desmond, Anthony F

We present a new cluster-detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (i) the similarity in colour of cluster galaxies (the 'red sequence'); and (ii) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space; (ii) a Voronoi diagram identifies regions of high surface density; and (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper, we present the algorithm and demonstrate it on two data sets. The first is a 7-deg² sample of the deep Sloan Digital Sky Survey (SDSS) equatorial stripe (Stripe 82), from which we detect 97 clusters with z ≲ 0.6. Benefiting from deeper data, we are 100 per cent complete in the maxBCG optically selected cluster catalogue (based on shallower single-epoch SDSS data) and find an additional 78 previously unidentified clusters. The second data set is a mock Medium Deep Survey Pan-STARRS catalogue, based on the Λ cold dark matter (ΛCDM) model and a semi-analytic galaxy formation recipe. Knowledge of galaxy-halo memberships in the mock catalogue allows for the quantification of algorithm performance. We detect 305 mock clusters in haloes with mass >10^13 h^-1 M⊙ at z ≲ 0.6 and determine a spurious detection rate of <1 per cent, consistent with tests on the Stripe 82 catalogue. The detector performs well in the recovery of model ΛCDM clusters. At the median redshift of the catalogue, the algorithm achieves >75 per cent completeness down to halo masses of 10^13.4 h^-1 M⊙ and recovers >75 per cent of the total stellar mass of clusters in haloes down to 10^13.8 h^-1 M⊙.
A companion paper presents the complete cluster catalogue over the full 270-deg² Stripe 82 catalogue.
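The Friends-of-Friends grouping in step (iii) can be sketched in a few lines: any two objects separated by less than a linking length are "friends", and clusters are the connected components of the resulting graph. The 2-D coordinates and linking length below are illustrative toy values, not survey parameters.

```python
def friends_of_friends(points, linking_length):
    """Group 2-D points: link any pair within linking_length, then take
    connected components via depth-first search."""
    n = len(points)
    seen, groups = set(), []

    def close(i, j):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        return dx * dx + dy * dy <= linking_length ** 2

    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        seen.add(i)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in range(n):
                if w not in seen and close(v, w):
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(comp))
    return groups

# A tight chain of three points and a separate pair: two groups emerge.
pts = [(0, 0), (0.5, 0), (1.0, 0.2), (10, 10), (10.4, 10.1)]
groups = friends_of_friends(pts, linking_length=1.0)
```

Note the transitivity: points 0 and 2 are farther apart than the linking length but join the same group through point 1.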

In this paper we propose the Possibilistic C-Means in Feature Space and the One-Cluster Possibilistic C-Means in Feature Space algorithms, which are kernel methods for clustering in feature space based on the possibilistic approach to clustering. The proposed algorithms retain the properties of possibilistic clustering, working as density estimators in feature space and showing high robustness to outliers,

Maurizio Filippone; Francesco Masulli; Stefano Rovetta

An algorithm has been designed and tested which was devised as a tool assisting the analysis of geological structures solely from orientation data. More specifically, the algorithm was intended for the analysis of geological structures that can be approached as planar and piecewise features, like many folded strata. Input orientation data are expressed as pairs of angles (azimuth and dip). The algorithm starts by considering the data in Cartesian coordinates. This is followed by a search for an initial clustering solution, which is achieved by comparing the results output from the systematic shift of a regular rigid grid over the data. This initial solution is optimal (achieves minimum square error) once the grid size and the shift increment are fixed. Finally, the algorithm corrects for the variable spread that is generally expected from the data type using a reshaped non-rigid grid. The algorithm is size-oriented, which implies the application of conditions over cluster size throughout the process, in contrast to density-oriented algorithms, also widely used when dealing with spatial data. Results are derived in a few seconds and, when tested over synthetic examples, they were found to be consistent and reliable. This makes the algorithm a valuable alternative to the time-consuming traditional approaches available to geologists.
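The first step, taking angular orientation data into Cartesian coordinates, might look like the following under one common structural-geology convention (trend/plunge of a line; x = east, y = north, z = up). The convention and the sample values are assumptions for illustration, not the paper's code.

```python
import math

def line_to_cartesian(trend_deg, plunge_deg):
    """Unit vector for a line given its trend (azimuth in degrees,
    clockwise from north) and plunge (degrees below horizontal)."""
    t, p = math.radians(trend_deg), math.radians(plunge_deg)
    return (math.cos(p) * math.sin(t),   # x: east component
            math.cos(p) * math.cos(t),   # y: north component
            -math.sin(p))                # z: up (negative when plunging down)

# A horizontal line pointing due east maps to (1, 0, 0).
v = line_to_cartesian(90.0, 0.0)
```

Once orientations live on the unit sphere, grid-based clustering and least-squares comparisons become ordinary vector arithmetic.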

Iterative methods and genetic algorithms have been used separately to minimise the loss function of many representative-based clustering formulations. Neither of them alone seems to be significantly better. Moreover, the trade-off of effort vs quality slightly favours gradient descent. We present a unifying view for the three most popular loss functions: least sum of squares, its fuzzy version and the

This paper describes a multi-agent methodology for the prediction of physical crime and cyber crime. The model uses clustering algorithms to determine the number of agents in the environment. The preference of each of these agents is determined by feature selection. The agents are allowed to interact in a synthetic environment. The results of the interactions are measured and the

Aiming at the problem that the classical Euclidean distance metric cannot generate an appropriate partition for data lying on a manifold, a genetic-algorithm-based clustering method using a geodesic distance measure is put forward. In this study, a prototype-based genetic representation is utilized, where each chromosome is a sequence of positive integer numbers that represent the k-medoids. Additionally, a geodesic distance
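A geodesic distance of the kind mentioned above is commonly approximated as the shortest path through a neighbourhood graph (as in Isomap); the sketch below, with an illustrative point set and neighbourhood radius, shows that idea rather than the paper's exact measure.

```python
import math
import heapq

def geodesic_distances(points, radius):
    """All-pairs shortest-path distances on the graph that connects every
    pair of points closer than `radius` (Dijkstra from each source)."""
    n = len(points)

    def d(i, j):
        return math.dist(points[i], points[j])

    adj = {i: [j for j in range(n) if j != i and d(i, j) <= radius]
           for i in range(n)}
    dist = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0.0
        heap = [(0.0, src)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > dist[src][u]:
                continue
            for v in adj[u]:
                alt = du + d(u, v)
                if alt < dist[src][v]:
                    dist[src][v] = alt
                    heapq.heappush(heap, (alt, v))
    return dist

# Three points on a bend: the geodesic from end to end goes through the
# middle point, so it exceeds the straight-line Euclidean distance of 2.
pts = [(0, 0), (1, 1), (2, 0)]
dist = geodesic_distances(pts, radius=1.5)
```

A k-medoids objective would then be evaluated over `dist` instead of raw Euclidean distances.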

We present the ''C4 Cluster Catalog'', a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers ~2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the ΛCDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is ≈90% complete and 95% pure above M_200 = 1 × 10^14 h^-1 M⊙ and within 0.03 ≤ z ≤ 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 ≤ z ≤ 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the galaxy line-of-sight velocity dispersion or the richness of the cluster.
However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of M_200 as L_r. The final C4 catalog will contain ≈2500 clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.

Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO

Ant clustering is one of the effective clustering methods. Compared to other clustering methods, the ant clustering algorithm has one outstanding advantage and one disadvantage. The advantage is that the total number of clusters is generated automatically, and the disadvantage is that its cluster result is random and is influenced by the input data and the parameters, which leads to low

In this study an unsupervised way of fire pixel detection from video frames is depicted. A hybrid clustering algorithm is proposed, depending on color samples in video frames. A modified k-means clustering algorithm is used here; hierarchical and partition clustering are combined to build the hybrid. The results are analyzed against a color-based threshold method by considering

In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real time. The quality of the obtained clusterings, i.e. how well they reflect the data, can be assessed by evaluation measures. A multitude of stream clustering algorithms and evaluation measures for clusterings were introduced in the literature; however, until now

Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer

In order to analyze the dynamic evolution of clusters in physical systems, we have come up with a new algorithm that tracks the identity of clusters as the system evolves. According to the identities assigned with the help of this algorithm, some clusters evolve, others are born, and others die all the time. The lifetime of a cluster calculated in

K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well known, it is common practice to assume that the partition with the lowest total sum-of-squares (SSQ), i.e. within-cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real-world breast cancer dataset with varying data density, and a large bioinformatics dataset.
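The practice under discussion, repeating k-means from random initialisations and keeping the lowest-SSQ partition, can be sketched as follows for 1-D data; the dataset, k and restart count are illustrative.

```python
import random

def kmeans(data, k, rng, iters=50):
    """Lloyd's algorithm on 1-D data; returns (labels, total SSQ)."""
    centers = rng.sample(data, k)
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2)
                  for x in data]
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    ssq = sum((x - centers[l]) ** 2 for x, l in zip(data, labels))
    return labels, ssq

rng = random.Random(0)
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
# Keep the best of 10 restarts, judged by SSQ alone.
best_labels, best_ssq = min((kmeans(data, 2, rng) for _ in range(10)),
                            key=lambda r: r[1])
```

For this well-separated toy set the lowest-SSQ partition is unambiguous; the paper's point is that for harder data and larger k, near-identical SSQ values can hide very different partitions, which is what the stability map diagnoses.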

Monitoring cluster evolution in data streams is a major research topic in data stream mining. Previous clustering methods for evolving data streams focus on the global clustering result and may lose critical information about individual clusters. This paper introduces some basic movements of the evolution of an individual cluster. Based on the measurement of the movements, a novel algorithm called MovStream is

Liang Tang; Chang-jie Tang; Lei Duan; Chuan Li; Ye-xi Jiang; Chun-qiu Zeng; Jun Zhu

Clustering is a form of unsupervised classification that aims at grouping data points based on similarity. In this paper, we propose a new partitional clustering algorithm based on the notion of `contribution of a data point'. We apply the algorithm to content-based image retrieval and compare its performance with that of the k-means clustering algorithm. Unlike the k-means algorithm, our

Background: Handling genotype data typed at hundreds of thousands of loci is very time-consuming and it is no exception for population structure inference. Therefore, we propose to apply PCA to the genotype data of a population, select the significant principal components using the Tracy-Widom distribution, and assign the individuals to one or more subpopulations using generic clustering algorithms. Results: We investigated K-means, soft K-means and spectral clustering and made comparison to STRUCTURE, a model-based algorithm specifically designed for population structure inference. Moreover, we investigated methods for predicting the number of subpopulations in a population. The results on four simulated datasets and two real datasets indicate that our approach performs comparably well to STRUCTURE. For the simulated datasets, STRUCTURE and soft K-means with BIC produced identical predictions on the number of subpopulations. We also showed that, for the real datasets, BIC is a better index than likelihood in predicting the number of subpopulations. Conclusion: Our approach has the advantage of being fast and scalable, while STRUCTURE is very time-consuming because of the nature of MCMC in parameter estimation. Therefore, we suggest choosing the proper algorithm based on the application of population structure inference.

The goal of the current paper is to introduce a novel clustering algorithm designed for grouping transcribed textual documents obtained from audio and video segments. Since audio transcripts are normally highly erroneous documents, one of the major challenges at the text processing stage is to reduce the negative impact of errors introduced at the speech recognition stage. Other difficulties come from the nature of conversational speech. In the paper we describe the main difficulties of spoken documents and suggest an approach restricting their negative effects. We also present a clustering algorithm that groups transcripts on the basis of the informative closeness of documents. To carry out such partitioning we give an intuitive definition of the informative field of a transcript and use it in our algorithm. To assess the informative closeness of the transcripts, we apply a Chi-square similarity measure, which is also described in the paper. Our experiments with the Chi-square similarity measure showed its robustness and high efficacy. In particular, the performance analysis carried out with respect to Chi-square and three other similarity measures, Cosine, Dice, and Jaccard, showed that Chi-square is more robust to the specific features of spoken documents.
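The abstract does not give the exact formula, so the following is a hedged sketch of one common chi-square measure between term-frequency vectors (a distance: 0 for identical term distributions, larger for dissimilar ones); the toy documents are illustrative.

```python
def chi_square_distance(p, q):
    """Sum of (p_i - q_i)^2 / (p_i + q_i) over terms present in either
    document; raw frequency dicts are normalised to probabilities first."""
    sp, sq = sum(p.values()), sum(q.values())
    total = 0.0
    for t in set(p) | set(q):
        pi = p.get(t, 0) / sp
        qi = q.get(t, 0) / sq
        if pi + qi > 0:
            total += (pi - qi) ** 2 / (pi + qi)
    return total

doc_a = {"fuzzy": 2, "clustering": 3}
doc_b = {"fuzzy": 2, "clustering": 3}
doc_c = {"speech": 4, "recognition": 1}
```

Identical documents score 0, and fully disjoint vocabularies give the maximum value of 2 under this normalisation, which makes thresholds easy to interpret when grouping transcripts.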

This paper presents a new adaptive on-demand weighting algorithm based on signal strength (SAOW). The algorithm applies signal strength to the clustering process: each node carries the sum of four weights (signal strength weight, stability weight, information quantity weight and battery energy weight), and the cluster structure is improved using signal intensity.
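The combined weight could be computed as a simple weighted sum of the four per-node metrics; the coefficients and node values below are illustrative assumptions, since the abstract does not specify them.

```python
# Illustrative coefficients for the four components named in the abstract;
# the paper's actual weighting is not given here.
WEIGHTS = {"signal": 0.4, "stability": 0.3, "info": 0.2, "battery": 0.1}

def node_score(metrics):
    """Weighted sum of per-node metrics, each assumed normalised to [0, 1];
    the highest-scoring node would be preferred as cluster head."""
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

nodes = {
    "n1": {"signal": 0.9, "stability": 0.8, "info": 0.5, "battery": 0.7},
    "n2": {"signal": 0.4, "stability": 0.9, "info": 0.9, "battery": 0.9},
}
head = max(nodes, key=lambda n: node_score(nodes[n]))
```

With signal strength weighted most heavily, the strong-signal node wins even though it trails on the other three metrics, which mirrors the abstract's emphasis on signal intensity.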

Dao-Quan Li; Qi-Guang Cao; Wei-Hua Xue; Huai-Cai Wang

One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.

Vatsavai, Raju [ORNL; Symons, Christopher T [ORNL; Chandola, Varun [ORNL; Jun, Goo [University of Michigan

We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913

Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J

Constructing the feature space by selecting only the more informative words can speed up document clustering algorithms greatly without affecting cluster quality. In this paper, firstly, the impact of feature selection on document clustering is discussed; then a new solution for feature selection is brought forward which is based on word co-occurrence frequency. According to the cluster hypothesis, the documents

Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance compared to conventional MFCC-based clustering.

The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real-world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses
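The k-modes idea summarised above, mismatch counting in place of Euclidean distance and per-attribute modes in place of means, can be sketched as follows; the toy records and the fixed initialisation are illustrative (the published algorithm selects initial modes differently).

```python
def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, init_modes, iters=20):
    """Alternate assignment to nearest mode and per-attribute mode update."""
    modes = list(init_modes)
    for _ in range(iters):
        labels = [min(range(len(modes)), key=lambda c: mismatches(r, modes[c]))
                  for r in records]
        for c in range(len(modes)):
            members = [r for r, l in zip(records, labels) if l == c]
            if members:
                # the new "center" takes the most frequent value per column
                modes[c] = tuple(max(set(col), key=col.count)
                                 for col in zip(*members))
    return labels, modes

data = [("red", "small"), ("red", "small"), ("red", "medium"),
        ("blue", "large"), ("blue", "large"), ("green", "large")]
labels, modes = kmodes(data, init_modes=[data[0], data[3]])
```

The update step is the categorical analogue of recomputing a mean: the column-wise mode minimises total mismatch within the cluster.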

The goal of this study was to overcome three main shortcomings in using a single algorithm to determine a particular clustering of a phenomenon. We addressed this issue by considering cross-cultural research as a case in point and applied Multi-Algorithm Voting (MAV) methodology to cluster analysis. Specifically, this study was designed to provide more systematic supportive decision tools for researchers

Ant colony algorithm (ACA), inspired by the food-searching behavior of ants, is an evolutionary algorithm and performs well in discrete optimization. In this paper, it is used for fuzzy clustering in image segmentation. Three features of the pixels, gray value, gradient and neighborhood, are extracted for the searching and clustering process. Unexpectedly, tests show that it is time

Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…

For the security problems of hierarchical P2P networks (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we adopt a reputation mechanism to ensure the security of transactions and use clusters to manage the reputation mechanism. In order to improve security, reduce the network cost brought by reputation management and enhance the stability of clusters, we select reputation, historical average online time, and network bandwidth as the basic factors of the comprehensive performance of a node. Simulation results showed that the proposed algorithm improved security, reduced network overhead, and enhanced cluster stability.

Distinguishing potential new cluster data from outliers is a main problem in mining new patterns from evolving data streams. Meanwhile, all the clustering algorithms inherited from the CluStream framework perform distribution-based learning realized via a sliding window, so this problem becomes more obvious. This paper proposes a three-step clustering algorithm, rDenStream, based on DenStream, which includes outlier retrospect learning.

Data clustering plays an important role in many disciplines, including data mining, machine learning, bioinformatics, pattern recognition, and other fields. When there is a need to learn the inherent grouping structure of data in an unsupervised manner, ant-based clustering stands out as the most widely used group of swarm-based clustering algorithms. Under this perspective, this paper presents a new Adaptive

Idris El-Feghi; Mohamed Errateeb; Majid Ahmadi; M. A. Sid-Ahmed

The aim of this paper was to introduce a new method to healthcare markets and analyze the value differences between market segments. Data clustering is an important data mining technology in engineering and technology. The ant colony optimization algorithm, an emerging bionics evolutionary algorithm, has strong robustness and adaptability. The clustering method based on the ant colony algorithm has been

We investigate here the behavior of the standard k-means clustering algorithm and several alternatives to it: the k-harmonic means algorithm due to Zhang and colleagues, fuzzy k-means, Gaussian expectation-maximization, and two new variants of k-harmonic means. Our aim is to find which aspects of these algorithms contribute to finding good clusterings, as opposed to converging to a low-quality local optimum.

Objective: Public health surveillance requires outbreak detection algorithms with computational efficiency sufficient to handle the increasing volume of disease surveillance data. In response to this need, the authors propose a spatial clustering algorithm, rank-based spatial clustering (RSC), that rapidly detects outbreaks of infectious but non-contagious diseases. Design: The authors compared the outbreak-detection performance of RSC with that of three well-established algorithms: the wavelet anomaly

Topology control and energy saving are among the most important challenges in wireless sensor networks because of the special characteristics of these networks, such as limited transmission range, low bandwidth, limited power resources and the impossibility of recharging. In this paper, a k-fault-tolerant clustering topology control algorithm is presented. This algorithm, named the QoS-based clustering topology control algorithm (QCTC), improves network

Azadeh Forghani; Amir Masoud Rahmani; Ahmad Khademzadeh

A clustering algorithm for the design of efficient vector quantizers to be followed by entropy coding is proposed. The algorithm, called entropy-constrained pairwise nearest neighbor (ECPNN), designs codebooks by merging the pair of Voronoi regions which gives the least increase in distortion for a given decrease in entropy. The algorithm can be used as an alternative to the entropy-constrained vector

Diego P. De Garrido; William A. Pearlman; Weiler A. Finamore

Standard genetic algorithms are not well suited to problems with multivariate interactions among variables. This problem has been identified since the beginning of these algorithms and has been termed the linkage learning problem. Numerous attempts have been carried out to solve this problem with varying degrees of success. In this paper, we employ an effective algorithm to cluster a

Amin Nikanjam; Hadi Sharifi; B. Hoda Helmi; Adel Rahmani

A novel list scheduling technique, the Heterogeneous List Scheduling (HLS) technique, is presented. The technique is a procedure for designing a scheduling algorithm for scheduling tasks in a heterogeneous environment. Using the technique, a Heterogeneous Dominant Sequence Cluster algorithm (HDSC) is proposed. This algorithm uses critical paths and the dominant sequence as scheduling heuristics with the goal of minimizing the parallel

The fuzzy c-means cluster algorithm (FCM) is often used in image segmentation, but most FCM algorithms are time-consuming. For the purpose of improving segmentation efficiency, a fast segmentation algorithm based on a histogram constraint is proposed. The new algorithm resamples the initial image to reduce the data size, but the reduction of data size may cause distortion and make FCM converge to error
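For reference, the standard FCM updates that such fast variants accelerate alternate between membership and center computation; this 1-D sketch with fuzziness m = 2 is a generic illustration, not the paper's histogram-constrained algorithm, and the data and initialisation are illustrative.

```python
def fcm(data, c, iters=50, m=2.0, eps=1e-9):
    """Alternate fuzzy-membership and center updates of fuzzy c-means."""
    centers = list(data[:c])                 # naive initialisation
    for _ in range(iters):
        # u[i][j]: degree to which data[i] belongs to cluster j
        u = []
        for x in data:
            d = [abs(x - v) + eps for v in centers]   # eps avoids 0-division
            u.append([1.0 / sum((d[j] / d[l]) ** (2 / (m - 1))
                                for l in range(c)) for j in range(c)])
        # centers: membership-weighted means (weights raised to m)
        centers = [sum(u[i][j] ** m * data[i] for i in range(len(data)))
                   / sum(u[i][j] ** m for i in range(len(data)))
                   for j in range(c)]
    return centers, u

centers, u = fcm([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], 2)
```

Every data point carries a membership in every cluster summing to 1, which is exactly why per-pixel FCM is expensive on full-resolution images and why histogram-based reformulations pay off.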

To make image databases more effectively organized, we present an SK (sequence clustering plus K-means clustering) sub-vector multi-hierarchy clustering algorithm in this paper. It clusters images into a number of classes automatically, according to human perception. It utilizes HSV histograms, wavelet texture features, color-texture moments, a gray gradient co-occurrence matrix, and hierarchical distribution features, to put similar semantic images into

The paper presents proposals of a new architecture and respective task scheduling algorithms for a multi-processor system based on dynamically organised shared memory clusters. The clusters are organised around memory modules placed in a common address space. Each memory module can be accessed through a local cluster bus and a common inter-cluster bus. Execution of tasks in a processor is

Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms that focus on finding functional modules try either to find clique-like subnetworks or to grow clusters starting from vertices with high degrees as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens and Caenorhabditis elegans, as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term-derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue-specific networks are used.

Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase further. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches to parallelizing algorithms rely largely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray-type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
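The parallelization idea above, distributing cluster computations across the cores of one machine, can be sketched in miniature. The paper's implementation is Java-based and built on transactional-memory design principles; the following is only a simplified Python illustration of the data-parallel assignment step of k-means, with all names hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def assign_chunk(chunk, centers):
    """Nearest-center label for each point in one chunk of the data."""
    def nearest(p):
        return min(range(len(centers)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
    return [nearest(p) for p in chunk]

def parallel_assign(points, centers, workers=4):
    """Split the assignment step of k-means across worker threads."""
    size = max(1, len(points) // workers)
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(assign_chunk, chunks, [centers] * len(chunks))
    return [label for part in results for label in part]

labels = parallel_assign([(0, 0), (0, 1), (9, 9), (8, 9)], [(0, 0), (9, 9)])
```

Note that for pure-Python arithmetic a thread pool demonstrates only the structure of the decomposition; real speed-ups of the kind reported above require process-based workers or native numeric code that releases the interpreter lock.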

We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM. PMID:23777526

Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai

We study the surface and the reflection cluster algorithms, which were developed for systems with global continuous symmetries by Niedermayer and Wolff, respectively, in the O(4) spin model in four dimensions. The surface algorithm appears not to be substantially better than local algorithms for this model. We show that the reflection algorithm has big advantages: it fights critical slowing down, and through the use of improved estimators it gives a variance reduction.

Frick, C.; Jansen, K.; Seuferling, P. (Institut für Theoretische Physik, Rheinisch-Westfälische Technische Hochschule Aachen, D-5100 Aachen, Federal Republic of Germany; Höchstleistungsrechenzentrum c/o Kernforschungsanlage Jülich G.m.b.H., P.O. Box 1913, D-5170 Jülich, Federal Republic of Germany; Deutsches Elektronen-Synchrotron DESY, D-2000 Hamburg, Federal Republic of Germany)

An algorithm based on the molecular electrostatic potential (MESP) and the molecular tailoring approach (MTA) for building energetically favorable molecular clusters is presented. This algorithm is tested on prototype (CO2)n clusters with n = 13, 20, and 25 to explore their structure, energetics, and properties. The most stable clusters in this series are seen to contain a greater number of triangular motifs. Many-body energy decomposition analysis performed on the most stable clusters reveals that the 2-body term is the major contributor (>96%) to the total interaction energy. Vibrational frequencies and molecular electrostatic potentials are also evaluated for these large clusters through MTA. The MTA-based MESPs of these clusters show remarkably good agreement with the corresponding actual ones. The most intense MTA-based normal-mode frequencies are in fair agreement with the actual ones for smaller clusters. These calculated asymmetric stretching frequencies are blue-shifted with reference to the CO2 monomer.

Statistical language models frequently suffer from a lack of training data. This problem can be alleviated by clustering, because it reduces the number of free parameters that need to be trained. However, clustered models have the following drawback: if there is ``enough'' data to train an unclustered model, then the clustered variant may perform worse. On currently used language modeling

Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM (HEM) algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members, in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition. PMID:23681990

Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B

In this work, two new fuzzy clustering (FC) algorithms based on Differential Evolution (DE) are proposed. Five well-known data sets, viz. Iris, Wine, Glass, E. coli, and Olive Oil, are used to demonstrate the effectiveness of DEFC-1 and DEFC-2. They are compared with the Fuzzy C-Means (FCM) algorithm and the Threshold Accepting-based Fuzzy Clustering algorithms proposed by Ravi et al. [1]. The Xie-Beni index is used to arrive at the 'optimal' number of clusters. Based on the numerical experiments, we infer that, in terms of least objective function value, these variants can be used as viable alternatives to the FCM algorithm.
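As a point of reference for the comparisons above, the baseline FCM iteration alternates two updates: cluster centers as fuzzified-membership-weighted means, and memberships from relative distances to the centers. The following is a minimal NumPy sketch of plain FCM, not the DEFC variants, whose details are not given here:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                     # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                         # fuzzified memberships
        V = (W @ X) / W.sum(axis=1, keepdims=True)   # weighted-mean centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U = 1.0 / D ** (2.0 / (m - 1.0))   # closer centers get a larger share
        U /= U.sum(axis=0)
    return U, V

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
U, V = fcm(X, c=2)
```

The fuzzifier m controls how soft the partition is; as m approaches 1 the memberships harden toward ordinary c-means.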

Objective Public health surveillance requires outbreak detection algorithms with computational efficiency sufficient to handle the increasing volume of disease surveillance data. In response to this need, the authors propose a spatial clustering algorithm, rank-based spatial clustering (RSC), that rapidly detects outbreaks of infectious but non-contagious diseases. Design The authors compared the outbreak-detection performance of RSC with that of three well-established algorithms, the wavelet anomaly detector (WAD), the spatial scan statistic (KSS), and the Bayesian spatial scan statistic (BSS), using real disease surveillance data onto which they superimposed simulated disease outbreaks. Measurements The following outbreak-detection performance metrics were measured: receiver operating characteristic curve, activity monitoring operating curve, cluster positive predictive value, cluster sensitivity, and algorithm run time. Results RSC was computationally efficient. It outperformed the other two spatial algorithms in terms of detection timeliness and outbreak localization. RSC also had overall better timeliness than the time-series algorithm WAD at low false alarm rates. Conclusion RSC is an ideal algorithm for analyzing large datasets when the application of other spatial algorithms is not practical. It also allows timely investigation by public health practitioners by providing early detection and well-localized outbreak clusters.

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to simultaneously analyze large numbers of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis, and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks: it is computationally expensive and generates locally optimal solutions that depend on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-global optimal solutions by searching the entire solution space. The low clustering accuracy of the existing algorithms limits their use in many crucial applications of the life sciences. In this paper we propose a novel Harmony Search-K-means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
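Harmony search, mentioned above, maintains a memory of candidate solutions and "improvises" new ones by combining values drawn from memory, occasional pitch adjustments, and random draws, replacing the worst memory entry whenever the improvisation improves on it. The following is a generic single-objective sketch, not the HSKH hybrid itself; all parameter values are illustrative assumptions:

```python
import random

def harmony_search(objective, dim, bounds, memory_size=10, iters=200,
                   hmcr=0.9, par=0.3, bw=0.1, seed=0):
    """Minimal harmony search: improvise candidates, keep the best memory."""
    rng = random.Random(seed)
    lo, hi = bounds
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iters):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:            # take a value from memory...
                v = rng.choice(memory)[d]
                if rng.random() < par:         # ...and maybe pitch-adjust it
                    v += rng.uniform(-bw, bw)
            else:                              # or draw a fresh random value
                v = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, v)))
        s = objective(new)
        worst = max(range(memory_size), key=scores.__getitem__)
        if s < scores[worst]:                  # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]

# Toy objective: squared distance of a 1-D "centroid" to the data mean 2.0.
best, score = harmony_search(lambda h: (h[0] - 2.0) ** 2, dim=1, bounds=(0.0, 5.0))
```

In a clustering hybrid, the objective would instead be a clustering criterion such as the within-cluster sum of squares over a flattened vector of centroids.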

In kernel-based algorithms, Mercer kernel techniques have been used to improve the separability of input patterns. Although designed to tackle the curse of dimensionality, non-accelerated kernel-based clustering algorithms fail to provide enough time efficiency for practical applications, such as medical image segmentation. For improving the time efficiency of kernel-based clustering, different speed-up schemes can be adopted. Different from

This study investigates whether a fuzzy clustering method is of any practical value in delineating urban housing submarkets relative to clustering methods based on classic (or crisp) set theory. A fuzzy c-means algorithm is applied to obtain the fuzzy set membership degree of census tracts to housing submarkets defined within a metropolitan area. Issues of choosing algorithm parameters are discussed on

Unsupervised clustering is a powerful technique for understanding multispectral and hyperspectral images, with k-means being one of the most widely used iterative approaches. It is a simple though computationally expensive algorithm, particularly for clustering large hyperspectral images into many categories. Software implementation presents advantages such as flexibility and low cost for implementation of complex functions. However, it presents limitations, such as difficulties

Abel Guilhermino da S. Filho; Alejandro César Frery; Cristiano Coêlho de Araújo; Haglay Alice; Jorge Cerqueira; Juliana A. Loureiro; Manoel Eusebio de Lima; Maria das Graças S. Oliveira; Michelle Matos Horta

In this paper we present an efficient hierarchical clustering algorithm for relational data, where the relations are modeled by a graph. The hierarchical clustering approach proposed in this paper is based on divisive and link criteria, used to break the graph and join the nodes at different stages. We then apply this approach to a community detection problem based on the well-known

Community mining has been the focus of much recent research on dynamic social networks. In this paper, we propose a clustering-based improved ant colony algorithm (CIACA) for community mining in social networks. The CIACA combines the local pheromone update rule with the global update rule and utilizes a heuristic function to adjust the clustering solution dynamically, assisted by a decay coefficient

Existing density-based data stream clustering algorithms use a two-phase scheme consisting of an online phase, in which raw data are processed to gather summary statistics, and an offline phase, which generates the clusters from the summary data. In this paper we propose a data stream clustering method based on a multi-agent system that uses a decentralized bottom-up self-organizing

Agostino Forestiero; Clara Pizzuti; Giandomenico Spezzano

Optimum geometries of silicon–germanium (Si–Ge) clusters are found using a single-parent genetic algorithm. 100-atom and 150-atom clusters are studied with a variety of compositions and initial geometries. Total interaction energies, distances of Si and Ge atoms to the cluster centers, and average bond lengths are calculated. A Si-core/Ge-shell geometry is found to be favorable compared to other

An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program
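The minimum sum-of-squares objective defined above can be written down directly: for a candidate partition, sum each point's squared distance to the centroid of its own cluster. A small sketch of evaluating that objective:

```python
def sum_of_squares(clusters):
    """Minimum sum-of-squares objective: total squared distance of
    every point to the centroid of its own cluster."""
    total = 0.0
    for cl in clusters:
        # Centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(xs) / len(cl) for xs in zip(*cl)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in cl)
    return total

value = sum_of_squares([[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]])
```

The exact algorithm in the paper searches over partitions to minimize this quantity; heuristics such as k-means only find local minima of it.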

Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T2 1st and 2nd...

K. M. Horn; G. C. Osbourn; A. M. Bouchard; J. A. Sanders

An approach to construct interpretable and precise fuzzy models from data is proposed. Interpretability, which is one of the most important features of fuzzy models, is analyzed first. Then a modified fuzzy clustering algorithm, combined with the least-squares method, is used to identify the initial fuzzy model. Third, a multi-objective genetic algorithm and interpretability-driven simplification techniques are proposed to

We present a fast general-purpose algorithm for high-throughput clustering of data “with a two-dimensional organization”. The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes this algorithm especially well suited for problems where the data have high density, e.g. in the case of tracking devices working under high-luminosity conditions such as those of the LHC or super-LHC. The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has a much broader field of application. In fact, its core does not specifically rely on the kind of data or detector it is working for, while the second step can and should be tailored for a given application. For example, in the case of spatial measurement with silicon pixel detectors, the second step performs a center-of-charge calculation. Applications can thus be foreseen for other detectors and other scientific fields, ranging from HEP calorimeters to medical imaging. An additional advantage of this two-step approach is that the typical clustering-related calculations (second step) are separated from the combinatorial complications of clustering. This separation simplifies the design of the second step and enables it to perform sophisticated calculations, achieving offline quality in online applications. The algorithm is general purpose in the sense that only minimal assumptions about the kind of clustering to be performed are made.
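The two-step structure described above, a core that groups the data followed by per-cluster calculations such as center of charge, can be illustrated in software. This sketch is not the FPGA pipeline itself; it simply groups pixel hits by 8-connectivity and then computes each cluster's charge-weighted centroid:

```python
def cluster_pixels(hits):
    """Step 1: group hits into 8-connected clusters (flood fill).
    Step 2: per cluster, compute the charge-weighted centroid."""
    remaining = dict(hits)            # {(row, col): charge}
    centroids = []
    while remaining:
        seed = next(iter(remaining))
        cluster = {seed: remaining.pop(seed)}
        stack = [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    n = (r + dr, c + dc)
                    if n in remaining:
                        cluster[n] = remaining.pop(n)
                        stack.append(n)
        q = sum(cluster.values())     # total cluster charge
        centroids.append((sum(r * w for (r, _), w in cluster.items()) / q,
                          sum(c * w for (_, c), w in cluster.items()) / q))
    return centroids

hits = {(0, 0): 1.0, (0, 1): 3.0, (5, 5): 2.0}
centers = cluster_pixels(hits)
```

Note how the connectivity logic (step one) is cleanly separated from the centroid arithmetic (step two), mirroring the separation the abstract argues for.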

K-means clustering is a very popular clustering technique which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean...

C. Piatko; D. M. Mount; N. S. Netanyahu; R. Silverman; T. Kanungo

Experiments on the formation of secondary organic aerosol (SOA) from photooxidation of 1,3,5-trimethylbenzene in a CH3ONO/NO/air mixture were carried out in a laboratory chamber. The size and chemical composition of the resultant individual particles were measured in real time by an aerosol laser time-of-flight mass spectrometer (ALTOFMS) recently designed in our group. We also developed a Fuzzy C-Means (FCM) algorithm to classify the mass spectra of large numbers of SOA particles. The study first started with mixed particles generated from standard benzaldehyde, phenol, benzoic acid, and nitrobenzene solutions to test the feasibility of applying the FCM. The FCM was then used to extract potential aerosol classes in the chamber experiments. The results demonstrate that FCM allowed a clear identification of ten distinct chemical particle classes in this study, namely 3,5-dimethylbenzoic acid, 3,5-dimethylbenzaldehyde, 2,4,6-trimethyl-5-nitrophenol, 2-methyl-4-oxo-2-pentenal, 2,4,6-trimethylphenol, 3,5-dimethyl-2-furanone, glyoxal, and high-molecular-weight (HMW) components. Compared to offline methods such as gas chromatography-mass spectrometry (GC-MS), the real-time ALTOFMS detection approach coupled with the FCM data processing algorithm can carry out cluster analysis of SOA successfully and provide more information on the products. Thus ALTOFMS is a useful tool to reveal the formation and transformation processes of SOA particles in smog chambers.

Image segmentation remains one of the major challenges in image analysis. Many segmentation algorithms have been developed for various applications. Unsatisfactory results have been encountered in some cases, for many existing segmentation algorithms. In this paper, we introduce three modified versions of the conventional moving k-means clustering algorithm, called the fuzzy moving k-means, adaptive moving k-means and adaptive fuzzy moving

We present a method to efficiently solve cardiac membrane models using a novel unsupervised clustering algorithm. The unsupervised clustering algorithm was designed to handle repeated clustering of multidimensional objects with rapidly changing properties. A modified trie data structure that allows efficient search and scalable, distributed assembly of the result was designed. The method was applied to solve monodomain models of cardiac tissue with highly non-linear reaction elements. We demonstrate the versatility and advantages of the method by subjecting the tissue to various spatial excitation patterns. PMID:19164063

In this paper, three different clustering algorithms were applied to assemble infrared (IR) spectral maps from IR microspectra of tissues. Using spectra from a colorectal adenocarcinoma section, we show how IR images can be assembled by agglomerative hierarchical (AH) clustering (Ward's technique), fuzzy C-means (FCM) clustering, and k-means (KM) clustering. We discuss practical problems of IR imaging on tissues such

Peter Lasch; Wolfgang Haensch; Dieter Naumann; Max Diem

This paper summarizes ongoing research and development efforts involving the design, modeling, simulation, experimentation, and analysis of distributed, parallel computing algorithms and architectures for autonomous sonar arrays. Several fine-grain interconnection network models have been developed including unidirectional linear array, bidirectional linear array, and unidirectional register-insertion ring. Basic time- and frequency- domain algorithms for parallel beamforming have been designed and implemented,

A. George; W. Rosen; J. Markwell; L. Hopwood; R. Fogarty

An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program is developed to test the algorithm. The algorithm is tested only on 8x8 image data with different numbers of blobs in them. The algorithm works very well in detecting and identifying image clusters.
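The "elastic rectangle" idea can be sketched as follows: starting from a seed pixel, the rectangle stretches one row or column at a time as long as its border still gains blob pixels. The original C program is not reproduced here, so this Python version is a hypothetical reconstruction of the concept:

```python
def elastic_rectangle(image, seed):
    """Grow a rectangle around a seed pixel until none of its four
    borders can be stretched to cover more blob (value 1) pixels."""
    r0 = r1 = seed[0]
    c0 = c1 = seed[1]
    h, w = len(image), len(image[0])
    changed = True
    while changed:
        changed = False
        # Stretch each side by one pixel if that gains blob pixels.
        if r0 > 0 and any(image[r0 - 1][c] for c in range(c0, c1 + 1)):
            r0 -= 1; changed = True
        if r1 < h - 1 and any(image[r1 + 1][c] for c in range(c0, c1 + 1)):
            r1 += 1; changed = True
        if c0 > 0 and any(image[r][c0 - 1] for r in range(r0, r1 + 1)):
            c0 -= 1; changed = True
        if c1 < w - 1 and any(image[r][c1 + 1] for r in range(r0, r1 + 1)):
            c1 += 1; changed = True
    return (r0, c0, r1, c1)

# 8x8 binary image with one 3x3 blob at rows 2-4, columns 3-5.
img = [[0] * 8 for _ in range(8)]
for r in range(2, 5):
    for c in range(3, 6):
        img[r][c] = 1
box = elastic_rectangle(img, (3, 4))
```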

In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982)

Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu
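Lloyd's heuristic, referenced above, alternates two steps until the centers stabilize: assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal sketch, not the authors' implementation:

```python
import random

def lloyd_kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's heuristic: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers

centers = lloyd_kmeans([(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)], k=2)
```

Each iteration can only decrease the mean squared distance, so the heuristic converges, though only to a local optimum that depends on the initial centers.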

We present a Bayesian approach for segmenting a sequence of gray-scale images to obtain a binary sketch. We extend a 2-D algorithm to video sequences. The 2-D algorithm is an adaptive thresholding scheme that uses spatial constraints and takes into consideration the local intensity characteristics of the image. We model the segmentation distribution as a 3-D Gibbs random field. We

Evolutionary computation can increase the speed and accuracy of pattern recognition in multispectral images, for example, in automatic target tracking. The first method treats the clustering process. It determines a cluster of pixels around specified reference pixels so that the entire cluster is increasingly representative of the search object. An initial population (of clusters) evolves into populations of new clusters, with each cluster having an assigned fitness score. This population undergoes iterative mutation and selection. Mutation operators alter both the pixel cluster set cardinality and composition. Several stopping criteria can be applied to terminate the evolution. An advantage of this evolutionary cluster formulation is that the resulting cluster may have an arbitrary shape so that it most nearly fits the search pattern. The second algorithm automates the selection of features (the center-frequency and the bandwidth) for each population member. For each pixel in the image and for each population member, the Mahalanobis distance to the reference set is calculated and a decision is made whether or not this pixel belongs to a target. The sum of correct and false decisions defines a Receiver Operating Curve, which is used to measure the fitness of a population member. Based on this fitness, the algorithm decides which population members to use as parents for the next iteration.

Burgin, George H.; Kagey, H. Price; Jafolla, James C.
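The per-pixel decision described above relies on the Mahalanobis distance, which measures distance to a reference set's mean in units of the set's covariance. A small sketch of that computation (the feature-selection and evolutionary parts of the method are not reproduced):

```python
import numpy as np

def mahalanobis(x, reference):
    """Mahalanobis distance from x to the mean of a reference pixel set,
    scaled by the set's sample covariance."""
    ref = np.asarray(reference, dtype=float)
    mu = ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False)
    d = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Decision rule sketch: flag the pixel as target if its distance to the
# reference set falls below a chosen threshold (2.5 is illustrative).
ref = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
is_target = mahalanobis((3.0, 1.0), ref) < 2.5
```

Sweeping the threshold and counting correct versus false decisions traces out the receiver operating curve used above as the fitness measure.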

An algorithm for intrusion detection based on improved evolutionary semi-supervised fuzzy clustering is proposed, which is suited to situations in which labeled data are more difficult to obtain than unlabeled data in intrusion detection systems. The algorithm requires only a small number of labeled data and a large number of unlabeled data; the class label information provided by the labeled data is used to guide the evolution process of each fuzzy partition on the unlabeled data, which plays the role of a chromosome. This algorithm can deal with fuzzy labels, does not easily fall into local optima, and is suited to implementation on parallel architectures. Experiments show that the algorithm can improve classification accuracy and has high detection efficiency.

Clustering is inherently a difficult problem, both with respect to the construction of adequate objective functions and with respect to their optimization. In this paper, we suggest an objective function called the Weighted Sum Validity Function (WSVF), which is a weighted sum of several normalized cluster validity functions. Further, we propose a Hybrid Niching Genetic Algorithm (HNGA), which can be used for the optimization of the WSVF to automatically evolve the proper number of clusters as well as an appropriate partitioning of the data set. Within the HNGA, a niching method is developed to preserve both the diversity of the population with respect to the number of clusters encoded in the individuals and the diversity of the subpopulations with the same number of clusters during the search. In addition, we hybridize the niching method with the k-means algorithm. In the experiments, we show the effectiveness of both the HNGA and the WSVF. In comparison with other related genetic clustering algorithms, the HNGA can consistently and efficiently converge to the best known optimum corresponding to the given data, consistent with the convergence result. The WSVF is found generally able to improve the confidence of clustering solutions and to achieve more accurate and robust results. PMID:16366242

Background Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear.
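For context on the k-modes family compared above: k-modes replaces the means of k-means with per-attribute modes and Euclidean distance with simple matching (Hamming) dissimilarity over categorical attributes. The following is a minimal sketch of plain k-modes, not the k-AMH algorithm itself, whose details are not given here:

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: count of mismatched attributes."""
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, k, iters=10):
    """Minimal k-modes sketch: modes replace means, Hamming distance
    replaces Euclidean distance."""
    modes = [list(r) for r in records[:k]]   # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in records:
            clusters[min(range(k), key=lambda i: hamming(r, modes[i]))].append(r)
        for i, cl in enumerate(clusters):
            if cl:
                # New mode: most frequent value in each attribute position.
                modes[i] = [Counter(col).most_common(1)[0][0] for col in zip(*cl)]
    return modes

records = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
modes = kmodes(records, k=2)
```

The non-unique centroids problem mentioned above is visible even in this sketch: when attribute values tie, the mode of a cluster is not unique, and the result depends on tie-breaking.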

The clustering of sensor nodes is an effective topology control approach that can balance the load on sensor nodes and increase network scalability and lifetime. But the environments of sensor networks are inherently dynamic and continuously changing (e.g. military environments). In addition, the sensor nodes have a limited energy supply, and after it is consumed they will die,

The multilevel fast multipole method (MLFMM) speeds up the matrix-vector multiplication in the iterative solution of the matrix equation arising in the boundary-element method. It makes use of a spatial hierarchy of the elements; elements are arranged into groups, and those groups are arranged into higher level groups, and so on. Ideally, the groups should be widely separated clusters. If

Clustering is a fundamental data analysis method. It is widely used for pattern recognition, feature extraction, vector quantization (VQ), image segmentation, function approximation, and data mining. As an unsupervised classification technique, clustering identifies some inherent structures present in a set of objects based on a similarity measure. Clustering methods can be based on statistical model identification (McLachlan & Basford, 1988) or competitive learning. In this paper, we give a comprehensive overview of competitive-learning-based clustering methods. Importance is attached to a number of competitive-learning-based clustering neural networks such as the self-organizing map (SOM), learning vector quantization (LVQ), the neural gas, and the ART model, and to clustering algorithms such as the C-means, mountain/subtractive clustering, and fuzzy C-means (FCM) algorithms. Associated topics such as the under-utilization problem, fuzzy clustering, robust clustering, clustering based on non-Euclidean distance measures, supervised clustering, hierarchical clustering, and cluster validity are also described. Two examples are given to demonstrate the use of the clustering methods. PMID:19758784

WHO-ART was developed by the WHO collaborating centre for international drug monitoring in order to code adverse drug reactions. We assume that computation of a semantic distance between WHO-ART terms may be an efficient way to group related medical conditions in the WHO database in order to improve signal detection. Our objective was to develop a method for clustering WHO-ART terms according to the proximity of their meanings. Our material comprises 758 WHO-ART terms. A formal definition was acquired for each term as a list of elementary concepts belonging to SNOMED international axes, characterized by modifier terms in some cases. Clustering was implemented as a terminology service on a J2EE server. Two different unsupervised machine learning algorithms (KMeans, Pvclust) clustered WHO-ART terms according to a semantic distance operator previously described. Pvclust grouped 51% of the WHO-ART terms. K-Means grouped 100% of the WHO-ART terms, but 25% of clusters were heterogeneous with k = 180 clusters and 6% were heterogeneous with k = 32 clusters. Clustering algorithms associated with a semantic distance could suggest potential groupings of WHO-ART terms that need validation according to the user's requirements. PMID:17238365

Data mining is a process that uses technology to bridge the gap between data and logical decision-making. The term itself offers a promising view of organized data manipulation for extracting valuable information and knowledge from high volumes of data. Numerous techniques have been developed to fulfill this aspiration. This paper outlines an ant colony optimization algorithm which is used newly in

Next generation sequencing (NGS) revolutionized genomic data generation by enabling high-throughput parallel sequencing, making large scale genomic data analysis a crucial task. To improve NGS data quality, we developed an efficient algorithm that uses a flexible read decomposition method to improve accuracy of error correction. We further proposed a statistical framework to differentiate infrequently observed subreads from sequencing errors in

In this paper, we present a technique for on-line segment-based map building in an unknown indoor environment from sonar sensor observations. The world model is represented with two-dimensional line segments. The information obtained by the ultrasonic sensors is updated instantaneously while the mobile robot is moving through the workspace. An Enhanced Adaptive Fuzzy Clustering Algorithm (EAFC) along with Noise Clustering

Ip Ying-leung; Ahmad B. Rad; K. M. Chow; Yiu-kwong Wong

High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may get selected multiple times in a LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra) for clustering mass spectrometry data which increases both the sensitivity and confidence of spectral assignment. CAMS utilizes a novel metric, called F-set, that allows accurate identification of the spectra that are similar. A graph theoretic framework is defined that allows the use of the F-set metric efficiently for accurate cluster identifications. The accuracy of the algorithm is tested on real HCD and CID data sets with varying amounts of peptides. Our experiments show that the proposed algorithm is able to cluster spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm is able to decrease the computational time by compressing the data sets while increasing the throughput of the data by interpreting low S/N spectra. PMID:23471471

Saeed, Fahad; Pisitkun, Trairak; Knepper, Mark A; Hoffert, Jason D

Underwater wireless sensor networks (UWSNs) are a novel networking paradigm to explore aqueous environments. The characteristics of mobile UWSNs, such as low communication bandwidth, large propagation delay, floating node mobility, and high error probability, are significantly different from terrestrial wireless sensor networks. Energy-efficient communication protocols are thus urgently demanded in mobile UWSNs. In this paper, we develop a novel clustering

We analyze the utility and scalability of extended compact genetic algorithm (eCGA)—a genetic algorithm (GA) that automatically and adaptively mines the regularities of the fitness landscape using machine learning methods and information theoretic measures—for ground state optimization of clusters. In order to reduce the computational time requirements while retaining the high reliability of predicting near-optimal structures, we employ two efficiency-enhancement

A distance-mapping algorithm takes a set of objects and a distance metric and then maps those objects to a Euclidean or pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Distance-mapping algorithms are a useful tool for clustering and visualization in data-intensive applications, because they replace expensive distance calculations by sum-of-squares calculations. This

Jason Tsong-Li Wang; Xiong Wang; King-Ip Lin; Dennis Shasha; Bruce A. Shapiro; Kaizhong Zhang

Brain MR images corrupted by RF inhomogeneity exhibit brightness variations in such a way that a standard fuzzy C-means (FCM) segmentation algorithm fails. As a consequence, modified versions of the algorithm can be found in the literature, which take into account the artifact. In this work we show that the application of a suitable pre-processing algorithm, already presented by the authors, followed

Hierarchical graphs and clustered graphs are useful non-classical graph models for structured relational information. Hierarchical graphs are graphs with layering structures; clustered graphs are graphs with recursive clustering structures. Both have applications in CASE tools, software visualization and VLSI design. Drawing algorithms for hierarchical graphs have been well investigated. However, the problem of planar straight-line representation has not been solved

Peter Eades; Qing-wen Feng; Xuemin Lin; Hiroshi Nagamochi

A new robust clustering scheme based on fuzzy c-means is proposed and the concept of a fuzzy mega-cluster is introduced in this paper. The fuzzy mega-cluster is conceptually similar to the noise cluster, designed to group outliers in a separate cluster. This proposed scheme, called the mega-clustering algorithm, is shown to be robust against outliers. Another interesting property is its

Image segmentation is the first important step in image analysis and image processing. In this paper, according to the characteristics of color crop images, we first transform the color space of the image from RGB to HSI, and then select proper initial clustering centers and the cluster number by applying a mean-variance approach and rough set theory, followed by clustering calculation, so as to automatically segment color components rapidly and extract target objects from the background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and processing of crop images. Experimental results demonstrate that the improved k-means clustering algorithm is able to reduce the computation amount and enhance the precision and accuracy of clustering.
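A statistics-driven choice of initial centers, loosely in the spirit of the mean-variance initialization described above, can be sketched as follows. This is an illustrative simplification, not the paper's algorithm; `kmeans_segment` and its initialization rule are assumptions made here:

```python
import numpy as np

def kmeans_segment(pixels, k, n_iter=50):
    """Sketch: k-means over pixel feature vectors with statistics-based init.

    Initial centers are spread along the mean +/- standard deviation of the
    features, a simplified stand-in for a mean-variance initialization.
    """
    mu, sd = pixels.mean(axis=0), pixels.std(axis=0)
    # place k initial centers on a line through the feature mean
    offsets = np.linspace(-1.0, 1.0, k)[:, None]
    centers = mu + offsets * sd
    for _ in range(n_iter):
        # assign each pixel to its nearest center
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers
```

For segmentation, `pixels` would be the flattened HSI (or other) channel values, and `labels` reshaped back to the image grid gives the segment map.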

This paper seeks an answer to the question: can the fuzzy k-means (FKM), c-means (FCM), kernelized FCM (KFCM), and spatially constrained KFCM (SKFCM) algorithms work automatically without a predefined number of clusters? We present automatic fuzzy algorithms that impose some spatial constraints on the objective function. The algorithms incorporate spatial information into the membership function and the validity procedure for clustering. We use

Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving the current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as node distribution arising from dynamic properties due to mobility, flat structures, and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In a simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than the other existing mechanisms. PMID:22163905

Economic dispatch is a highly constrained optimization problem encompassing interaction among decision variables. Environmental concerns that arise due to the operation of fossil fuel fired electric generators transform the classical problem into multiobjective environmental/economic dispatch (EED). In this paper, a fuzzy clustering-based particle swarm (FCPSO) algorithm has been proposed to solve the highly constrained EED problem involving conflicting objectives. FCPSO

Shubham Agrawal; B. K. Panigrahi; Manoj Kumar Tiwari

Image segmentation is a key step in most medical image analysis. However, the process is particularly difficult due to limitations of the imaging equipment and variation in biological systems. Therefore, accurate and robust segmentation is important for quantitative assessment of medical images in order to achieve correct clinical diagnosis. This paper studies the performance of clustering and adaptive thresholding algorithms

Vehicular Ad-Hoc Networks (VANETs) offer communication between vehicles and infrastructure. Warning messages, among others, can be used to alert drivers, and thus improve road safety. To adapt to the unique nature of VANETs, which demands the delivery of time-sensitive messages to nearby vehicles, fast topology control and scheduling algorithms are required. A clustering approach, which was initially

Optimizing energy consumption is the main concern in designing and planning the operation of Wireless Sensor Networks (WSNs). Clustering is one of the techniques utilized to extend the lifetime of the network by applying data aggregation and balancing energy consumption among the sensor nodes of the network. In this paper, we propose the recently developed Harmony Search Algorithm (HSA) for

Call graphs are commonly used as input for automatic clustering algorithms, the goal of which is to extract the high-level structure of the program under study. Determining the call graph for a procedural program is fairly simple. However, this is not the case for programs written in object-oriented languages, due to polymorphism. A number of algorithms

Derek Rayside; Steve Reuss; Erik Hedges; Kostas Kontogiannis

This paper describes the development of a non-heuristic algorithm for solving group technology problems. The problem is first formulated as a bipartite graph, and then an expression for the upper limit to the number of groups is derived. Using this limit, a non-hierarchical clustering method is adopted for grouping components into families and machines into cells. After diagonally correlating the

We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.

Tame, M. S.; Kim, M. S. [QOLS, Blackett Laboratory, Imperial College London, Prince Consort Road, SW7 2BW, United Kingdom and Institute for Mathematical Sciences, Imperial College London, SW7 2PG (United Kingdom)

De Soete (1986, 1988) proposed a variable weighting procedure when Euclidean distance is used as the dissimilarity measure with an ultrametric hierarchical clustering method. The algorithm produces weighted distances which approximate ultrametric distances as closely as possible in a least squares sense. The present simulation study examined the effectiveness of the De Soete procedure for an applications problem for which

Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,0...

This paper represents one of many studies funded under AgRISTARS to determine how Landsat can contribute to the Forest Service's Renewable Resource Inventory effort as mandated by the National Forest Management Act of 1976. The specific objective of this study was to test clustering algorithms used at Johnson Space Flight Center - ISOCLS and CLASSY.

This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, analyzing the structure of BMIs, the existence of typical difficult structures is confirmed. Then, in order to improve the performance of the algorithm, based on the results of the problem structure analysis and consideration of the characteristic properties of BMIs, we propose an algorithm using a primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, in these algorithms, we propose two types of evaluation methods for GA individuals based on LMI calculation, further considering the characteristic properties of BMIs. In addition, in order to reduce computational time, we propose parallelization of the RCGA algorithm, a Master-Worker paradigm with cluster computing techniques.

Background The goal of the study was to demonstrate a hierarchical structure of resting state activity in the healthy brain using a data-driven clustering algorithm. Methodology/Principal Findings The fuzzy c-means clustering algorithm was applied to resting state fMRI data in cortical and subcortical gray matter from two groups acquired separately, one of 17 healthy individuals and the second of 21 healthy individuals. Different numbers of clusters and different starting conditions were used. A cluster dispersion measure determined the optimal numbers of clusters. An inner product metric provided a measure of similarity between different clusters. The two cluster result found the task-negative and task-positive systems. The cluster dispersion measure was minimized with seven and eleven clusters. Each of the clusters in the seven and eleven cluster result was associated with either the task-negative or task-positive system. Applying the algorithm to find seven clusters recovered previously described resting state networks, including the default mode network, frontoparietal control network, ventral and dorsal attention networks, somatomotor, visual, and language networks. The language and ventral attention networks had significant subcortical involvement. This parcellation was consistently found in a large majority of algorithm runs under different conditions and was robust to different methods of initialization. Conclusions/Significance The clustering of resting state activity using different optimal numbers of clusters identified resting state networks comparable to previously obtained results. This work reinforces the observation that resting state networks are hierarchically organized.
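The paper's exact cluster dispersion measure is not given in this summary, so the sketch below substitutes a simple stand-in (mean within-cluster spread divided by the smallest center separation) to show how an optimal cluster count can be selected by minimizing such a measure over candidate values of k. All names and the quantile initialization are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic quantile initialization."""
    centers = np.quantile(X, np.linspace(0, 1, k), axis=0)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def dispersion_index(X, labels, centers):
    """Mean within-cluster spread over smallest center separation
    (a simple stand-in for a cluster dispersion measure)."""
    k = len(centers)
    within = np.mean([np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
                      for j in range(k) if np.any(labels == j)])
    between = min(np.linalg.norm(centers[a] - centers[b])
                  for a in range(k) for b in range(a + 1, k))
    return within / between

def best_k(X, k_range):
    """Pick the cluster count that minimizes the dispersion index."""
    scores = {k: dispersion_index(X, *reversed(kmeans(X, k)[::-1])) for k in ()}
    scores = {}
    for k in k_range:
        labels, centers = kmeans(X, k)
        scores[k] = dispersion_index(X, labels, centers)
    return min(scores, key=scores.get)
```

Too few clusters inflates the within term; too many shrinks the between term, so the ratio tends to bottom out near the true structure.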

Lee, Megan H.; Hacker, Carl D.; Snyder, Abraham Z.; Corbetta, Maurizio; Zhang, Dongyang; Leuthardt, Eric C.; Shimony, Joshua S.

The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as the sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
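For the special case of two equal-weight, equal-variance 1-D Gaussian components, the pairwise overlap defined above (the sum of the two misclassification probabilities) has a closed form, since each component misclassifies exactly the mass beyond the midpoint decision boundary. The Python fragment below is only an illustration of that special case; MixSim itself handles the general multivariate computation:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pairwise_overlap(mu1, mu2, sigma):
    """Exact pairwise overlap for two equal-variance, equal-weight 1-D
    Gaussians: the sum of the two misclassification probabilities."""
    delta = abs(mu2 - mu1)
    # each component misclassifies the mass beyond the midpoint boundary
    p_misclass = phi(-delta / (2.0 * sigma))
    return 2.0 * p_misclass
```

Identical components overlap completely (value 1), and the overlap decays monotonically as the means separate, which is exactly the knob MixSim turns to control clustering complexity.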

Melnykov, Volodymyr [University of Alabama, Tuscaloosa; Chen, Wei-Chen [ORNL; Maitra, Ranjan [Iowa State University

The space-fixed genetic algorithm originally proposed by Niesse and Mayne [J. Chem. Phys. 105, 4700 (1996)] is modified and used to study the lowest energy structure of small silicon clusters by employing empirical interatomic potentials. In this new space-fixed genetic algorithm, a gradient-free simplex method, rather than the conventional gradient-driven conjugate gradient minimization employed by Niesse and Mayne, is selected by virtue of its flexibility and applicability to any form of interatomic potentials for which the calculation of derivatives is difficult. Using two empirical three-body potentials, we calculated the ground state structure up to Si15 successfully using this new genetic algorithm based on the simplex method. The effect of angular dependent three-body potentials on the cluster structures is examined and compared with the experimental results.

We have recently discovered the configuration existence theorem for N electrons in magnetic and circularly polarized fields, stating that the maximum number of configurations may be the product of all differential foldings [1] (the maximum number of times the ZVS gradient manifold is cut by an even lower-dimensional arbitrary plane into disjoint sets): there may be, for example, at least 2^180=1532495540865888858358347027150309183618739122183602176 configurations of electrons corresponding to the complexity of carbon C60 (N=60), assuming the lowest nontrivial folding 2 (parabola-like). We therefore use a genetic algorithm to find possible classical Wigner crystal allotropes leading to quantum configurations for large numbers of electrons. We find several distinct configurations for large numbers of electrons. The genetic operations on configuration species are also discussed. [1] M. Kalinski, L. Hansen, and D. Farrelly, Phys. Rev. Lett. 95, 103001 (2005).

An unavoidable problem of metal structures is their exposure to rust degradation during their operational life. Thus, the surfaces need to be assessed in order to avoid potential catastrophes. There is considerable interest in the use of patch repair strategies which minimize the project costs. However, to operate such strategies with confidence in the long useful life of the repair, it is essential that the condition of the existing coatings and the steel substrate can be accurately quantified and classified. This paper describes the application of fuzzy set theory for classifying steel surfaces according to the steel rust time. We propose a semi-automatic technique to obtain image clustering using the Fuzzy C-means (FCM) algorithm and we analyze two kinds of data to study the classification performance. Firstly, we investigate the use of raw images' pixels without any pre-processing methods and neighborhood pixels. Secondly, we apply Gaussian noise to the images with different standard deviations to study the FCM method's tolerance to Gaussian noise. The noisy images simulate the possible perturbations of the images due to the weather or rust deposits on the steel surfaces during typical on-site acquisition procedures.

We propose a new method by incorporating improved k-means and modified fuzzy c-means clustering techniques for segmenting medical images. Segmentation of medical images plays a key role in estimating the object boundary and abnormalities, if any. Moreover, it is an important process to extract more information from complex medical images. Magnetic resonance images may contain noise due to the operator

In this paper, we attempt to analyze currency crises within the decision theory framework. In this regard, we employ fuzzy system modeling with fuzzy C-means (FCM) clustering to develop perception based decision matrix. We try to build a prescriptive model in order to determine the best approximate reasoning schemas. We use the underlying behavior of the market participants during the

Endocrine signaling provides one critical means of physiological communication within an organism. Many endocrine signals exhibit an episodic or pulsatile configuration. In an effort to provide a versatile and statistically based algorithm for investigating the regulation of endocrine pulse signals, we have formulated a computerized algorithm in which a pulse is defined as a statistically significant increase in a "cluster" of hormone values followed by a statistically significant decrease in a second cluster of values. The increase or decrease is judged in relation to the actual experimental error expressed by the replicates in the presumptive nadir and peak data clusters. The program permits the operator to specify the cluster sizes of test peaks and pre- and postpeak nadirs. This method is largely insensitive to unstable base-line hormone concentrations and is not adversely affected by varying pulse amplitudes, widths, or configurations within the endocrine series. In addition, the simple statistical basis for this algorithm renders it minimally dependent on explicit or a priori assumptions about rates of hormone secretion or disappearance. The program has been validated for false-positive errors against a wide range of intraseries coefficients of variation (4-52%). We have illustrated its performance for profiles of luteinizing hormone, follicle-stimulating hormone, growth hormone, prolactin, adrenocorticotropic hormone, and cortisol and compared these episodic patterns with those of stable serum constituents (total serum protein and calcium), which do not exhibit pulsatile fluctuation. PMID:3008572
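The peak-and-nadir cluster test described above can be sketched in a few lines: compare each candidate peak cluster against its neighboring nadir clusters with a pooled t statistic, and flag positions where a significant rise is followed by a significant fall. This is a simplified illustration, not the published program; the cluster sizes and threshold are illustrative parameters:

```python
import numpy as np

def cluster_t(a, b):
    """Pooled two-sample t statistic comparing cluster b against cluster a."""
    na, nb = len(a), len(b)
    sp2 = (((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
           / (na + nb - 2))
    return (np.mean(b) - np.mean(a)) / np.sqrt(sp2 * (1 / na + 1 / nb))

def detect_pulses(series, nadir=3, peak=3, t_crit=3.0):
    """Flag indices where a peak cluster rises significantly above the
    preceding nadir cluster and falls significantly into the following one."""
    pulses = []
    for i in range(nadir, len(series) - peak - nadir):
        pre = series[i - nadir:i]
        pk = series[i:i + peak]
        post = series[i + peak:i + peak + nadir]
        if cluster_t(pre, pk) > t_crit and cluster_t(pk, post) < -t_crit:
            pulses.append(i)
    return pulses
```

Because the t statistic is judged against the replicate scatter itself, the test stays scale-free, mirroring the paper's point that the method is insensitive to varying pulse amplitudes and unstable baselines.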

Image segmentation is often required as a preliminary and indispensable stage in the computer-aided medical image process, particularly during the clinical analysis of magnetic resonance (MR) brain images. The fuzzy c-means (FCM) clustering algorithm has been widely used in many medical image segmentations. However, the conventional standard FCM algorithm is sensitive to noise because of not taking into account the

Within the learning framework of maximum weighted likelihood (MWL) proposed by Cheung, 2004 and 2005, this paper develops a batch Rival Penalized Expectation-Maximization (RPEM) algorithm for density mixture clustering, provided that all observations are available before the learning process. Compared to the adaptive RPEM algorithm in Cheung, 2004 and 2005, this batch RPEM, analogous to the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), need not assign a learning rate, but still preserves the capability of automatic model selection. Further, the convergence speed of this batch RPEM is faster than that of the EM and the adaptive RPEM in general. The experiments show the superior performance of the proposed algorithm on synthetic data and color image segmentation.

The Chandra X-ray Observatory (Chandra) is producing images with outstanding spatial resolution using low-noise, fast-readout CCDs. Among many other things, X-ray images and spectra help astronomers study star formation and galactic evolution. Currently, X-ray astronomers classify one X-ray source at a time by visual inspection and use of model-fitting software. This approach is useful for studying the physics of bright individual sources but is time consuming for analyzing large images of rich fields of X-ray sources, such as stellar clusters. Objective and efficient techniques from the fields of multivariate statistics, pattern recognition, and hyperspectral image processing are needed to analyze the growing Chandra image archive. An image processing algorithm has been developed that orders the given X-ray sources based on hard versus soft X-ray emission and then groups the ordered X-ray sources into clusters based on their spectral attributes. The algorithm was applied to imaging spectroscopy of the Orion Nebula Cluster (ONC) population of more than 1000 X-ray emitting stars. As an initial test of the algorithm, images of the ONC from the Chandra archive were analyzed. The final spectral classification algorithm was applied to a sample of sources selected from among the more than 1600 X-ray sources detected in the Chandra Orion Ultradeep Project. Clustering results have been compared with known optical and infrared properties of the population of the ONC to assess the algorithm's ability to identify groups of sources that share common attributes.

The three-dimensional (3-D) shape of microcalcification clusters is an important indicator in early breast cancer detection. In fact, there is a relationship between the cluster topology and the type of lesion (malignant or benign). This paper presents a 3-D reconstruction method for such clusters using two 2-D views acquired during standard mammographic examinations. For this purpose, the mammographic unit was modeled using a camera with virtual optics. This model was used to calibrate the acquisition unit and then to reconstruct the clusters in the 3-D space after microcalcification segmentation and matching. The proposed model is hardware independent since it is suitable for digital mammographic units with different geometries and with various physical acquisition principles. Three-dimensional reconstruction results are presented here to prove the validity of the method. Tests were first performed using a phantom with a well-known geometry. The latter contained X-ray opaque glass balls representing microcalcifications. The positions of these balls were reconstructed with a 16.25-µm mean accuracy. This very high inherent algorithm accuracy is more than enough for a precise 3-D cluster representation. Further validation tests were carried out using a second phantom including a spherical cluster. This phantom was built with materials simulating the behavior of both mammary tissue and microcalcifications toward X-rays. The reconstructed shape was effectively spherical. Finally, reconstructions were carried out for real clusters and their results are also presented. PMID:16366229

Background With the availability of large-scale, high-density single-nucleotide polymorphism (SNP) markers, substantial effort has been made in identifying disease-causing genes using linkage disequilibrium (LD) mapping by haplotype analysis of unrelated individuals. In addition to complex diseases, many continuously distributed quantitative traits are of primary clinical and health significance. However the development of association mapping methods using unrelated individuals for quantitative traits has received relatively less attention. Results We recently developed an association mapping method for complex diseases by mining the sharing of haplotype segments (i.e., phased genotype pairs) in affected individuals that are rarely present in normal individuals. In this paper, we extend our previous work to address the problem of quantitative trait mapping from unrelated individuals. The method is non-parametric in nature, and statistical significance can be obtained by a permutation test. It can also be incorporated into the one-way ANCOVA (analysis of covariance) framework so that other factors and covariates can be easily incorporated. The effectiveness of the approach is demonstrated by extensive experimental studies using both simulated and real data sets. The results show that our haplotype-based approach is more robust than two statistical methods based on single markers: a single SNP association test (SSA) and the Mann-Whitney U-test (MWU). The algorithm has been incorporated into our existing software package called HapMiner, which is available from our website. Conclusion For QTL (quantitative trait loci) fine mapping, to identify QTNs (quantitative trait nucleotides) with realistic effects (the contribution of each QTN less than 10% of total variance of the trait), large sample sizes (≥ 500) are needed for all the methods. The overall performance of HapMiner is better than that of the other two methods.
Its effectiveness further depends on other factors such as recombination rates and the density of typed SNPs. Haplotype-based methods might provide higher power than methods based on a single SNP when using tag SNPs selected from a small number of samples or some other sources (such as HapMap data). Rank-based statistics usually have much lower power, as shown in our study.

This paper presents a Cluster-based Dynamic Differential Evolution with external Archive (CDDE_Ar) for global optimization in dynamic fitness landscape. The algorithm uses a multipopulation method where the entire population is partitioned into several clusters according to the spatial locations of the trial solutions. The clusters are evolved separately using a standard differential evolution algorithm. The number of clusters is an adaptive parameter, and its value is updated after a certain number of iterations. Accordingly, the total population is redistributed into a new number of clusters. In this way, a certain sharing of information occurs periodically during the optimization process. The performance of CDDE_Ar is compared with six state-of-the-art dynamic optimizers over the moving peaks benchmark problems and dynamic optimization problem (DOP) benchmarks generated with the generalized-dynamic-benchmark-generator system for the competition and special session on dynamic optimization held under the 2009 IEEE Congress on Evolutionary Computation. Experimental results indicate that CDDE_Ar can enjoy a statistically superior performance on a wide range of DOPs in comparison to some of the best known dynamic evolutionary optimizers. PMID:23096074

In streaming time series, the clustering problem is more complex, since the dynamic nature of streaming data makes previous clustering methods inappropriate. In this paper, we first propose a new method to evaluate clustering in streaming time series databases. We introduce a novel multi-resolution PAA (MPAA) transform to achieve our iterative clustering algorithm. The method is based on the

Jessica Lin; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos; Jian-wei Liu; Shou-jian Yu; Jia-jin Le

This paper presents various architectural options for implementing a K-Means Re-Clustering algorithm suitable for unsupervised segmentation of hyperspectral images. Performance metrics are developed based upon quantitative comparisons of convergence rates and segmentation quality. A methodology for making these comparisons is developed and used to establish K values that produce the best segmentations with minimal processing requirements. Convergence rates depend on the initial choice of cluster centers. Consequently, this same methodology may be used to evaluate the effectiveness of different initialization techniques.
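Convergence rate here means the number of iterations until the cluster centers stop moving. A minimal 1-D Lloyd's k-means sketch (not the authors' re-clustering implementation) that reports that count, so different initializations can be compared:

```python
import random

def kmeans(points, k, seed=0, max_iter=100, tol=1e-9):
    """Plain Lloyd's k-means on 1-D data; returns (centers, iterations).

    The iteration count serves as a simple convergence-rate metric for
    comparing initializations, as the text above describes.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for it in range(1, max_iter + 1):
        # assignment step: each point goes to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        # update step (keep the old center if a cluster empties)
        new = [sum(c) / len(c) if c else centers[j]
               for j, c in enumerate(clusters)]
        if max(abs(a - b) for a, b in zip(new, centers)) < tol:
            return new, it
        centers = new
    return centers, max_iter
```

Running this with several seeds and comparing the returned iteration counts is the kind of initialization comparison the paragraph describes.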

Synthetic aperture side-scan sonar (SAS) is an imaging modality for detecting objects on the sea floor and in shallow water. SAS images provide an echo of an object along with its acoustic shadow; both of which can be used in the classification and localization of the object. We developed a Fuzzy C-Means (FCM) image segmentation algorithm that segments the echo

In this paper, we propose an effective method that detects fire automatically. The proposed algorithm is composed of four stages. In the first stage, an approximate median method is used to detect moving regions. In the second stage, a fuzzy c-means (FCM) algorithm based on the color of fire is used to select candidate fire regions from these moving regions.

Suggestions for improving the Basin-Hopping Monte Carlo (BHMC) algorithm for unbiased global optimization of clusters and nanoparticles are presented. The traditional basin-hopping exploration scheme with Monte Carlo sampling is improved by bringing together novel strategies and techniques employed in different global optimization methods, while keeping the underlying algorithm of BHMC unchanged. The improvements include a total of eleven local and nonlocal trial operators tailored for clusters and nanoparticles that allow an efficient exploration of the potential energy surface, two different strategies (static and dynamic) of operator selection, and a filter operator to handle unphysical solutions. In order to assess the efficiency of our strategies, we applied our implementation to several classes of systems, including Lennard-Jones and Sutton-Chen clusters with up to 147 and 148 atoms, respectively, a set of Lennard-Jones nanoparticles with sizes ranging from 200 to 1500 atoms, binary Lennard-Jones clusters with up to 100 atoms, (AgPd)55 alloy clusters described by the Sutton-Chen potential, and aluminum clusters with up to 30 atoms described within the density functional theory framework. Using an unbiased global search, our implementation was able to reproduce successfully the great majority of all published results for the systems considered, and in many cases with more efficiency than the standard BHMC. We were also able to locate previously unknown global minimum structures for some of the systems considered. This revised BHMC method is a valuable tool for aiding theoretical investigations leading to a better understanding of atomic structures of clusters and nanoparticles. PMID:23957311

A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the microcalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector in a classifier. The classifier is based on an artificial neural network to identify patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, because this stage is an important part of early breast cancer detection.

Quintanilla-Domínguez, Joel; Ojeda-Magaña, Benjamín; Marcano-Cedeño, Alexis; Cortina-Januchs, María G.; Vega-Corona, Antonio; Andina, Diego

A completely automated algorithm for performing many-body interaction energy analysis of clusters (MBAC) [M. J. Elrod and R. J. Saykally, Chem. Rev. 94, 1975 (1994); S. S. Xantheas, J. Chem. Phys. 104, 8821 (1996)] at the restricted Hartree-Fock (RHF)/Møller-Plesset second-order perturbation theory (MP2)/density functional theory (DFT) level of theory is reported. Use of superior guess density matrices (DM's) for smaller fragments generated from the DM of the parent system, and elimination of energetically insignificant higher-body combinations, leads to more efficient performance (speed-up of up to 2) compared to the conventional procedure. The MBAC approach has been tested on several large-sized weakly bound molecular clusters such as (H(2)O)(n), n=8, 12, 16, 20 and hydrated clusters of amides and aldehydes. The MBAC results indicate that the amides interact more strongly with water than aldehydes in these clusters. They also reconfirm minimization of the basis set superposition error for large clusters when a superior-quality basis set is used. In the case of larger weakly bound clusters, the contributions higher than four-body are found to be repulsive in nature and smaller in magnitude. The reason for this may be attributed to the increased random orientations of the interacting molecules separated from each other by large distances. PMID:15352794

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. In several applications, however, having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. The techniques span four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in the selection of parallel hyperspectral algorithms for specific applications.

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

In many applications of remotely-sensed imagery, one of the first steps is partitioning the image into a tractable number of regions. In spectral remote sensing, the goal is often to find regions that are spectrally similar within the region but spectrally distinct from other regions. There is often no requirement that these regions be spatially connected. Two goals of this study are to partition a hyperspectral image into groups of spectrally distinct materials, and to partition without human intervention. To this end, this study investigates the use of multi-resolution, multi-dimensional variants of the watershed-clustering algorithm on Hyperspectral Digital Imagery Collection Experiment (HYDICE) data. The watershed algorithm looks for clusters in a histogram: a B-dimensional surface where B is the number of bands used (up to 210 for HYDICE). The algorithm is applied to HYDICE data of the Purdue Agronomy Farm, for which ground truth is available. Watershed results are compared to those obtained by using the commonly-available Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm.

Hemmer, Terrence H.; Jellison, Gerard P.; Wilson, Darryl G.

In order to improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental Beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on the message passing interface (MPI) and tested it with combinations of different numbers of calculating processors and different reconstruction voxel-grid sizes (64×64×64 and 128×128×128). Performance of the parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to help make fully 3-D OSEM algorithms more feasible in clinical SPECT studies. PMID:17282575

Objective To compare the effectiveness of an algorithm for diagnosis of active labour in primiparous women with standard care in terms of maternal and neonatal outcomes. Design Cluster randomised trial. Setting Maternity units in Scotland with at least 800 annual births. Participants 4503 women giving birth for the first time, in 14 maternity units. Seven experimental clusters collected data from a baseline sample of 1029 women and a post-implementation sample of 896 women. The seven control clusters had a baseline sample of 1291 women and a post-implementation sample of 1287 women. Intervention Use of an algorithm by midwives to assist their diagnosis of active labour, compared with standard care. Main outcomes Primary outcome: use of oxytocin for augmentation of labour. Secondary outcomes: medical interventions in labour, admission management, and birth outcome. Results No significant difference was found between groups in percentage use of oxytocin for augmentation of labour (experimental minus control, difference=0.3, 95% confidence interval −9.2 to 9.8; P=0.9) or in the use of medical interventions in labour. Women in the algorithm group were more likely to be discharged from the labour suite after their first labour assessment (difference=−19.2, −29.9 to −8.6; P=0.002) and to have more pre-labour admissions (0.29, 0.04 to 0.55; P=0.03). Conclusions Use of an algorithm to assist midwives with the diagnosis of active labour in primiparous women did not result in a reduction in oxytocin use or in medical intervention in spontaneous labour. Significantly more women in the experimental group were discharged home after their first labour ward assessment. Trial registration Current Controlled Trials ISRCTN00522952.

In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text itself. First, we cluster the document sentences to exploit the diversity of topics; then we use a learning algorithm (here, Naive Bayes) on each cluster, considering it as a class. After obtaining the classification model, we calculate the score of a sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences and extract the first ones as the output summary. We conducted experiments using a corpus of scientific papers, and we compared our results to another summarization system called UNIS. We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method performs well, and that adding new features (which is simple with this method) can improve the summary's accuracy.

Since the operations of sensors in wireless sensor networks (WSNs) mainly rely on battery energy, energy consumption becomes an important issue. In this paper, we use a global simulated annealing genetic algorithm (GSAGA) to create energy efficient clusters for routing in WSNs. The simulation results show that the proposed GSAGA algorithm has higher efficiency and can achieve better network lifetime

Our goal is to develop a robust out-of-core sorting program for a distributed-memory cluster. The literature contains two dominant paradigms for out-of-core sorting algorithms: merging-based and partitioning-based. We explore a third paradigm, that of oblivious algorithms. Unlike the two dominant paradigms, oblivious algorithms do not depend on the input keys and therefore lead to predetermined I/O and communication
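Odd-even transposition sort is a textbook example of an oblivious algorithm (though not necessarily the one the authors use): its compare-exchange schedule depends only on the input length, never on the keys, which is what makes the I/O and communication pattern predetermined:

```python
def odd_even_transposition_sort(a):
    """Oblivious sort: the sequence of compare-exchange positions depends
    only on len(a), never on the key values, so the access pattern is
    fixed in advance regardless of the input."""
    a = list(a)  # leave the caller's list untouched
    n = len(a)
    for phase in range(n):
        start = phase % 2  # alternate odd/even index pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

After n phases the list is guaranteed sorted; the same n*(n/2) compare-exchanges are issued for every possible input of that length.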

In this paper, we present a path integral hybrid Monte Carlo (PIHMC) method for rotating molecules in quantum fluids. This is an extension of our PIHMC for correlated Bose fluids [S. Miura and J. Tanaka, J. Chem. Phys. 120, 2160 (2004)] to handle the molecular rotation quantum mechanically. A novel technique, referred to as an effective potential of quantum rotation, is introduced to incorporate the rotational degree of freedom in the path integral molecular dynamics or hybrid Monte Carlo algorithm. For a permutation move to satisfy Bose statistics, we devise a multilevel Metropolis method combined with a configurational-bias technique for efficiently sampling the permutation and the associated atomic coordinates. We have applied the PIHMC to a helium-4 cluster doped with a carbonyl sulfide molecule. The effects of the quantum rotation on the solvation structure and energetics were examined. Translational and rotational fluctuations of the dopant in the superfluid cluster were also analyzed. PMID:17381207

We have recently introduced [Phys. Rev. E 75, 045102(R) (2007); AIP Conference Proceedings 965, 2007, p. 323] an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.

Because most galaxies live in groups, and the environment in which a galaxy resides affects its evolution, it is crucial to develop tools to understand how galaxies are distributed within groups. At the same time we must understand how groups are distributed and connected in the larger scale structure of the Universe. I have applied a variety of networking techniques to assess the substructure of galaxy groups, including distance matrices, agglomerative hierarchical clustering algorithms and dendrograms. We use distance matrices to locate groupings spatially in 3-D. Dendrograms created from agglomerative hierarchical clustering results allow us to quantify connections between galaxies and galaxy groups. The shape of the dendrogram reveals whether the group is spatially homogeneous or clumpy. These techniques are giving us new insight into the structure and dynamical state of galaxy groups and large scale structure. We specifically apply these techniques to the ALFALFA survey of the Coma-Abell 1367 supercluster and its resident galaxy groups.

More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In this chapter, hierarchical clustering algorithms are proposed for efficient tag SNP selection.

Within-field variability is a well-known phenomenon and its study is at the centre of precision agriculture (PA). In this paper, site-specific spatial variability (SSSV) of apparent electrical conductivity (ECa) and crop yield, along with pH, moisture, temperature and dielectric-constant information, was analyzed to construct spatial distribution maps. Principal component analysis (PCA) and fuzzy c-means (FCM) clustering were then performed to delineate management zones (MZs). Performance indices such as the Normalized Classification Entropy (NCE) and the Fuzzy Performance Index (FPI) were calculated to determine the clustering performance. The geo-referenced sensor data was analyzed for within-field classification. Results revealed that the variables could be aggregated into MZs that characterize spatial variability in soil chemical properties and crop productivity. The resulting classified MZs showed favorable agreement between the ECa and crop-yield variability patterns. This enables a reduction in the number of soil analyses needed to create application maps for certain cultivation operations.
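FPI and NCE are usually given in forms scaled so that 0 indicates a crisp partition and 1 a maximally fuzzy one; the sketch below uses those textbook forms, which should be checked against the paper's exact definitions:

```python
import math

def fpi(U):
    """Fuzziness Performance Index from the membership matrix U, where
    U[i][j] is the membership of sample i in class j.
    Scaled here so that 0 = crisp partition, 1 = maximally fuzzy."""
    n, c = len(U), len(U[0])
    f = sum(u * u for row in U for u in row) / n  # partition coefficient
    return (c / (c - 1.0)) * (1.0 - f)

def nce(U):
    """Normalized classification entropy (0 = crisp, 1 = maximally fuzzy)."""
    n, c = len(U), len(U[0])
    h = -sum(u * math.log(u) for row in U for u in row if u > 0) / n
    return h / math.log(c)
```

Both indices are computed directly from the FCM membership matrix, so scanning them over a range of cluster counts is a cheap way to pick the number of management zones.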

Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. Voice classification, which plays an important role in grouping unlabelled voice samples, has however not been widely studied. Lately, voice classification has been found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms are a combination of hierarchical clustering, the dynamic time warp transform, the discrete wavelet transform, and decision trees. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machines, and Hidden Markov Models (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm.
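The dynamic time warp transform listed among the building blocks is commonly computed with the classic dynamic-programming recurrence; a minimal sketch for 1-D numeric sequences:

```python
def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance between
    two numeric sequences: the minimum total cost of an alignment that
    may stretch or compress either sequence in time."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = best cost aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # deletion
                                 D[i][j - 1],      # insertion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Because the warp can repeat samples at no cost, a sequence and a time-stretched copy of it have DTW distance zero, which is exactly what makes the measure attractive for comparing voices spoken at different speeds.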

Color segmentation of infrared thermal images is an important factor in detecting the tumor region. Cancerous tissue with angiogenesis and inflammation emits a temperature pattern different from that of healthy tissue. In this paper, two color segmentation techniques, K-means and fuzzy c-means, for the segmentation of infrared (IR) breast images are modeled and compared. Using the K-means algorithm in Matlab, some

The recent and continuing construction of multi- and hyperspectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. These data show great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth-bound processing. A segmented image (in which the multispectral data in each pixel are classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We suggest a simple approach: a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
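A toy version of a combined spatial-spectral distance can be written as below; the weight `lam` and the tuple layout are illustrative assumptions, not details from the text:

```python
def spatial_spectral_dist(p, q, lam=0.1):
    """Squared distance mixing spectral values with pixel coordinates.

    p and q are (row, col, spectrum) tuples; lam (an illustrative knob)
    trades spatial contiguity against spectral compactness.
    """
    spec = sum((a - b) ** 2 for a, b in zip(p[2], q[2]))
    spat = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return spec + lam * spat

def assign_pixels(pixels, centers, lam=0.1):
    """Assignment step of a k-means variant using the combined distance."""
    return [min(range(len(centers)),
                key=lambda j: spatial_spectral_dist(p, centers[j], lam))
            for p in pixels]
```

With lam = 0 this reduces to purely spectral clustering; increasing lam biases neighbouring pixels toward the same class, which is the contiguity/compactness trade-off the paragraph describes.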

The characterization and prediction of the structures of metal silicon clusters is important for nanotechnology research because these clusters can be used as building blocks for nano devices, integrated circuits and solar cells. Several authors have postulated that there is a transition between exo and endo absorption of Cu in Si_n clusters and showed that for n larger than 9 it is possible to find endohedral clusters. Unfortunately, no global searches have confirmed this observation, which is based on local optimizations of plausible structures. Here we use parallel Genetic Algorithms (GA), as implemented in our MGAC software, directly coupled with DFT energy calculations to show that the global search of CuSi_n cluster structures does not find endohedral clusters for n < 8 but finds them for n ≥ 10.

Ona, Ofelia B.; Ferraro, Marta B.; Facelli, Julio C.

The presented method significantly reduces the time necessary to validate a computed tomographic colonography (CTC) computer aided detection (CAD) algorithm for colonic polyps applied to a large patient database. As the algorithm is being developed on Windows PCs and our target, a Beowulf cluster, is running on Linux PCs, we made the application dual-platform compatible using a single source code tree. To maintain, share, and deploy source code, we used CVS (concurrent versions system) software. We built the libraries from their sources for each operating system. Next, we made the CTC CAD algorithm dual-platform compatible and validated that both Windows and Linux produced the same results. Eliminating system dependencies was mostly achieved using the Qt programming library, which encapsulates most of the system-dependent functionality in order to present the same interface on either platform. Finally, we wrote scripts to execute the CTC CAD algorithm in parallel. Running hundreds of simultaneous copies of the CTC CAD algorithm on a Beowulf cluster computing network enables execution in less than four hours on our entire collection of over 2400 CT scans, as compared to a month on a single PC. As a consequence, our complete patient database can be processed daily, boosting research productivity. Large scale validation of a computer aided polyp detection algorithm for CT colonography using cluster computing significantly improves the round trip time of algorithm improvement and revalidation.

Bitter, Ingmar; Brown, John E.; Brickman, Daniel; Summers, Ronald M.

The fuzzy c-means (FCM) clustering algorithm is a popular model widely used in the segmentation of magnetic resonance imaging (MRI) data. The conventional FCM does not take into account the spatial information of the image and gives unexpected segmentation results when dealing with MRI data contaminated by noise. Considering that the intensities of ideal MRI are piecewise constant, we present an improved model
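The standard FCM iteration that such models extend alternates a membership update with a weighted center update; a 1-D sketch (fuzzifier m = 2, Euclidean distances, no spatial term):

```python
def fcm_step(points, centers, m=2.0, eps=1e-12):
    """One fuzzy c-means iteration: membership update, then center update.

    points: list of floats; centers: list of floats; m: fuzzifier (> 1).
    Returns (memberships, new_centers); memberships[i][j] is the degree
    to which point i belongs to cluster j, and each row sums to 1.
    """
    expo = 2.0 / (m - 1.0)
    U = []
    for x in points:
        d = [abs(x - c) + eps for c in centers]  # eps guards divide-by-zero
        row = [1.0 / sum((d[j] / d[k]) ** expo for k in range(len(centers)))
               for j in range(len(centers))]
        U.append(row)
    new_centers = []
    for j in range(len(centers)):
        w = [U[i][j] ** m for i in range(len(points))]
        new_centers.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
    return U, new_centers
```

Iterating `fcm_step` until the centers stop moving gives the plain FCM algorithm; noise-robust variants like the one above add a spatial term to the distance or to the membership update.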

In cancer biology, it is very important to understand the phenotypic changes of the patients and discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem based on gene expression profiles which may contain outliers due to either chemical or electrical reasons. These undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways, and are related with only a few interdependent biomarkers. This motivates a need for robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures and identifying cancer-related biomarkers. This study proposes a penalized model-based Student's t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account and having robustness against outliers. Meanwhile, biomarker identification and network reconstruction are achieved by imposing an adaptive [Formula: see text] penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm utilizing the graphical lasso. Here, a network-based gene selection criterion that identifies biomarkers not as individual genes but as subnetworks is applied. This allows us to implicate low-discriminative biomarkers which play a central role in the subnetwork by interconnecting many differentially expressed genes, or have cluster-specific underlying network structures. Experiment results on simulated datasets and one available cancer dataset attest to the effectiveness and robustness of PMT-UC in cancer subtype discovery. Moreover, PMT-UC has the ability to select cancer-related biomarkers which have been verified in biochemical or biomedical research and to learn biologically significant correlations among genes. PMID:23799085

Highly size-asymmetrical fluid mixtures arise in a variety of physical contexts, notably in suspensions of colloidal particles to which much smaller particles have been added in the form of polymers or nanoparticles. Conventional schemes for simulating models of such systems are hamstrung by the difficulty of relaxing the large species in the presence of the small one. Here we describe how the rejection-free geometrical cluster algorithm of Liu and Luijten [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004)] can be embedded within a restricted Gibbs ensemble to facilitate efficient and accurate studies of fluid phase behavior of highly size-asymmetrical mixtures. After providing a detailed description of the algorithm, we summarize the bespoke analysis techniques of [Ashton et al., J. Chem. Phys. 132, 074111 (2010)] that permit accurate estimates of coexisting densities and critical-point parameters. We apply our methods to study the liquid-vapor phase diagram of a particular mixture of Lennard-Jones particles having a 10:1 size ratio. As the reservoir volume fraction of small particles is increased in the range of 0%-5%, the critical temperature decreases by approximately 50%, while the critical density drops by some 30%. These trends imply that in our system, adding small particles decreases the net attraction between large particles, a situation that contrasts with hard-sphere mixtures where an attractive depletion force occurs. PMID:21090849

Ashton, Douglas J; Liu, Jiwen; Luijten, Erik; Wilding, Nigel B

From the technological point of view, it is usually more important to ensure the ability to react promptly to changing environmental conditions than to try to forecast them. Evolutionary Algorithms were proposed initially to drive the adaptation of complex systems to varying or uncertain environments. In the general setting, the adaptive-anticipatory dilemma reduces itself to the placement of the interaction with the environment in the computational schema. Adaptation consists of the estimation of the proper parameters from present data in order to react to a present environment situation. Anticipation consists of the estimation from present data in order to react to a future environment situation. This duality is expressed in the Evolutionary Computation paradigm by the precise location of the consideration of present data in the computation of the individuals' fitness function. In this paper we consider several instances of Evolutionary Algorithms applied to a precise problem and perform an experiment that tests their response as anticipative and adaptive mechanisms. The non-stationary problem considered is that of Non-Stationary Clustering, more precisely the adaptive Color Quantization of image sequences. The experiment illustrates our ideas and gives some quantitative results that may support the proposition of the Evolutionary Computation paradigm for other tasks that require interaction with a non-stationary environment.

González, A. I.; Graña, M.; D'Anjou, A.; Torrealdea, F. J.

Background The recent developments in microarray technology have allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction-based clustering algorithm, however, is designed to overcome this necessity for user-defined parameters, as it is able to automatically search the data for any underlying structure. Methods The diffraction-based clustering algorithm presented in this paper is tested using five well-known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those obtained from conventional algorithms such as the k-means, fuzzy c-means, self-organising map, hierarchical clustering algorithm, Gaussian mixture model and density-based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results The diffraction-based clustering algorithm is shown to be independent of the number of clusters as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction-based clustering algorithm performs significantly better on the real biological datasets compared to the other existing algorithms. Conclusion The results of the diffraction-based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data.
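The abstract does not say which external criterion is averaged; the Rand index is one standard choice for scoring a clustering against reference labels, sketched here:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two clusterings of the same items (1 = identical
    up to relabeling): the fraction of item pairs on which the two
    clusterings agree about 'same cluster' vs 'different cluster'."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```

Because it only compares pair co-membership, the index is invariant to permuting cluster labels, which is what makes it usable as an external criterion.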

For large-scale scheduling problems, iterative decomposition is a feasible approach to reduce the size of the problem. A partial-clustering-based algorithm is proposed in this paper for the iterative partial decomposition of large-scale scheduling problems. The efficiency of the new method is illustrated by numerical computational results on several large-scale production scheduling problems.

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
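The pruning in the searching stage rests on the triangle-inequality bound d(q, x) ≥ |d(q, c) − d(x, c)|: if that lower bound already exceeds the best distance found so far, the full distance computation can be skipped. A simplified nearest-neighbour (k = 1) sketch of this idea, not the authors' implementation:

```python
import math

def nn_with_pruning(query, data, centers, assign):
    """Nearest-neighbour search using the triangle-inequality bound
    d(q, x) >= |d(q, c) - d(x, c)|, the pruning idea behind kMkNN.

    data: list of points; centers: cluster centers; assign[i]: index of
    the center owning data[i]. Returns (best_index, distance, n_evals).
    """
    d_qc = [math.dist(query, c) for c in centers]          # query-to-center
    d_xc = [math.dist(x, centers[assign[i]])               # point-to-center,
            for i, x in enumerate(data)]                   # precomputable
    # visit points cluster-by-cluster, nearest cluster first
    order = sorted(range(len(data)), key=lambda i: d_qc[assign[i]])
    best_i, best_d, evals = -1, float("inf"), 0
    for i in order:
        lower = abs(d_qc[assign[i]] - d_xc[i])
        if lower >= best_d:
            continue  # the bound proves data[i] cannot beat the best
        evals += 1
        d = math.dist(query, data[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, evals
```

The d_xc values depend only on the training data, so in the real algorithm they are computed once in the buildup stage; only d_qc and the surviving full distances are computed per query.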

The segmentation of breast Magnetic Resonance Imaging (MRI) has been a long-term challenge due to the fuzzy boundaries among objects, small spots, and irregular object shapes in breast MRI. Even though intensity-based clustering algorithms such as K-means clustering and Fuzzy C-means clustering have been used widely for basic image segmentation, they resulted in complicated patterns for computer-aided breast

Donghoon Kang; Sung Y. Shin; Chang Oan Sung; Jung Y. Kim; Jeong-Ki Pack; Hyung Do Choi

In this paper, a heuristic method for determining the optimal number of clusters is proposed. Four clustering algorithms, namely K-means, Growing Neural Gas, a Simulated Annealing based technique, and Fuzzy C-means, in conjunction with three well-known cluster validity indices, namely the Davies-Bouldin index, Calinski-Harabasz index, and Maulik-Bandyopadhyay index, in addition to the proposed index, are used. Our simulations evaluate the capability of the mentioned indices on some artificial and medical datasets. PMID:19163761
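
Of the validity indices mentioned, the Davies-Bouldin index is straightforward to compute from a labeled partition. A minimal NumPy sketch (generic, not tied to any of the four algorithms in the study):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: average, over clusters, of the worst ratio
    (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of cluster i's
    points to its centroid.  Lower values indicate better-separated clusters."""
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - centroids[i], axis=1).mean()
                        for i, k in enumerate(ks)])
    db = 0.0
    for i in range(len(ks)):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)  # worst-case similarity for cluster i
    return db / len(ks)
```

A tight, well-separated partition scores lower than a mixed one, which is why the index can rank candidate numbers of clusters.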

Molecular packing, clustering, and docking computations have been performed by empirical intermolecular energy minimization methods. The main focus of this study is finding a robust global search algorithm to solve intermolecular interaction problems, especially to apply an efficient algorithm to large-scale complex molecular systems such as drug-DNA binding or site selectivity, which has increasing importance in drug design and drug discovery. Molecular packing in benzene, naphthalene, and anthracene crystals is analyzed in terms of molecular dimer interaction. Intermolecular energies of the gas dimer molecules are calculated for various intermolecular distances and orientations using empirical potential energy functions. The gas dimers are compared to pairs of molecules extracted from the observed crystal structures. Net atomic charges are obtained by the potential-derived method from 6-31G and 6-31G^{**} level ab initio wavefunctions. A new approach using a genetic algorithm is applied to predict structures of benzene, naphthalene, and anthracene molecular clusters. The computer program GAME (genetic algorithm for minimization of energy) has been developed to obtain the global energy minimum of clusters of dimer, trimer, and tetramer molecules. This test model has been further developed for applications of molecular docking. Docking calculations of deoxyguanosine molecules to actinomycin D were performed successfully to identify the binding sites of the drug molecule, as revealed by the solved x-ray crystal structure of the actinomycin D-deoxyguanosine complex. The comparison between the evolutionary computing method and conventional local optimization methods concluded that genetic algorithms are very competitive when it comes to complex, large-scale optimization. The full power of genetic algorithms can be unveiled in computer-assisted drug design only when the difficulties of including optimized molecular conformation in the algorithm are overcome. These problems have been analyzed and some alternative solutions are formulated, with the techniques being discussed in detail.

ABSTRACT In web recommender systems, clustering is done offline to extract usage patterns, and a successful recommendation highly depends on the quality of this clustering solution. In these types of applications, the data to be clustered is in the form of user sessions, which are sequences of web pages visited by the user. Sequence clustering is one of the important tools

Centrosomes are small organelles that organize the mitotic spindle during cell division and are also involved in cell shape and polarity. Within epithelial tumors, such as breast cancer, and some hematological tumors, centrosome abnormalities (CA) are common, occur early in disease etiology, and correlate with chromosomal instability and disease stage. In situ quantification of CA by optical microscopy is hampered by overlap and clustering of these organelles, which appear as focal structures. CA has been frequently associated with Tp53 status in premalignant lesions and tumors. Here we describe an approach to accurately quantify centrosomes in tissue sections and tumors. Considering proliferation and baseline amplification rate, the resulting population-based ratio of centrosomes per nucleus allows the approximation of the proportion of cells with CA. Using this technique we show that 20-30 percent of cells have amplified centrosomes in Tp53-null mammary tumors. Combining fluorescence detection, deconvolution microscopy and a mathematical algorithm applied to a maximum intensity projection, we show that this approach is superior to traditional investigator-based visual analysis or threshold-based techniques.

Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes, Sylvain V.

We present the GPU calculation with the common unified device architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of two-dimensional classical spin systems. We adapt two connected component labeling algorithms recently proposed with CUDA for the assignment of clusters in the Swendsen-Wang algorithm. Starting with the q-state Potts model, we extend our implementation to systems of vector spins, the q-state clock model, with the idea of embedded clusters. We test the performance, and the calculation time on a GTX580 is 2.51 nsec per spin flip for the q=2 Potts model (Ising model) and 2.42 nsec per spin flip for the q=6 clock model with linear size L=4096 at the critical temperature. The computational speed for the q=2 Potts model on the GTX580 is 12.4 times as fast as the calculation speed on a current CPU core; that for the q=6 clock model is 35.6 times as fast.

A novel clustering approach named Clustering Objects on Subsets of Attributes (COSA) has been proposed (Friedman and Meulman (2004). Clustering objects on subsets of attributes. J. R. Statist. Soc. B 66, 1-25) for unsupervised analysis of complex data sets. We demonstrate its usefulness in medical systems biology studies. Examples of metabolomics analyses are described as well as the unsupervised clustering

Doris Damian; Matej Orešič; Elwin Verheij; Jacqueline Meulman; Jerome Friedman; Aram Adourian; Nicole Morel; Age Smilde; Jan van der Greef

Pattern-based clustering is important in many applications, such as DNA micro-array data analysis, automatic recommendation systems and target marketing systems. However, pattern-based clustering in large databases is challenging. On the one hand, there can be a huge number of clusters and many of them can be redundant and thus make the pattern-based clustering ineffective. On the

Jian Pei; Xiaoling Zhang; Moonjung Cho; Haixun Wang; Philip S. Yu

This paper presents an approach for Multilingual News Document Clustering in comparable corpora. We have implemented two algorithms of heuristic nature that follow the approach. They use as the unique evidence for clustering the identification of cognate named entities between both sides of the comparable corpora. In addition, no information about the right number of clusters has to

Suppose that C is our external criterion and P is a clustering result. Let a be the number of pairs of objects that are placed in the same class in C and in the same cluster in P, b be the number of pairs of objects in the same class in C but not in the same cluster in P, c be the number of pairs
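
From pair counts of this kind, a common external criterion is the Rand index RI = (a + d) / (a + b + c + d), where d counts pairs that differ in both class and cluster. A small Python sketch (symbol names a, b, c, d as above; this is the generic definition, not a specific paper's code):

```python
from itertools import combinations

def rand_index(classes, clusters):
    """Rand index from the four pair counts: a = same class & same cluster,
    b = same class & different cluster, c = different class & same cluster,
    d = different in both.  Returns (a + d) / (a + b + c + d)."""
    a = b = c = d = 0
    for (y1, p1), (y2, p2) in combinations(zip(classes, clusters), 2):
        if y1 == y2 and p1 == p2:
            a += 1
        elif y1 == y2:
            b += 1
        elif p1 == p2:
            c += 1
        else:
            d += 1
    return (a + d) / (a + b + c + d)
```

For example, a clustering identical to the class labels scores 1.0, and a clustering orthogonal to them scores lower.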

San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

In this paper, a novel image segmentation algorithm is proposed which combines the dual-tree complex wavelet transform (DT-CWT), multiple-kernel fuzzy c-means clustering (MKFCM) and an adaptive level set method. The dual-tree complex wavelet transform is used for image denoising. It also extracts the high-frequency components of the image, in which the wavelet representation of image details is presented in high

A neuro-fuzzy clustering framework has been presented for a meaningful segmentation of Magnetic Resonance medical images. MR imaging provides detailed soft-tissue descriptions of the target body object and has immense importance in today's non-invasive therapeutic planning and diagnosis methods. The unlabeled image data has been classified using a fuzzy c-means approach and then the data has been used for

This paper presents a comparative study of the success and performance of the Gaussian mixture modeling and Fuzzy C-means methods in determining the volume and cross-sectional areas of the corpus callosum (CC) using simulated and real MR brain images. The Gaussian mixture model (GMM) utilizes a weighted sum of Gaussian distributions, applying statistical decision procedures to define image classes. In the Fuzzy C-means (FCM), the image classes are represented by membership functions expressing, according to fuzziness information, the distance from the cluster centers. In this study, automatic segmentation of the midsagittal section of the CC was achieved from simulated and real brain images. The volume of the CC was obtained using sagittal section areas. To compare the success of the methods, segmentation accuracy, Jaccard similarity and segmentation time were calculated. The results show that the GMM method produced slightly more accurate segmentation (midsagittal section segmentation accuracy 98.3% and 97.01% for GMM and FCM, respectively); however, the FCM method resulted in faster segmentation than GMM. With this study, an accurate and automatic segmentation system that offers doctors the opportunity for quantitative comparison in the planning of treatment and the diagnosis of diseases affecting the size of the CC was developed. This study can be adapted to perform segmentation on other regions of the brain; thus, it can be put to practical use in the clinic. PMID:23871683
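
As a point of reference for the FCM side of this comparison, the two standard fuzzy c-means update rules (membership-weighted centers, then inverse-distance memberships) can be sketched in a few lines of NumPy. This is the generic textbook algorithm under assumed parameter names, not the study's segmentation pipeline:

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means.  u[i, k] is the membership of point i in
    cluster k; the fuzziness exponent m > 1 controls how soft the
    partition is (m -> 1 approaches hard k-means)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)                   # rows sum to 1
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]    # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))
        u = inv / inv.sum(axis=1, keepdims=True)        # standard membership update
    return centers, u
```

Hard labels, when needed (e.g. to delineate a region like the CC), are obtained by taking the cluster of highest membership for each point.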

Background Quantitative characterization of the topological characteristics of protein-protein interaction (PPI) networks can enable the elucidation of biological functional modules. Here, we present a novel clustering methodology for PPI networks wherein the biological and topological influence of each protein on other proteins is modeled using the probability distribution that the series of interactions necessary to link a pair of distant proteins in the network occur within a time constant (the occurrence probability). Results CASCADE selects representative nodes for each cluster and iteratively refines clusters based on a combination of the occurrence probability and graph topology between every protein pair. The CASCADE approach is compared to nine competing approaches. The clusters obtained by each technique are compared for enrichment of biological function. CASCADE generates larger clusters and the clusters identified have p-values for biological function that are approximately 1000-fold better than the other methods on the yeast PPI network dataset. An important strength of CASCADE is that the percentage of proteins that are discarded to create clusters is much lower than the other approaches which have an average discard rate of 45% on the yeast protein-protein interaction network. Conclusion CASCADE is effective at detecting biologically relevant clusters of interactions.

We have developed an evolutionary algorithm (EA) for the global minimum search of molecular clusters. The EA is able to discover all the putative global minima of water clusters up to (H2O)20 and benzene clusters up to (C6H6)30. Then, the EA was applied to search for the global minimum structures of (C6H6)n+ with n = 2-20, some of which were theoretically studied for the first time. Our results for n = 2-6 are consistent with previous theoretical work that uses a similar interaction potential. Excluding the very symmetric global minimum structure for n = 9, the growth pattern of (C6H6)n+ with n ≥ 7 involves the (C6H6)2+ dimer motif, which is placed off-center in the cluster. Such an observation indicates that potentials commonly used in the literature for (C6H6)n+ cannot reproduce the icosahedral-type packing suggested by the available experimental data. PMID:21370815

Llanio-Trujillo, J L; Marques, J M C; Pereira, F B

We propose a methodology that incorporates k-means and improved watershed segmentation algorithm for medical image segmentation. The use of the conventional watershed algorithm for medical image analysis is widespread because of its advantages, such as always being able to produce a complete division of the image. However, its drawbacks include over-segmentation and sensitivity to false edges. We address the drawbacks

H. P. Ng; S. H. Ong; K. W. C. Foong; P. S. Goh; W. L. Nowinski

The concept can be used to estimate future resource requirements and to perform call admission decisions in wireless networks. Shadow clusters can be used to decide if a new call can be admitted to a wireless network based on its quality-of-service (QoS) requirements and local traffic conditions. The shadow cluster concept can especially be useful in future wireless networks with

David A. Levine; Ian F. Akyildiz; Mahmoud Naghshineh

Owing to complex and dynamic propagation in underwater acoustic sensor networks, multipath signals have posed great challenges in receiver design over the past decades. In this paper, we adopt the UWB channel model in underwater networks and suggest a blind non-coherent receiver. Some differentiated features have been developed to represent the multipath signals. Then, we formulate the

In this study a modified Live-Wire approach is presented. A Fuzzy C-Means (FCM) clustering procedure has been implemented before the wavelet transform cost map function is defined. This shrinks the area to be searched, resulting in a significant reduction of the computational complexity. The method has been applied to computed tomography (CT) and magnetic resonance (MR) studies. The 2D segmentation of lungs, abdominal structures and the knee joint has been performed in order to evaluate the method. A significant reduction of the numerical complexity of the Live-Wire algorithm as well as improvement of the object delineation with a decreased number of user interactions have been obtained. PMID:22483373

This paper proposes three algorithms based on the K-means, the simple competitive learning (SCL) algorithm, and the fuzzy c-means algorithm with the differential evolution algorithm for image classification. Due to the locally optimal performance of these three algorithms, the differential evolution (DE) is integrated with the K-means algorithm, the SCL algorithm and the fuzzy c-means algorithm to avoid the local optimal

Clustering or data grouping is a key initial procedure in image processing. In the present scenario, the sizes of company databases have increased dramatically; these databases contain large amounts of text and images. Companies need to mine these huge databases and make accurate decisions in short durations in order to gain a marketing advantage. As an image is a collection of a number of

Vinod Kumar Dehariya; Shailendra Kumar Shrivastava; R. C. Jain

Airborne particulate matter is an important component of atmospheric pollution, affecting human health, climate, and visibility. Modern instruments allow single particles to be analyzed one-by-one in real time, and offer the promise of determining the sources of individual particles based on their mass spectral signatures. The large number of particles to be apportioned makes clustering a necessary step. The goal

Much technology assessment and organization design data exists in Microsoft Excel (trade name) spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is hig...

Wireless sensor networks (WSNs) usually consist of small cheap sensors and they are extensively used for many applications including environment monitoring. In this paper, we focus on the problem that node density in a randomly deployed WSN differs region by region. In other words, sensor nodes are deployed non-uniformly over the network. Such a non-uniform topology makes clustering protocols less

Contemporary router/switch technology for high-performance local/system area networks (LANs/SANs) should provide the capacity to fit the high bandwidth and timing requirements demanded by current applications. The MultiMedia Router (MMR) aims at offering hardware-based QoS support within a compact interconnection component. One of the key elements in the MMR architecture is the link scheduling algorithm. This algorithm must solve conflicts

José Manuel Claver; María Del Carmen Carrión; Manel Canseco; María Blanca Caminero; Francisco J. Quiles

A new method of fault diagnosis based on feature-weighted FCM is presented. The feature-weight assigned to a feature indicates the importance of the feature. This paper shows that an appropriate assignment of feature-weights can improve the performance of fuzzy c-means clustering. Feature evaluation based on a class separability criterion is discussed in this paper. Experiments show that the algorithm is able

In this study, we investigate the application of fuzzy clustering to the anatomical localization and quantitation of brain lesions in Positron Emission Tomography (PET) images. The method is based on the Fuzzy C-Means (FCM) algorithm. The algorithm segments the PET image data points into a given number of clusters. Each cluster is a homogeneous region of the brain (e.g. tumor). A feature vector is assigned to the cluster in which it has the highest membership degree. Given the label assigned by the FCM algorithm to a cluster, one may easily compute the corresponding spatial localization, area and perimeter. Studies concerning the evolution of a tumor after different treatments in two patients are presented. PMID:8891420

This study explores the applicability of data-driven clustering analysis in predicting vegetation distribution over two continents where water is an important controlling factor for vegetation growth, South America and Africa, and compares the ability of clustering analysis with that of a physically based dynamic vegetation model to predict vegetation distribution. A clustering analysis algorithm based on the genetic-algorithm-based K-means is tested, with the number of clusters determined a priori according to the primary plant functional types observed to exist in the study domain. The most important variables upon which the clustering analysis is based include available water, its seasonality, and evaporative demand. The dynamic vegetation model used is the Community Land Model version 3 coupled with a Dynamic Global Vegetation Model (CLM3-DGVM) with modifications targeted to address some known biases of the model. Results from both the clustering analysis and the modified CLM3-DGVM are compared against observations derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). Both methods reasonably reproduced the general pattern of dominant plant functional type distribution. There is no clear winner between the two methods, as the DGVM outperforms the clustering analysis approach in some aspects and is outperformed in others. It is therefore suggested that clustering analysis can be a useful tool in biogeography estimation, although it cannot be used in mechanistic studies as the process-based DGVMs are.

This paper investigates a new approach for Single Document Sum- marization based on a Machine Learning ranking algorithm. The use of machine learning techniques for this task allows one to adapt summaries to the user needs and to the corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most

Massih-reza Amini; Nicolas Usunier; Patrick Gallinari

This paper discusses the improvement of premature convergence in genetic algorithms (GA) used for optimizing multimodal numerical problems. Mutation is the principal operation in GA for enhancing the degree of population diversity, but is often not efficient, particularly in traditional GA. Moreover, the definition of the mutation rate is a tradeoff between computing time and accuracy. In our work, we introduce

Tsung-ying Sun; Chan-cheng Liu; Sheng-ta Hsieh; Chun-ling Lin; Kan-yuan Lee

Divisible load scenarios occur in modern media server applications since most multimedia applications typically require access to continuous and discrete data. A high performance Continuous Media (CM) server greatly depends on the ability of its disk IO subsystem to serve both types of workloads efficiently. Disk scheduling algorithms for mixed media workloads, although they play a central role in this

Elias Balafoutis; Michael Paterakis; Peter Triantafillou; Guido Nerjes; Peter Muth; Gerhard Weikum

In this paper the authors present a new unsupervised fuzzy algorithm for vessel tracking that is applied to the detection of the ocular fundus vessels. The proposed method overcomes the problems of initialization and vessel profile modeling that are encountered in the literature and automatically tracks fundus vessels using linguistic descriptions like

A new unsupervised algorithm for image segmentation is proposed using an inhomogeneous Markov random field (MRF) model, in which the parameter is estimated from fuzzy spel affinities. The proposed algorithm improved the accuracy of segmentation. Simulated brain MR images with different noise levels and a clinical brain MR image were presented in the experiments. The results showed that the proposed algorithm was more powerful than conventional homogeneous MRF model-based ones as well as the fuzzy c-means clustering algorithm. PMID:18024280

We propose a single channel two-stage time-segment discriminator of uterine magnetomyogram (MMG) contractions during pregnancy. We assume that the preprocessed signals are piecewise stationary, having distributions in a common family with a fixed number of parameters. Therefore, at the first stage, we propose a model-based segmentation procedure, which detects multiple change-points in the parameters of a piecewise constant time-varying autoregressive model using a robust formulation of the Schwarz information criterion (SIC) and a binary search approach. In particular, we propose a test statistic that depends on the SIC, derive its asymptotic distribution, and obtain closed-form optimal detection thresholds in the sense of the Neyman-Pearson criterion; therefore, we control the probability of false alarm and maximize the probability of change-point detection in each stage of the binary search algorithm. We compute and evaluate the relative energy variation [root mean squares (RMS)] and the dominant frequency component [first order zero crossing (FOZC)] in discriminating between time segments with and without contractions. The former consistently detects a time segment with contractions. Thus, at the second stage, we apply an unsupervised K-means clustering algorithm to classify the detected time segments using the RMS values. We apply our detection algorithm to real MMG records obtained from ten patients admitted to the hospital for contractions, with gestational ages between 31 and 40 weeks. We evaluate the performance of our detection algorithm by computing the detection and false-alarm rates, using the patients' feedback as a reference. We also analyze the fusion of the decision signals from all the sensors as in the parallel distributed detection approach. PMID:18269980
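
The second-stage idea, classifying time segments by their RMS values with 2-class K-means, can be illustrated with a toy sketch. Segment boundaries are assumed to be given by the first stage, and the function names are illustrative, not the paper's code:

```python
import numpy as np

def rms_per_segment(signal, bounds):
    """Relative-energy feature: RMS of each time segment, where bounds
    is the list of change-point indices delimiting the segments."""
    return np.array([np.sqrt(np.mean(signal[a:b] ** 2))
                     for a, b in zip(bounds[:-1], bounds[1:])])

def two_means_1d(x, iters=50):
    """Unsupervised 2-class K-means on scalar RMS values; label 1 marks
    the high-RMS cluster (candidate contraction segments)."""
    c = np.array([x.min(), x.max()], float)          # spread initial centers
    for _ in range(iters):
        lab = (np.abs(x - c[0]) > np.abs(x - c[1])).astype(int)
        for k in (0, 1):
            if np.any(lab == k):
                c[k] = x[lab == k].mean()
    return lab
```

On a signal with one high-energy burst between two quiet stretches, the middle segment is the only one labeled as a candidate contraction.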

Fuzzy support vector machines based on fuzzy c-means clustering are proposed in this paper. They apply the fuzzy c-means clustering technique to each class of the training set. During the clustering, with a suitable fuzziness parameter q, the more important samples, such as support vectors, become the cluster centers. All the cluster centers generated by fuzzy c-means clustering are

Remote sensing image segmentation is the basis of image pattern recognition. It is significant for the application and analysis of remote sensing images. Clustering analysis as a non-supervised learning method is widely used in the segmentation of remote sensing images. It has made good results in the segmentation of low-resolution and moderate-resolution remote sensing images. As the improvement of image

Liu Rongjie; Zhang Jie; Song Pingjian; Shao Fengjing; Liu Guanfeng

Developing sensor networks enables us to gather information about specified regions or tasks. Sensing nodes are characterized by constrained energy and limited capacity. So, minimizing energy consumption and maximizing system lifetime have been major issues for wireless sensor networks. What is unique about our proposed agent approach is that the agent has a learning capability using cluster

From his experience in using a "clustering" arrangement for undergraduate classes of 25-50 students, the author describes a synergistic technique ("an active group exchange of three or more people involved in a particular objective"), listing guidelines and offering examples of material adaptable to it (questions, problems and topics, papers and…

Purpose To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. Materials and Methods We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. Results The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Conclusion Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis.
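
The four moment-based features extracted here (mean, variance, skewness, kurtosis of the mandibular cortical width) are simple to compute. A small NumPy sketch of the feature extraction only, with an assumed function name, not the HAC-SVM pipeline itself:

```python
import numpy as np

def moment_features(width_profile):
    """Mean, variance, skewness and kurtosis of a sequence of measured
    cortical-width values, as features for a downstream classifier."""
    x = np.asarray(width_profile, float)
    mu = x.mean()
    var = x.var()
    sd = np.sqrt(var) + 1e-12                  # guard against zero spread
    skew = np.mean(((x - mu) / sd) ** 3)       # third standardized moment
    kurt = np.mean(((x - mu) / sd) ** 4)       # fourth standardized moment
    return mu, var, skew, kurt
```

A symmetric width profile yields a skewness near zero, while thinning on one side of the profile shifts it away from zero, which is the kind of shape information the classifier exploits.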

The role of electrocardiogram (ECG) as a noninvasive technique for detecting and diagnosing cardiac problems cannot be overemphasized. This paper introduces a fuzzy C-means (FCM) clustered probabilistic neural network (PNN) for the discrimination of eight types of ECG beats. The performance has been compared with an FCM-clustered multi-layered feed-forward network (MLFFN) trained with the back-propagation algorithm. Important parameters are extracted from each ECG beat and feature reduction has been carried out using FCM clustering. The cluster centers form the input of the neural network classifiers. The extensive analysis using the MIT-BIH arrhythmia database has shown an average classification accuracy of 97.54% with the FCM-clustered MLFFN and 99.58% with the FCM-clustered PNN. Fuzzy clustering improves the classification speed as well. The result reveals the capability of the FCM-clustered PNN in the computer-aided diagnosis of ECG abnormalities. PMID:20703571

Haseena, Hassan Hamsa; Mathew, Abraham T; Paul, Joseph K

In this paper, a novel local manifold spectral clustering with fuzzy c-means (FCM) data condensation is presented. Firstly, a multilayer FCM data condensation method is performed on the original data to obtain a condensed subset. Secondly, a spectral clustering algorithm based on the local manifold distance measure is used to classify the condensed subset. Finally, the nearest neighbor method is adopted to obtain the clustering result for the original data. Compared with the standard spectral clustering algorithm, the novel method is more robust and has the advantage of effectively dealing with large-scale data. In our experiments, we first analyze the performance of multilayer FCM data condensation and the local manifold distance measure, then apply our method to image segmentation and the classification of the large Brodatz texture images. The experimental results show that the method is effective and extensible, and its runtime is acceptable.
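
The final stage of this pipeline, propagating the condensed subset's cluster labels back to every original point by nearest neighbor, is easy to sketch. The function name is illustrative, and brute-force Euclidean distances stand in for whatever distance the full method uses:

```python
import numpy as np

def extend_labels(X_full, X_subset, subset_labels):
    """Each original point inherits the cluster label of its nearest
    neighbor in the condensed (already clustered) subset."""
    d = np.linalg.norm(X_full[:, None, :] - X_subset[None, :, :], axis=2)
    return subset_labels[d.argmin(axis=1)]
```

This is what makes the condensation step pay off: the expensive spectral clustering runs only on the small subset, while the full dataset is labeled in a single cheap pass.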

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms ordinary static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a

The suitability of seven thresholding methods (six algorithms: isodata, Otsu, minimum error, moment-preserving, Pun and fuzzy; and a manual method) to consistently segment bread crumb images was investigated in comparison with the previously reported k-means clustering technique. Thresholding performance was assessed by two criteria: uniformity and busyness of the binary images. Crumb features (cell density, mean cell area, cell uniformity
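
Among the six automatic thresholding algorithms compared, Otsu's method is a useful reference point. A minimal NumPy sketch of the standard algorithm (not the paper's implementation; the histogram bin count is an assumed parameter):

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Otsu's method: choose the threshold that maximizes the between-class
    variance of the gray-level histogram."""
    hist, edges = np.histogram(img, bins=nbins)
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability up to bin k
    mu = np.cumsum(p * np.arange(nbins))      # cumulative mean up to bin k
    mu_t = mu[-1]                             # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    k = int(np.nanargmax(sigma_b))            # bins with empty classes give NaN
    return edges[k + 1]                       # threshold in image units
```

On a cleanly bimodal image (such as bread crumb versus pore pixels), the returned threshold separates the two modes, which is exactly the binarization the uniformity and busyness criteria then evaluate.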

The conjugate residual with optimal trial vectors (CROP) algorithm is developed. In this algorithm, the optimal trial vectors of the iterations are used as basis vectors in the iterative subspace. For linear equations and nonlinear equations with a small-to-medium nonlinearity, the iterative subspace may be truncated to a three-dimensional subspace with no or little loss of convergence rate, and the norm of the residual decreases in each iteration. The efficiency of the algorithm is demonstrated by solving the equations of coupled-cluster theory with single and double excitations in the atomic orbital basis. By performing calculations on H2O with various bond lengths, the algorithm is tested for varying degrees of nonlinearity. In general, the CROP algorithm with a three-dimensional subspace exhibits fast and stable convergence and outperforms the standard direct inversion in iterative subspace method.

For the difficult task of finding global minimum energy structures for molecular clusters of nontrivial size, we present a highly efficient parallel implementation of an evolutionary algorithm. By completely abandoning the traditional concept of generations and by replacing it with a less rigid pool concept, we have managed to eliminate serial bottlenecks completely and can operate the algorithm efficiently on an arbitrary number of parallel processes. Nevertheless, our new algorithm still realizes all of the main features of our old, successful implementation. First tests of the new algorithm are shown for the highly demanding problem of water clusters modeled by a potential with flexible, polarizable monomers (TTM2-F). For this problem, our new algorithm not only reproduces all of the global minima proposed previously in considerably less CPU time but also leads to improved proposals in several cases. These, in turn, qualitatively change our earlier predictions concerning the transitions from all-surface structures to cages with a single interior molecule, and from one to two interior molecules. Furthermore, we compare preliminary results up to n = 105 with locally optimized cuts from several ice modifications. This comparison indicates that relaxed ice structures may start to be competitive already at cluster sizes above n = 90. PMID:16640376

The facility location problem, which arises when deciding where to place facilities such as factories and warehouses, is one of the most frequently encountered optimization problems. Various techniques have been developed to solve it; the fuzzy c-means method is among the most widely used. In this study, the optimum warehouse location for natural stone mines is found using the fuzzy c-means method.
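Since fuzzy c-means recurs throughout these records, a minimal sketch of the standard algorithm may be useful: alternate between updating centers from memberships and memberships from distances. The candidate-site coordinates below are hypothetical:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Alternate the two standard FCM updates: centers as u^m-weighted means,
    memberships from inverse relative distances."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # each row sums to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        d = np.fmax(d, 1e-10)                      # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two groups of hypothetical candidate sites:
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, U = fuzzy_c_means(X, c=2)
```

The fuzzifier m controls how soft the partition is; m close to 1 approaches crisp k-means behavior.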

We present a novel approach to the detection of special nuclear material using cosmic rays. Muon Scattering Tomography (MST) is a method that uses cosmic muons to scan cargo containers and vehicles for special nuclear material. Cosmic muons are abundant, highly penetrating, not harmful to organic tissue, cannot be screened against, and are easily detected, which makes them well suited to cargo scanning. Muons undergo multiple Coulomb scattering when passing through material, and the amount of scattering is roughly proportional to the square of the atomic number Z of the material. By reconstructing incoming and outgoing tracks, we can obtain variables to identify high-Z material. In a real-life application, this has to happen on a timescale of 1 min and thus with small numbers of muons. We have built a detector system using resistive plate chambers (RPCs): 12 layers of RPCs allow for the readout of 6 x and 6 y positions, from which we can reconstruct incoming and outgoing tracks. In this work we detail the performance of an algorithm by which we separate high-Z targets from low-Z background, both for real data from our prototype setup and for MC simulation of a cargo-container-sized setup. (c) British Crown Owned Copyright 2013/AWE
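The Z-discrimination idea can be illustrated with a toy calculation: because multiple Coulomb scattering widens the angular distribution with Z, the variance of reconstructed scattering angles separates heavy targets from light background. The angular widths below are invented for illustration, not taken from the detector described above:

```python
import numpy as np

def scattering_score(theta_in, theta_out):
    """Variance of the scattering angle: multiple Coulomb scattering widens
    this distribution roughly in proportion to Z^2 of the traversed material,
    so a variance (or RMS) cut flags candidate high-Z material."""
    return np.var(np.asarray(theta_out) - np.asarray(theta_in))

# Toy track slopes (mrad); scattering widths of 2 and 15 mrad are invented:
rng = np.random.default_rng(0)
n = 2000
incoming = rng.normal(0.0, 1.0, n)
low_z = scattering_score(incoming, incoming + rng.normal(0.0, 2.0, n))
high_z = scattering_score(incoming, incoming + rng.normal(0.0, 15.0, n))
```

A real discriminator would also use momentum estimates and spatial binning, but the variance contrast is the core signal.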

Thomay, C.; Velthuis, J. J.; Baesso, P.; Cussans, D.; Morris, P. A. W.; Steer, C.; Burns, J.; Quillin, S.; Stapleton, M.

The increasing use of fast and efficient data mining algorithms on huge collections of personal data, facilitated by the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy-preserving methodologies have been proposed, classified in two categories: methodologies that aim at protecting the sensitive data, and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self-Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information-theoretical distance to each other, then create the most relevant classes, paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept as low as possible. Furthermore, we propose information-theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
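A minimal sketch of the distinct ℓ-diversity check such schemes build on: group records by their generalized quasi-identifier values and require at least ℓ distinct sensitive values per group. The records and attribute names below are hypothetical:

```python
from collections import defaultdict

def is_l_diverse(records, quasi_ids, sensitive, l):
    """Distinct l-diversity: every equivalence class (records sharing the
    same generalized quasi-identifier values) must contain at least l
    distinct sensitive values."""
    classes = defaultdict(set)
    for r in records:
        classes[tuple(r[a] for a in quasi_ids)].add(r[sensitive])
    return all(len(vals) >= l for vals in classes.values())

# Hypothetical generalized records (zip prefix and age band are quasi-identifiers):
records = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "asthma"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
]
```

Here the second equivalence class carries only one sensitive value, so an attacker who knows a subject's zip prefix and age band learns their disease; further generalization or suppression would be required.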

Al(4)(2-) was the first discovered σ + π aromatic all-metal cluster. In the present work we analyze the molecular structure, relative stability, and aromaticity of lowest-lying isomers of related M(2)N(2)(2-) (M and N = B, Al, and Ga) clusters, with special emphasis devoted to the cis (C(2v)) and trans (D(2h)) isomers of the M(2)N(2)(2-) clusters. For such purpose, we start by performing the search of the global minimum for each cluster through the Gradient Embedded Genetic Algorithm (GEGA). Energy decomposition analyses and the calculated magnetic- and electronic-based aromaticity criteria of the lowest-lying isomers help to understand the nature of the bonding and the origin of the stability of the global minima. Such methodology should allow guiding future molecular design strategies. PMID:22990879

Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy of sensor nodes, it is critical to implement WSNs with energy-efficient designs. Sensing coverage, on the other hand, may degrade gradually over time after WSNs are activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Coverage-controlling strategies usually present some challenging problems: (1) resolving conflicts while determining which nodes should be turned off to conserve energy; (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve these challenging problems. CoCMA contains two optimization strategies: an MA-based schedule for sensor nodes and a wake-up scheme, which together prolong the network lifetime while maintaining coverage preservation. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by redundant nodes. During network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage holes caused by dead nodes. The performance evaluation of the proposed CoCMA was conducted on a cluster-based WSN (CWSN) under either a random or a uniform deployment of sensor nodes. Simulation results show that the performance yielded by the combination of the MA and the wake-up scheme is better than that of some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area.

An integrated approach for multi-spectral segmentation of MR images is presented. This method is based on fuzzy c-means (FCM) clustering, includes bias field correction and contextual constraints over the spatial intensity distribution, and accounts for the non-spherical shape of clusters in the feature space. The bias field is modeled as a linear combination of smooth polynomial basis functions for fast computation in the clustering iterations. Regularization terms for the neighborhood continuity of intensity are added into the FCM cost function. To reduce the computational complexity, the contextual regularizations are separated from the clustering iterations. Since the feature space is not isotropic, the distance measure of the Gustafson-Kessel (G-K) algorithm is used instead of the Euclidean distance to account for the non-spherical shape of the clusters. These algorithms are quantitatively evaluated on MR brain images using similarity measures.
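The Gustafson-Kessel distance mentioned above can be sketched as follows: each cluster gets a Mahalanobis-like norm induced by its fuzzy covariance, normalized to unit determinant. This is a generic G-K sketch with synthetic data, not the paper's full segmentation pipeline:

```python
import numpy as np

def gk_metric(X, center, U_col, m=2.0):
    """Norm-inducing matrix of one cluster: inverse fuzzy covariance,
    scaled to unit determinant so the cluster volume stays fixed."""
    w = U_col ** m
    diff = X - center
    F = (w[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(0) / w.sum()
    n = X.shape[1]
    return np.linalg.det(F) ** (1.0 / n) * np.linalg.inv(F)

def gk_dist2(x, center, A):
    """Squared G-K (Mahalanobis-like) distance of a point to the center."""
    d = x - center
    return d @ A @ d

# Synthetic anisotropic cluster stretched along x:
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 3.0, 500), rng.normal(0, 0.5, 500)])
A = gk_metric(X, np.zeros(2), np.ones(500))
```

Under this metric a point three units along the elongated axis is much closer to the center than a point three units along the narrow axis, which is exactly the anisotropy the Euclidean distance misses.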

Datta, Sushmita; Sajja, Balasrinivasa Rao; Narayana, Ponnada A.

The paper presents a k-means based hybrid segmentation method for the breast cancer diagnosis problem. It is part of a computer system to support diagnosis based on microscope images of fine needle biopsies. The system distinguishes malignant from benign cases. The described method is an alternative to previously presented algorithms based on fuzzy c-means clustering and competitive neural networks. However,

The spectral clustering algorithm has been shown to be more effective in finding clusters than some traditional algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense

Wen-Yen Chen; Yangqiu Song; Hongjie Bai; Chih-Jen Lin; Edward Y. Chang

A genetic algorithm has been employed, in combination with the first-principles Pacheco-Ramalho intermolecular potential, including not only its two-body part but also its dominant three-body term, to predict the magic-number structures of neutral (C60)N clusters up to N=25. The calculated second finite difference of the total energy shows that the (C60)N clusters with N=7, 13, 18, and 22 are particularly stable. Present results indicate that the three-body effect is not significant for (C60)N clusters with N<=17, but above N=17 the role played by the three-body dispersion term must be considered. It is found that N=17 marks a crossover from icosahedral to decahedral growth, and for N>=23 (with the three-body term included) icosahedral structures are always lower in energy than decahedral ones.

This report consists of three separate but related reports. They are (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Res...

Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work…

Software clustering algorithms presented in the literature rarely incorporate in the clustering process dynamic information, such as the number of function invocations during runtime. Moreover, the structure of a software system is often multi-layered, while existing clustering algorithms often create flat system decompositions. This paper presents a software clustering algorithm called MULICsoft that incorporates in the clustering process both static

Bill Andreopoulos; Aijun An; Vassilios Tzerpos; Xiaogang Wang

Image segmentation is an important task in many applications. For large-scale, general image datasets, however, there are competing requirements: avoiding complex prior assumptions about the scene, achieving fast speed, and maintaining good segmentation quality. In this paper, a hybrid approach for image segmentation is presented that incorporates two well-known methods, Mean Shift (MS) and Fuzzy C-Means (FCM).

The need for personal transportation must be harmonized with the impact of such a huge number of vehicles on the environment. The adoption of hybrid electric vehicles can provide a sensible improvement from an environmental viewpoint, but at the same time it makes the definition and implementation of the overall powertrain control mechanism more difficult. In fact, powertrain control problems are

Most of the features of future hybrid electric vehicles are enabled by a new energy flow management unit designed to split the instantaneous power demand between the internal combustion engine and the electric motor, ensuring both an efficient power supply and reduced emissions. Classic approaches that rely on static thresholds, optimized on a fixed drive cycle, cannot face

In order to further improve the performance of tumor clustering from bio-molecular data, we introduce fuzzy theory into the cluster ensemble framework for tumor clustering from bio-molecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks, named HFCEF-I, HFCEF-II, HFCEF-III and HFCEF-IV respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to produce a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation (AP) algorithm to perform clustering on the sample dimension, and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on the subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV combine the characteristics of both: HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates them in a concurrent way. PMID:23689925

Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum

The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5 h⁻¹ Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k ≈ 1 h Mpc⁻¹.
We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak lensing observations. We apply the method to a range of cosmological models and explicitly show the viability of our new statistic to distinguish between cosmological models.

Baldauf, Tobias; Smith, Robert E. [Institute for Theoretical Physics, University of Zurich, Zurich (Switzerland); Seljak, Uros [Institute for Theoretical Physics, University of Zurich, Zurich (Switzerland); Physics Department, Astronomy Department and Lawrence Berkeley National Laboratory, University of California, Berkeley, California (United States); Ewha University, 11-1 Daehyun-Dong Seodaemun-Gu Seoul 120-750 (Korea, Republic of); Mandelbaum, Rachel [Department of Astrophysical Sciences, Princeton University, Peyton Hall, Princeton, New Jersey (United States)

This report aims to give a brief overview of the current state of document clustering research and present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed along with a

In this paper algorithms for modelling dynamic reconfiguration of ad-hoc wireless networks with movable base stations in the presence of obstacles are proposed. This problem is considerably harder than the one treated in (1), where base stations were organized in a starred network and no obstacles were present. In these models nodes communicate through a clusterhead gateway switching routing

Giuseppe Pigola; Alfredo Pulvirenti; Viale A. Dori

This work studies the build-up method for the global minimization problem for molecular conformation, especially protein folding. The problem is hard to solve for large molecules using general minimization approaches because of the enormous amount of required computation. We therefore propose a build-up process to systematically “construct” the optimal molecular structures. A prototype algorithm is designed using the anisotropic effective

Microcalcification clusters in mammograms are a significant early sign of breast cancer. Individual clusters are difficult to detect, and hence an automatic computer-aided mechanism will help the radiologist detect microcalcification clusters in an easy and efficient way. This paper presents a new classification approach for the detection of microcalcifications in digital mammograms using a particle swarm optimization (PSO) based clustering technique. The fuzzy c-means clustering technique, well suited to clustering data sets, is used in combination with PSO. We adopt particle swarm optimization to search for the cluster centers in an arbitrary data set automatically. PSO searches for the best solution by probabilistically combining the social-only and cognition-only models. This method is simple and valid, and it avoids local minima. The proposed classification approach is applied to a database of 322 dense mammographic images originating from the MIAS database. Results show that the proposed PSO-FCM approach gives better detection performance compared to conventional approaches.
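A rough sketch of using PSO to search for cluster centers: each particle encodes a full center set, and the swarm moves toward personal and global bests. The fitness here is a hard within-cluster sum of distances as a simple stand-in objective, and the coefficients 0.7/1.5/1.5 are conventional defaults, not values taken from the paper:

```python
import numpy as np

def pso_centers(X, c, n_particles=20, n_iter=60, seed=0):
    """Particle swarm search for c cluster centers.  Each particle encodes a
    (c, dim) set of centers; fitness is the hard within-cluster sum of
    distances, a simple stand-in objective."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.uniform(X.min(0), X.max(0), (n_particles, c, dim))
    vel = np.zeros_like(pos)

    def fitness(centers):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        return d.min(axis=1).sum()

    pbest = pos.copy()
    pbest_f = np.array([fitness(p) for p in pos])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # inertia + cognition-only pull + social-only pull:
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = pos + vel
        f = np.array([fitness(p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g

# Two obvious groups (illustrative coordinates):
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [4.9, 5.2]])
g = pso_centers(X, c=2)
```

In a PSO-FCM hybrid the centers found this way would seed (or replace) the FCM center update, which is what lets the combined method escape poor local minima.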

Employing a genetic algorithm incorporated with density functional theory calculations, we determined the lowest-energy structures of cationic Na_n^+ clusters (n = 9, 15, 21, 26, 31, 36, 41, 50 and 59). We revealed a transition of the growth pattern from a "polyicosahedral" sequence to the Mackay icosahedral motif at around n = 40. Based on the ground-state structures, the size-dependent electronic properties of Na_n^+ clusters, including the binding energies, HOMO-LUMO gaps, electron density of states and photoabsorption spectra, were discussed. As cluster size increases, the HOMO-LUMO gap of the Na_n^+ cluster gradually reduces and converges rapidly to the metallic behavior of the bulk crystal. The photoabsorption spectra of Na_n^+ clusters from our calculations agree rather well with experimental data, confirming the reliability of our theoretical approaches.

Image segmentation is an essential processing step for many image analysis applications. In this paper, a novel image segmentation algorithm using fuzzy c-means (FCM) clustering with spatial constraints based on a Markov random field (MRF) via Bayesian theory is proposed. Because it disregards spatial constraint information, the FCM algorithm fails to segment images corrupted by noise. In order to improve the robustness of FCM to noise, a powerful model for the membership functions that incorporates local correlation is given by an MRF defined through a Gibbs function. Spatial information is then incorporated into the FCM by Bayesian theory. Therefore, the proposed algorithm has the advantages of both the FCM and the MRF, and is robust to noise. Experimental results on synthetic and real-world images are given to demonstrate the robustness and validity of the proposed algorithm.

We show that for any data set in any metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k, the induced k-clustering has cost at most eight times that of the optimal k-clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithm is similar
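The radius-based guarantee above is closely related to Gonzalez's farthest-first traversal, which by itself gives a factor-2 bound on the k-clustering radius; a minimal sketch follows (the factor-8 hierarchical construction itself is more involved, and the data below are illustrative):

```python
def farthest_first(points, dist, k):
    """Gonzalez's farthest-first traversal: repeatedly pick the point
    farthest (in min-distance) from the centers chosen so far.  The induced
    k-clustering has maximum radius within a factor of 2 of optimal."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

# One-dimensional toy data with three well-separated groups:
points = [0, 1, 2, 10, 11, 12, 20, 21]
centers = farthest_first(points, lambda a, b: abs(a - b), 3)
```

Because each prefix of the traversal is itself a good center set, ordering the points this way is what makes a single run serve every k at once in the hierarchical scheme.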

We present a Bayesian clustering algorithm for multivariate time series. A clustering is regarded as a probabilistic model in which the unknown auto-correlation structure of a time series is approximated by a first-order Markov chain and the overall joint distribution of the variables is simplified by conditional independence assumptions. The algorithm searches for the most probable set

Background Life processes are determined by the organism's genetic profile and multiple environmental variables. However the interaction between these factors is inherently non-linear [1]. Microarray data is one representation of the nonlinear interactions among genes and genes and environmental factors. Still most microarray studies use linear methods for the interpretation of nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets. Results Isomap discovered low-dimensional structures embedded in the Affymetrix microarray data sets. These structures correspond to and help to interpret biological phenomena present in the data. This analysis provides examples of temporal, spatial, and functional processes revealed by the Isomap algorithm. In a spinal cord injury data set, Isomap discovers the three main modalities of the experiment – location and severity of the injury and the time elapsed after the injury. In a multiple tissue data set, Isomap discovers a low-dimensional structure that corresponds to anatomical locations of the source tissues. This model is capable of describing low- and high-resolution differences in the same model, such as kidney-vs.-brain and differences between the nuclei of the amygdala, respectively. In a high-throughput drug screening data set, Isomap discovers the monocytic and granulocytic differentiation of myeloid cells and maps several chemical compounds on the two-dimensional model. Conclusion Visualization of Isomap models provides useful tools for exploratory analysis of microarray data sets. In most instances, Isomap models explain more of the variance present in the microarray data than PCA or MDS. Finally, Isomap is a promising new algorithm for class discovery and class prediction in high-density oligonucleotide data sets.

Pattern recognition is the science of data structure and its classification. There are many classification and clustering methods prevalent in the pattern recognition area. In this research, rainfall data from a region in northern Iran are classified with the natural breaks classification method and with a revised fuzzy c-means (FCM) algorithm as a clustering approach. To compare these two methods, the results of the FCM method are hardened. The comparison showed overall agreement between the natural breaks classification and FCM clustering methods. The differences arise from the nature of the two methods: in the FCM, the boundaries between adjacent clusters are not sharp, while they are abrupt in the natural breaks method. The sensitivity of both methods with respect to rain gauge density was also analyzed. For each rain gauge density, the percentage of boundary region and the hardening error are at a minimum in the first cluster, while the second cluster has the maximum error. Moreover, the number of clusters was sensitive to the number of stations. Since the optimum number of classes is not apparent in the classification methods and the boundary between adjacent classes is abrupt, clustering methods such as FCM overcome such deficiencies. The methods were also applied to mapping an aridity index in the study region, where the results revealed good agreement between the FCM clustering and natural breaks classification methods.
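The hardening step used for the comparison can be sketched directly: each object goes to its maximum-membership cluster, and objects whose top membership falls below a threshold form the boundary region. The membership matrix and threshold below are illustrative:

```python
def harden(U):
    """Crisp partition from a fuzzy one: maximum-membership rule."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]

def boundary(U, t=0.6):
    """Objects whose top membership is below t lie in the fuzzy boundary
    between adjacent clusters; these drive the hardening error."""
    return [i for i, row in enumerate(U) if max(row) < t]

# Illustrative membership matrix for three objects and two clusters:
U = [[0.8, 0.2], [0.4, 0.6], [0.55, 0.45]]
```

The third object, with memberships 0.55/0.45, is exactly the kind of near-boundary case where the hardened FCM result and a sharp classification like natural breaks can disagree.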

Wrist pulse signal contains important information about the health status of a person and pulse signal diagnosis has been employed in oriental medicine for thousands of years. In this research, a systematic approach is proposed to analyze the computerized wrist pulse signals, with the focus placed on the feature extraction and pattern classification. The wrist pulse signals are first collected and pre-processed. Considering that a typical pulse signal is composed of periodically systolic and diastolic waves, a modified Gaussian model is adopted to fit the pulse signal and the modeling parameters are then taken as features. Consequently, a feature selection scheme is proposed to eliminate the tightly correlated features and select the disease-sensitive ones. Finally, the selected features are fed to a Fuzzy C-Means (FCM) classifier for pattern classification. The proposed approach is tested on a dataset which includes pulse signals from 100 healthy persons and 88 patients. The results demonstrate the effectiveness of the proposed approach in computerized wrist pulse diagnosis. PMID:19747872

It is shown that a particular case of the Bayesian Ying-Yang learning system and theory reduces to the maximum likelihood learning of a finite mixture, from which we have obtained not only the EM algorithm for its parameter estimation but also various approximate yet fast algorithms for clustering in general cases, including Mahalanobis distance clustering or "elliptic" clustering,

The aim of this work is to compare the efficiency and power of several cluster analysis techniques on fully artificial (mathematical) and synthesized (hybrid) functional magnetic resonance imaging (fMRI) data sets. The clustering algorithms used are hierarchical, crisp (neural gas, self-organizing maps, hard competitive learning, k-means, maximin-distance, CLARA) and fuzzy (c-means, fuzzy competitive learning). To compare these methods we use two performance measures, namely the correlation coefficient and the weighted Jaccard coefficient (wJC). Both performance coefficients (PCs) clearly show that the neural gas and k-means algorithms perform significantly better than all the other methods using our setup. For the hierarchical methods, the Ward linkage algorithm performs best under our simulation design. In conclusion, the neural gas method seems to be the best choice for fMRI cluster analysis, given its correct classification of activated pixels (true positives (TPs)) whilst minimizing the misclassification of inactivated pixels (false positives (FPs)), and the stability of the results achieved. PMID:15182847

The aim of this study is to develop a novel fuzzy clustering neural network (FCNN) algorithm as a pattern classifier for a real-time odor recognition system. In this type of FCNN, the input neurons' activations are derived through fuzzy c-means clustering of the input data, so that the neural system can deal with the statistics of the measurement error directly. The performance of the FCNN is then compared with another well-known network, the multilayer perceptron (MLP), for the same odor recognition system. Experimental results show that both the FCNN and the MLP provide high recognition probability in determining various learned categories of odors; however, the FCNN has a better ability to recognize odors than the MLP network.

In this study, we use the L-moments and fuzzy-clustering techniques to analyze dust storm frequencies in Iran. A homogeneity test based on H statistics is first carried out using the dust-storm-frequency time series at 122 weather stations. The test shows that dust storms over the study area as a whole do not have the same probabilistic behavior. To identify homogeneous regions within the study area, a fuzzy c-means algorithm based on the L-moments of the dust-storm-frequency time series is employed. By use of the cluster validation indices, partition coefficient and partition entropy, four clusters are identified, i.e., the first Zagros east cluster (Cluster 1A) and the second Zagros east cluster (Cluster 1B), related respectively to the Dashte Kevir and Dashte-Lout dust source regions, the Zagros west cluster (Cluster 2) and the Iran northwest cluster (Cluster 3). Based on the goodness-of-fit test, ZDist, the best regional distribution models for Clusters 1A, 1B, 2 and 3 are found to be Pearson III, generalized normal, generalized Pareto and generalized normal distributions, respectively. The different types of distributions suggest that the dust storms in the different cluster regions are due to different generation mechanisms and are associated with different dust sources. The regional growth curves are then constructed using the regional distribution models. The sharp slope of the growth curves for Clusters 2 and 3 suggests that the dust storms in the northwestern and western parts of Iran are mainly due to dust transport from the Sahara, Rub al Khali and Arabian deserts.

Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithms (genetic algorithms and neural networks) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…

In this paper, we address the problem of entropy-based clustering in the framework of possibility theory. First, we introduce the possibilistic entropy with a brief discussion. Then we develop the possibilistic entropy theory for clustering analysis and investigate general Possibilistic Entropy Clustering (PEC) problems, based on which a Fully Unsupervised Possibilistic Entropy Clustering (FUPEC) algorithm is elaborated in detail with

This paper proposes an original approach to clustering multi-component data sets, including an estimation of the number of clusters. From the construction of a minimal spanning tree with Prim's algorithm, and the assumption that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding Prim's trajectory. The corresponding cluster centroids are
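The thresholding idea can be sketched as follows: record the weight of each edge as Prim's algorithm adds it, then count the edges exceeding a threshold; cutting them leaves that many + 1 connected components. The 1-D data and threshold below are illustrative, and the fixed threshold stands in for the paper's Poisson-based estimate:

```python
import numpy as np

def prim_edge_weights(X):
    """Weights of MST edges in the order Prim's algorithm adds them
    (the 'Prim trajectory' of the abstract)."""
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    d = np.linalg.norm(X - X[0], axis=1)   # best distance from each point to the tree
    weights = []
    for _ in range(n - 1):
        d[in_tree] = np.inf
        j = int(np.argmin(d))              # nearest point not yet in the tree
        weights.append(float(d[j]))
        in_tree[j] = True
        d = np.minimum(d, np.linalg.norm(X - X[j], axis=1))
    return weights

def count_clusters(X, thresh):
    """Edges heavier than thresh are taken as inter-cluster links;
    cutting them splits the tree into that many + 1 components."""
    return sum(w > thresh for w in prim_edge_weights(X)) + 1

# Two well-separated 1-D groups (illustrative):
X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
```

Long edges in the trajectory mark jumps between dense regions, which is why thresholding them recovers the cluster count.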

Laurent Galluccio; Olivier J. J. Michel; Pierre Comon; Eric Slezak; Alfred O. Hero

To address sliding-window-based clustering, two types of exponential histograms of cluster features, false positive and false negative, are introduced in this paper. With these structures, a clustering algorithm based on sliding windows is proposed. The algorithm can precisely obtain the distribution of recent records with limited memory; thus it can produce the clustering result over sliding windows.

Background Support Vector Machines (SVMs) provide a powerful method for classification (supervised learning). Use of SVMs for clustering (unsupervised learning) is now being considered in a number of different ways. Results An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of input classes. The algorithm initializes by first running a binary SVM classifier against a data set with each vector in the set randomly labelled; this is repeated until an initial convergence occurs. Once this initialization step is complete, the SVM confidence parameters for classification on each of the training instances can be accessed. The lowest-confidence data (e.g., the worst of the mislabelled data) then have their labels switched to the other class label. The SVM is then re-run on the data set (with partly re-labelled data) and is guaranteed to converge in this situation since it converged previously, and now it has fewer data points to carry with mislabelling penalties. This approach appears to limit exposure to the local minima traps that can occur with other approaches. The algorithm then improves on its weakly convergent result by SVM re-training after each re-labelling of the worst of the misclassified vectors, i.e., those feature vectors with confidence factor values beyond some threshold. Repetition of the above process improves the accuracy, here a measure of separability, until there are no misclassifications. Variations on this type of clustering approach are shown. Conclusion Non-parametric SVM-based clustering methods may allow for much improved performance over parametric approaches, particularly if they can be designed to inherit the strengths of their supervised SVM counterparts.

Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called the haplotype reconstruction problem. One of the most popular computational models for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm, based on clustering analysis in data mining, for the haplotype reconstruction problem. Based on Hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has a lower reconstruction error rate in comparison with other algorithms.
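As a concrete illustration of the Hamming-distance idea, the following sketch (our own simplification, not the authors' iterative algorithm) seeds two clusters with the most dissimilar pair of fragments and assigns the rest greedily; `'-'` marks an SNP site not covered by a fragment:

```python
def hamming(a, b):
    """Hamming distance counted only over SNP sites covered by both fragments."""
    return sum(1 for x, y in zip(a, b) if x != '-' and y != '-' and x != y)

def cluster_fragments(fragments):
    """Greedy two-cluster assignment of SNP fragments by Hamming distance."""
    # seed the two clusters with the most dissimilar pair of fragments
    pairs = [(hamming(fragments[i], fragments[j]), i, j)
             for i in range(len(fragments))
             for j in range(i + 1, len(fragments))]
    _, i0, j0 = max(pairs)
    clusters = ([fragments[i0]], [fragments[j0]])
    for k, frag in enumerate(fragments):
        if k in (i0, j0):
            continue
        # assign the fragment to the cluster holding its nearest member
        d = [min(hamming(frag, g) for g in c) for c in clusters]
        clusters[0 if d[0] <= d[1] else 1].append(frag)
    return clusters
```

Each cluster's column-wise consensus would then give one of the two reconstructed haplotypes.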

Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi

Texture analysis has been efficiently utilized in the area of terrain classification. The widely used co-occurrence features have been reported to be the most effective for this application. Since the number of co-occurrence features is very high, a terrain classifier based on co-occurrence features must deal with a high-dimensionality problem. This paper deals with how to solve high-dimensionality problems by employing a

Data stream clustering is an important issue in data stream mining. In most of the existing algorithms, only continuous features are used for clustering. In this paper, we introduce an algorithm, HDenStream, for clustering data streams with heterogeneous features. HDenStream is also a density-based algorithm, so it is capable of clustering arbitrary shapes and handling outliers. Theoretic

Clustering protein-protein interaction networks (PINs) helps to identify complexes that guide the cell machinery. Clustering algorithms often create a flat clustering, without considering the layered structure of PINs. We propose the MULIC clustering algorithm that produces layered clusters. We applied MULIC to five PINs. Clusters correlate with known MIPS protein complexes. For example, a cluster of 79 proteins overlaps with

Bill Andreopoulos; An Aijun; Xiangji Huang; Xiaogang Wang

SUMMARY This paper presents an RK-means clustering algorithm which is developed for reliable data grouping by introducing a new reliability evaluation to the K-means clustering algorithm. The conventional K-means clustering algorithm has two shortfalls: 1) the clustering result will become unreliable if the assumed number of the clusters is incorrect; 2) during the update of a cluster center, all

Clustering of data streams has received much attention in recent years. Many algorithms have been proposed in order to discover clusters in data streams efficiently and effectively. Most of these algorithms are devoted to discovering clusters with high density, high correlation, and low intra-cluster distance. In particular, many methods attempt to balance memory use, time consumption, and the clustering

Fuzzy clustering is one of the most frequently used methods for identifying homogeneous regions in remote sensing images. In the case of large images, the computational costs of fuzzy clustering can be prohibitive unless high performance computing is used. Therefore, efficient parallel implementations are highly desirable. This paper presents results on the efficiency of a parallelization strategy for the Fuzzy c-Means (FCM) algorithm. In addition, the parallelization strategy has been extended to two FCM variants that incorporate spatial information (Spatial FCM and Gaussian Kernel-based FCM with spatial bias correction). The high-level requirements that guided the formulation of the proposed parallel implementations are: (i) find an appropriate partitioning of large images in order to ensure a balanced load across processors; (ii) use collective computations as much as possible; (iii) reduce the cost of communications between processors. The parallel implementations were tested through several test cases including multispectral images and images having a large number of pixels. The experiments were conducted on both a computational cluster and a BlueGene/P supercomputer with up to 1024 processors. Generally, good scalability was obtained both with respect to the number of clusters and the number of spectral bands.
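For reference, the serial membership/centroid updates that such a parallelization distributes across image partitions look roughly like this (a generic FCM sketch of ours, not the paper's parallel code):

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain serial fuzzy c-means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                  # random fuzzy partition
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)    # membership-weighted centroids
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # c x n distances
        d = np.fmax(d, 1e-12)                           # guard against division by zero
        U_new = d ** (-2.0 / (m - 1.0))                 # standard FCM membership update
        U_new /= U_new.sum(axis=0)
        if np.abs(U_new - U).max() < tol:               # memberships stabilized
            U = U_new
            break
        U = U_new
    return U, V
```

A parallel version would split the rows of `X` (the pixels) across processors and combine the per-partition sums behind `V` with collective reductions.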

Contrary to conventional wisdom, the construction of clusters on a lattice can easily be vectorized, namely over each ``generation'' in a breadth-first search. This applies directly to, e.g., the single-cluster variant of the Swendsen-Wang algorithm. On a Cray-YMP, total CPU time was reduced by a factor of 3.5-7 in actual applications.

This short paper reports on work in progress related to applying data partitioning/clustering algorithms to ratings data in collaborative filtering. We use existing data partitioning and clustering algorithms to partition the set of items based on user rating data. Predictions are then computed independently within each partition. Ideally, partitioning will improve the quality of collaborative filtering predictions and increase the

In this paper, we consider the task of clustering multivariate normal distributions with respect to the relative entropy into a prescribed number, k, of clusters using a generalization of Lloyd's k-means algorithm [1]. We revisit this information-theoretic clustering problem under the auspices of mixed-type Bregman divergences, and show that the approach of Davis and Dhillon [2] (NIPS*06) can also be

Camouflaged target detection in complex backgrounds is a challenging problem. Spectral-polarimetric imaging can offer spectral information and polarization information from the objects in the scene. Fusion of the spectral and polarization information in the images will result in better camouflaged target identification and recognition. In this paper a novel spectral-polarimetric image fusion algorithm based on the Shearlet transform is proposed. Firstly, every polarimetric image in each wave band is decomposed into images of low-frequency components and high-frequency components by the Shearlet transform. Then, the fused low-frequency approximate coefficients are obtained with a weighted average method, and the fused high-frequency coefficients are obtained with an area-based feature selection method, so features and details from different spectral-polarimetric images are fused successfully. After that, the kernel fuzzy c-means clustering algorithm is used to separate the camouflaged object from its background. Experimental results have shown that better identification performance was achieved.

Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as hierarchical clustering and k-means. Ant-based clustering stands out as the most widely used group of swarm-based clustering algorithms. Broadly speaking, there are two main types of ant-based clustering: the first group of methods directly mimics the clustering behavior observed in real ant colonies. The

This paper presents a genetic algorithm using a matrix genome encoding to schedule distributed tasks, represented by a directed acyclic graph, on processors in order to minimize the maximum task finishing time. Our experimental results show that this algorithm provides better scheduling results than list scheduling with insertion and dominant sequence clustering heuristics. Our algorithm generates good schedules even in

Considering the shortcomings of the classic MDS-MAP algorithm in localization precision and algorithmic complexity, a distributed localization algorithm based on MDS is proposed in this paper. A clustering method is used to build different clusters. The MDS algorithm is applied in every cluster to obtain local relative coordinates, and the Euclidean algorithm is used to calculate the distance matrix in this

A clustering algorithm based on the minimum volume ellipsoid (MVE) robust estimator is proposed. The MVE estimator identifies the least volume region containing h percent of the data points. The clustering algorithm iteratively partitions the space into clusters without prior information about their number. At each iteration, the MVE estimator is applied several times with values of h decreasing from

This paper presents a new iterative approach to the clustering problem. We analyze the behaviour of the method applied to the Dominant Sequence Clustering (DSC) algorithm and prove that the parallel time of the series of clusterings built by the iterations is decreasing. We establish the optimality of the iterative algorithm on out-trees and series-parallel graphs. Finally, some experimental results

Neural network models are commonly used for cluster analysis in engineering, computational neuroscience, and the biological sciences, although they are rarely used in the social sciences. In this study we compare the classification capabilities of the 1-dimensional Kohonen neural network with two partitioning (Hartigan and Späth k-means) and three hierarchical (Ward's, complete linkage, and average linkage) cluster methods in 2,580 data

Niels G. Waller; Heather A. Kaiser; Janine B. Illian; Mike Manry

This paper focuses on the development of an effective cluster validity measure with outlier detection and cluster merging algorithms for support vector clustering (SVC). Since SVC is a kernel-based clustering approach, the parameter of kernel functions and the soft-margin constants in Lagrangian functions play a crucial role in the clustering results. The major contribution of this paper is that our

Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in a cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others. We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work was supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.

Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team

The K-means algorithm has been shown to be an effective and efficient algorithm for clustering. However, the k-means algorithm was developed for numerical data only; it is not suitable for clustering non-numerical data. The k-modes algorithm has been developed for clustering categorical objects by extending the k-means algorithm. However, this technique has not previously been applied to the classification of categorical data.
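The extension from k-means to k-modes is straightforward to sketch: replace Euclidean distance with simple-matching dissimilarity and cluster means with attribute-wise modes. The following minimal sketch (with a deterministic farthest-first seeding of our own choosing, not the published initialization) illustrates the idea:

```python
from collections import Counter

def kmodes(records, k=2, n_iter=20):
    """Minimal k-modes: simple-matching dissimilarity + attribute-wise modes."""
    def dist(a, b):                       # simple matching dissimilarity
        return sum(x != y for x, y in zip(a, b))
    # deterministic farthest-first seeding of the k initial modes
    modes = [records[0]]
    while len(modes) < k:
        modes.append(max(records, key=lambda r: min(dist(r, m) for m in modes)))
    assign = None
    for _ in range(n_iter):
        new_assign = [min(range(k), key=lambda j: dist(r, modes[j]))
                      for r in records]
        if new_assign == assign:          # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):                # recompute each mode attribute-wise
            members = [r for r, a in zip(records, assign) if a == j]
            if members:
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return assign, modes
```

Because distances and modes are computed per attribute, the records can mix any categorical attributes of equal arity.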

Ching-San Chiang; Shu-Chuan Chu; Yi-Chih Hsin; Ming-Hui Wang

Breadth-first search for a single cluster on a regular lattice is shown to be vectorizable. It is applied to construct clusters in the single-cluster variant of the Swendsen-Wang algorithm. On a Cray-YMP, total CPU time has been reduced by factors of 3.5-7 in large-scale applications. A multiple-cluster version is also described.

Modelling the traffic conditions has become necessary in the modern connected society. We have attempted to use clustering algorithms to classify traffic flow in and around Pune city into classes representing geographical locations of sampling of the data. The algorithm employs Sammon's mapping along with fuzzy clustering algorithms to cluster the data. Such high-end parameterization of traffic flow can help

Clustering has played an important role in data analysis systems from their beginning. Among the earliest clustering algorithms, which could handle only numerical data, is K-means clustering, proposed by MacQueen [1] in 1967. This algorithm helps us to find the homogeneity of a data set. The K-means algorithm has been

We describe here the use of Density Matrix Renormalization Group and Matrix Product algorithms for highly scalable impurity solvers in cluster Dynamical Mean Field Theory. We present results on 2- and 4-site clusters for the Hubbard model.

Several clustering algorithms have been suggested to analyse genome expression data, but fewer solutions have been implemented to guide the design of clustering-based experiments and assess the quality of their outcomes. A cluster validity framework provides insights into the problem of predicting the correct number of clusters. This paper presents several validation techniques for gene expression data analysis. Normalisation

In this paper, we present an approach for automatic clustering of multi-dimensional dynamic trajectories corresponding to speech data that is based on Trajectory Clustering (TC). TC uses the Expectation Maximization algorithm (EM) for clustering with mixtures of the Multiple Linear Regression model. Since the initial values of the model parameters are critical to the clustering performance,

A new operational definition of cluster is proposed, and a fuzzy clustering algorithm with minimal biases is formulated by making use of the maximum entropy principle to maximize the entropy of the centroids with respect to the data points (clustering entropy). The authors make no assumptions on the number of clusters or their initial positions. For each value of an

High-fidelity analyses are utilized in modern engineering design optimization problems which involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. Uniformly distributed initial sample points are generated by Latin hypercube design to construct the radial basis function metamodel, whose accuracy is gradually improved by increasing the number of sample points. The fuzzy c-means method and the Gath-Geva clustering method are applied to divide the design space into several small interesting cluster spaces for low- and high-dimensional problems respectively. Modeling efficiency and accuracy are directly related to the design space, so unconcerned spaces are eliminated by the proposed reduction principle and two pseudo-reduction algorithms. The reduction principle is developed to determine whether the current design space should be reduced and which space should be eliminated. The first pseudo-reduction algorithm improves the speed of clustering, while the second ensures that the design space is reduced. Through several numerical benchmark functions, comparative studies with the adaptive response surface method, the approximated unimodal region elimination method, and mode-pursuing sampling are carried out. The optimization results reveal that this method captures the real global optimum for all the numerical benchmark functions, and the number of function evaluations shows that its efficiency is favorable, especially for high-dimensional problems. Based on this global design optimization method, a design optimization of a lifting surface in high-speed flow is carried out, and this method saves about 10 h compared with genetic algorithms.
This method possesses favorable performance in efficiency, robustness, and capability of global convergence, and gives a new optimization strategy for engineering design optimization problems involving expensive black-box models.

We present an integrated method for record clustering and reorganization which can be applied to any set of queries whose frequencies of request are known. The clustering algorithm works by splitting and merging current clusters and, furthermore, produces a new assignment of these clusters to pages in secondary storage. The reorganization algorithm is an on-line, incremental procedure for allocating the

In this paper we present a scheduling algorithm that assigns tasks of medium size grain. The behavior of the proposed algorithm, called extended latency time (ELT), is compared with the dominant sequence clustering (DSC) algorithm. One of the inputs values required by the ELT algorithm is the maximum number of processors available in the architecture. This value corresponds to the

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures.

Slonim, Noam; Atwal, Gurinder Singh; Tkacik, Gasper; Bialek, William

Data clustering algorithmic schemes receive much research attention due to the prominent role that clustering plays during data analysis. Proper data clustering reveals data structure and makes possible further data processing and analysis. In application areas, k-means clustering algorithms are the most often exploited in almost all important branches of data processing and data exploration. During the last decades, a

Clustering multimodal datasets can be problematic when a conventional algorithm such as k-means is applied due to its implicit assumption of Gaussian distribution of the dataset. This paper proposes a tandem clustering process for multimodal data sets. The proposed method first divides the multimodal dataset into many small pre-clusters by applying k-means or fuzzy k-means algorithm. These pre-clusters are then

Catherine Cho; Sooyoung Kim; Jaewook Lee; Dae-won Lee

In this paper we present a system for static and dynamic information organization and show our evaluations of this system on TREC data. We introduce the off-line and on-line star clustering algorithms for information organization. Our evaluation experiments show that the off-line star algorithm outperforms the single link and average link clustering algorithms. Since the star algorithm

In this work we compare balance and edge-cut evaluation metrics to measure the performance of two well-known graph data-grouping algorithms applied to four web and social network graphs. One of the algorithms employs a partitioning technique using the Kmetis tool, and the other employs a clustering technique using the Scluster tool. Because clustering algorithms use a similarity measure between graph nodes and partitioning algorithms use a dissimilarity measure (weight), it was necessary to apply a normalization function to convert weighted graphs to similarity matrices.

Afonso, Carlos; Ferreira, Fábio; Exposto, José; Pereira, Ana I.

Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is that users are overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools, and ant food foraging.

Motivation: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal

Ka Yee Yeung; Chris Fraley; A. Murua; Adrian E. Raftery; Walter L. Ruzzo

How to design an energy-efficient routing algorithm is a hot topic in wireless sensor network research. In this paper, based on the analysis of some typical cluster-based routing algorithms, a novel cluster-based energy-efficient routing algorithm is proposed to solve the hot-spot problem involved in inter-cluster routing and optimize network lifetime. Clusters are formed by local

Background Potentially inappropriate prescribing in older people is common in primary care and can result in increased morbidity, adverse drug events, hospitalizations and mortality. In Ireland, 36% of those aged 70 years or over received at least one potentially inappropriate medication, with an associated expenditure of over €45 million. The main objective of this study is to determine the effectiveness and acceptability of a complex, multifaceted intervention in reducing the level of potentially inappropriate prescribing in primary care. Methods/design This study is a pragmatic cluster randomized controlled trial, conducted in primary care (OPTI-SCRIPT trial), involving 22 practices (clusters) and 220 patients. Practices will be allocated to intervention or control arms using minimization, with intervention participants receiving a complex multifaceted intervention incorporating academic detailing, medicines review with web-based pharmaceutical treatment algorithms that provide recommended alternative treatment options, and tailored patient information leaflets. Control practices will deliver usual care and receive simple patient-level feedback on potentially inappropriate prescribing. Routinely collected national prescribing data will also be analyzed for nonparticipating practices, acting as a contemporary national control. The primary outcomes are the proportion of participant patients with potentially inappropriate prescribing and the mean number of potentially inappropriate prescriptions per patient. In addition, economic and qualitative evaluations will be conducted. Discussion This study will establish the effectiveness of a multifaceted intervention in reducing potentially inappropriate prescribing in older people in Irish primary care that is generalizable to countries with similar prescribing challenges. Trial registration Current controlled trials ISRCTN41694007

Genetic algorithms are becoming increasingly valuable in solving large-scale, realistic, difficult problems, and selecting replicas with multiple selection criteria (availability, security, and time) is one of these problems. In this paper, a rank-based elitist clustering genetic algorithm named RRWSGA is proposed, which alleviates the problem of k-means being trapped in local clustering centroids. Simulation results show that

Omar Al Jadaan; W. Abdulal; M. A. Hameed; A. Jabas

Identifying atypical objects is one of the classic tasks in machine learning. Recent works, e.g., One-class Clustering and Minority Detection, have explored the task further to identify clusters of atypical objects which strongly contrast from the rest of the dataset. In such problems, avoiding false positive detection is an important yet significantly difficult issue. In this paper, we propose an information theoretic clustering which aims to compactly represent the global and local structures of the dataset and identify atypical clusters in terms of information geometric distance. The former objective contributes to reducing the number of false positive detections. Its formalization further yields a unifying view of the classic outlier detection and the novel tasks. We present a scalable algorithm for detecting multiple clusters of atypical objects without a pre-defined number of clusters. The algorithm is evaluated as an unsupervised two-class classification using simulated datasets and a text classification benchmark.

Motivation: Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the

Inderjit S. Dhillon; Edward M. Marcotte; Usman Roshan

This paper develops theory and algorithms concerning a new metric for clustering data. The metric minimizes the total volume of clusters, where the volume of a cluster is defined as the volume of the minimum volume ellipsoid (MVE) enclosing all data points in the cluster. This metric is scale-invariant, that is, the optimal clusters are invariant under an affine transformation

This paper develops theory and algorithms concerning a new metric for clustering data. The metric minimizes the total volume of clusters, where volume of a cluster is defined as the volume of the minimum volume ellipsoid (MVE) enclosing all data points in the cluster. This metric has the scale-invariant property, that is, the optimal clusters are invariant under

Skull-stripping is one of the most important preprocessing steps in neuroimage analysis. We proposed a hybrid algorithm based on an adaptive balloon snake model to handle this challenging task. The proposed framework consists of two stages: first, the fuzzy possibilistic c-means (FPCM) is used for voxel clustering, which provides a labeled image for the snake contour initialization. In the second stage, the contour is initialized outside the brain surface based on the FPCM result and evolves under the guidance of the balloon snake model, which drives the contour with an adaptive inward normal force to capture the boundary of the brain. The similarity indices indicate that our method outperformed the BSE and BET methods in skull-stripping the MR image volumes in the IBSR data set. Experimental results show the effectiveness of this new scheme and potential applications in a wide variety of skull-stripping applications.

Liu, Hung-Ting; Sheu, Tony W. H.; Chang, Herng-Hua

Cluster analysis is an efficient way to display relationships among many objects. This technique is applied to water-column sound-speed profiles to identify and geographically locate distinct oceanographic provinces. In this study, a hierarchical agglomerative cluster analysis algorithm that uses Euclidean distance as a measure of dissimilarity successfully identified acoustically distinct provinces in a matter of minutes. When profiles are
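A minimal average-linkage version of such an algorithm (our illustrative sketch, not the study's implementation) can be written as:

```python
def agglomerate(points, n_clusters):
    """Hierarchical agglomerative clustering with Euclidean distance:
    start with singleton clusters and repeatedly merge the two clusters
    with the smallest average inter-cluster distance (average linkage)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean pairwise distance between the clusters
                d = sum(dist(p, q) for p in clusters[i] for q in clusters[j]) \
                    / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
            _ = i
        _, i, j = best
        clusters[i] += clusters.pop(j)      # merge the closest pair
    return clusters
```

Real profile data would replace the coordinate tuples with sound-speed vectors; the O(n^3) brute-force merge loop is what makes such clusterings fast only for moderate numbers of profiles.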

Many successful clustering techniques fail to handle data with a manifold structure, i.e. data that is not shaped in the form of compact point clouds, instead forming arbitrary shapes or paths through a high-dimensional space. In this paper we present a new method to evaluate distances in such spaces that naturally extends the application of many clustering algorithms to these cases.

The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem,

Charu C. Aggarwal; Jiawei Han; Jianyong Wang; Philip S. Yu

Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30,638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time-series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers, and behaviors. The data analysis results can assist policymakers, health-care providers, and experts in medicine, public health, nursing, and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.

The density-based clustering algorithm presented is different from the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996), and has the following advantages: first, a greedy algorithm substitutes for the R*-tree (Beckmann et al., 1990) in DBSCAN to index the clustering space, so that the clustering time cost is decreased to a great extent and the I/O memory load is reduced as well; second, the merging condition used to approach arbitrary-shaped clusters is designed carefully, so that a single threshold can correctly distinguish all clusters in a large spatial dataset even though some density-skewed clusters live in it. Finally, the authors investigate a robotic navigation application and test two artificial datasets with the proposed algorithm to verify its effectiveness and efficiency. PMID:15495334
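For orientation, classical DBSCAN itself can be sketched as follows. The paper's variant swaps the neighbour index (greedy instead of R*-tree) and the merging condition; this brute-force sketch keeps the textbook logic:

```python
def dbscan(points, eps, min_pts):
    """Classical DBSCAN sketch with brute-force neighbour search.
    Returns one label per point; -1 marks noise."""
    def neighbours(i):
        # all points within eps of points[i], including i itself
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                  # provisionally noise
            continue
        labels[i] = cluster                 # i is a core point: start a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:          # j is also a core point: expand
                queue.extend(nb)
        cluster += 1
    return labels
```

Replacing `neighbours` with an indexed lookup (R*-tree, or the paper's greedy index) is exactly where the reported time and I/O savings come from.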

A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion-dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle-inequality-obeying distance metric; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document-processing workflow are reported.

The first step in creating a cellular manufacturing system is to identify machine groups and form part families. Clustering and data reorganization (CDR) algorithms (such as the bond energy algorithm) and array sorting (ARS) methods (such as the rank order clustering algorithm) have been proposed to solve the machine and part grouping problem. However, these methods do not always produce

Finding the "right" number of clusters, k, for a dataset is a difficult, and often ill-posed, problem. In a probabilistic clustering context, likelihood ratios, penalized likelihoods, and Bayesian techniques are among the more popular techniques. In this paper a new cross-validated likelihood criterion is investigated for determining cluster structure. A practical clustering algorithm based on Monte Carlo cross-validation (MCCV) is introduced. The algorithm permits the data analyst to

Artistic documents often contain text along curved paths. Commonly used computer algorithms for text extraction, which only deal with text along straight lines, cannot be employed to analyze these artistic documents. In this paper, we solve this problem using the fuzzy curve-tracing algorithm. In our method, the character pixels are grouped based on the fuzzy c-means algorithm to reduce the

Recent ensemble clustering techniques have been shown to be effective in improving the accuracy and stability of standard clustering algorithms. However, an inherent drawback of these techniques is the computational cost of generating and combining multiple clusterings of the data. In this paper, we present an efficient kernel-based ensemble clustering method suitable for application to large, high-dimensional datasets

We present a catalogue of astrophysical data for 520 Galactic open clusters. These are the clusters for which at least three most probable members (18 on average) could be identified in the ASCC-2.5, a catalogue of stars based on the Tycho-2 observations from the Hipparcos mission. We applied homogeneous methods and algorithms to determine angular sizes of cluster cores and

N. V. Kharchenko; A. E. Piskunov; S. Röser; E. Schilbach; R.-D. Scholz

Clustering is a mostly unsupervised procedure, and the majority of clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. Moreover, they may behave differently depending on the features of the data set and the values of their input parameters. Therefore, in most applications the resulting clustering scheme requires some sort

We develop a simple and effective approach for approximate estimation of the cluster centers on the basis of the concept of a mountain function. We call the procedure the mountain method. It can be useful for obtaining the initial values of the clusters that are required by more complex clustering algorithms. It also can be used as a stand-alone
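As a sketch of the idea (a hypothetical minimal implementation in the spirit of the mountain method, not the authors' code; the grid, `alpha`, and `beta` parameters are illustrative assumptions), a mountain function accumulates a Gaussian "potential" at candidate grid points, and each selected peak is destructed before the next one is chosen:

```python
import numpy as np

def mountain_centers(X, grid, n_centers=2, alpha=5.0, beta=5.0):
    """Approximate cluster centers via a mountain-function sketch.

    X: (n, d) data; grid: (m, d) candidate center positions.
    """
    # Mountain function: accumulated Gaussian potential at each grid point.
    d2 = ((grid[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # (m, n)
    M = np.exp(-alpha * d2).sum(axis=1)                      # (m,)
    centers = []
    for _ in range(n_centers):
        k = int(np.argmax(M))
        peak, c = M[k], grid[k]
        centers.append(c)
        # "Destruct" the mountain around the chosen peak so the next
        # maximum falls in a different dense region.
        M = M - peak * np.exp(-beta * ((grid - c) ** 2).sum(-1))
    return np.array(centers)

# Two well-separated blobs; the grid covers their bounding box.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(3, 0.2, (50, 2))])
g = np.linspace(-1, 4, 26)
grid = np.array([[a, b] for a in g for b in g])
centers = mountain_centers(X, grid, n_centers=2)
```

The grid resolution bounds the accuracy of the estimated centers, which is why the method is typically used only to initialize a finer clustering algorithm.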

We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique to galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest-ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.

Recursive identification of nonlinear systems is investigated using radial basis function networks. A novel approach is adopted which employs a hybrid clustering and least squares algorithm. The recursive clustering algorithm adjusts the centers of the ra...

In this paper we describe several new clustering algorithms for nodes in a mobile ad hoc network. We propose to combine two known approaches into a single clustering algorithm which considers connectivity as the primary and lower ID as the secondary criterion for selecting clusterheads. The goal is to minimize the number of clusters, which leads toward dominating sets of smaller sizes

Geng Chen; Fabian Garcia Nocetti; Julio Solano-gonzález; Ivan Stojmenovic

Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.

The main disadvantage of the k-means algorithm is that the number of clusters, K, must be supplied as a parameter. In this paper we present a simple validity measure based on the intra-cluster and inter-cluster distance measures which allows the number of clusters to be determined automatically. The basic procedure involves producing all the segmented images for 2 clusters up
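A toy sketch of such a validity measure (an assumed form, mean intra-cluster scatter divided by the minimum inter-center separation; the paper's exact measure and segmentation pipeline may differ): run k-means for each candidate k and keep the k that minimizes the ratio.

```python
import numpy as np

def kmeans(X, k, iters=30, restarts=5):
    """Plain Lloyd's k-means with a few random restarts; returns (centers, labels)."""
    best = None
    for s in range(restarts):
        rng = np.random.default_rng(s)
        C = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
            C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                          for j in range(k)])
        sse = ((X - C[lab]) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, C, lab)
    return best[1], best[2]

def validity(X, C, lab):
    # Small when clusters are compact (small intra-cluster scatter) and
    # well separated (large minimum distance between centers).
    intra = ((X - C[lab]) ** 2).sum(-1).mean()
    inter = min(((C[i] - C[j]) ** 2).sum()
                for i in range(len(C)) for j in range(i + 1, len(C)))
    return intra / inter

def best_k(X, kmax=6):
    return min(range(2, kmax + 1), key=lambda k: validity(X, *kmeans(X, k)))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (40, 2)) for m in ([0, 0], [5, 0], [0, 5])])
```

Too few clusters inflate the intra term; too many clusters place centers close together and shrink the inter term, so the ratio is minimized near the true k.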

In recent years, many human-tracking methods have been proposed for analyzing human dynamic trajectories. These techniques are applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions using stereo images. We use a framework of two-step clustering, with the k-means method and fuzzy clustering, to detect human regions. In the initial clustering, the k-means method quickly forms intermediate clusters from objective features extracted by stereo vision. In the final clustering, the fuzzy c-means method groups the intermediate clusters into human regions based on their attributes. By expressing ambiguity through fuzzy clustering, our proposed method can cluster correctly even when many people are close to each other. The validity of our technique was evaluated by extracting the trajectories of doctors and nurses in the emergency room of a hospital.

Abstract In recent years, the spectral clustering method has gained attention because of its superior performance compared to other traditional clustering algorithms such as the K-means algorithm. The existing spectral clustering algorithms are all off-line algorithms, i.e., they cannot incrementally update the clustering result given a small change of the data set. However, the capability of incrementally updating is essential to

Huazhong Ning; Wei Xu; Yun Chi; Yihong Gong; Thomas S. Huang

When analyzing spatial databases or other datasets with spatial attributes, one frequently wants to cluster the data according to spatial attributes. In this paper, we describe a novel density-based spatial clustering method called DBRS. The algorithm can identify clusters of widely varying shapes, clusters of varying densities, clusters which depend on non-spatial attributes, and approximate clusters in very large databases.

The FCM algorithm is a well-known unsupervised clustering algorithm, which needs to know the number of clusters in advance. This paper proposes a method that uses a GA, guided by the optimal value of the evaluation index CS, to determine the best clustering result and the corresponding number of clusters for FCM. This method avoids the subjectivity and heavy computation of traditional
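For reference, the standard FCM iteration that such methods wrap (a minimal numpy sketch with fuzzifier m; the paper's GA and CS-index wrapper are not reproduced here) alternates between the membership update u_ij ∝ d_ij^(-2/(m-1)) and the weighted center update:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships: rows sum to 1
    C = None
    for _ in range(iters):
        Um = U ** m                            # fuzzified memberships
        C = (Um.T @ X) / Um.sum(axis=0)[:, None]           # weighted centers
        d = np.sqrt(((X[:, None] - C[None]) ** 2).sum(-1)) + 1e-12
        inv = d ** (-2.0 / (m - 1))            # u_ij proportional to d_ij^{-2/(m-1)}
        Unew = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(Unew - U).max() < tol:
            U = Unew
            break
        U = Unew
    return C, U

# Two well-separated 1-D blobs; FCM should place centers near 0 and 4.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.3, 60), rng.normal(4, 0.3, 60)])[:, None]
C, U = fcm(X, c=2)
```

Note that c must be supplied up front, which is exactly the limitation the GA-based selection in the abstract addresses.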

This paper describes a general fuzzy min-max (GFMM) neural network which is a generalization and extension of the fuzzy min-max clustering and classification algorithms of Simpson (1992, 1993). The GFMM method combines supervised and unsupervised learning in a single training algorithm. The fusion of clustering and classification resulted in an algorithm that can be used as pure clustering, pure classification,

We present an online adaptive clustering algorithm in a decision tree framework which has an adaptive tree and a code formation layer. The code formation layer stores the representative codes of the clusters, and the tree adapts the separating hyperplanes between the clusters. The membership of a sample in a cluster is decided by the tree, and the tree parameters

This paper presents an adaptive distributed TDMA slot assignment algorithm, called A-DRAND, which is an improved version of DRAND for clustered wireless sensor networks, where cluster heads need more slots and are later rotated among other cluster members for energy-balance reasons. It utilizes cluster information to allocate slots discriminately for different kinds of sensor nodes and adapts its

This paper aims to assess the effectiveness of three different clustering algorithms used to detect breast cancer recurrent events. The performance of a classical k-means algorithm is compared with a much more sophisticated Self-Organizing Map (SOM; Kohonen network) and a cluster network closely related to both k-means and SOM. The three clustering algorithms have been applied on a concrete breast cancer

Smaranda Belciug; Abdel-Badeeh M. Salem; Florin Gorunescu; Marina Gorunescu

Abstract Clustering and tessellations are basic tools in data mining. The k-means and EM algorithms are two of the most important algorithms in mixture-model-based clustering and tessellations. In this paper, we introduce a new clustering strategy which shares common features with both the EM and k-means algorithms. Our methods also lead to more general tessellations of a spatial region

In this article the searching capability of genetic algorithms has been exploited for automatically evolving the number of clusters as well as a proper clustering of any data set. A new string representation, comprising both real numbers and the "do not care" symbol, is used in order to encode a variable number of clusters. The Davies–Bouldin index is used as a

Cluster headache causes severe unilateral temporal or periorbital pain, lasting 15 to 180 minutes and accompanied by autonomic symptoms in the nose, eyes, and face. Headaches often recur at the same time each day during the cluster period, which can last for weeks to months. Some patients have chronic cluster headache without remission periods. The pathophysiology of cluster headache is not fully understood, but may include a genetic component. Cluster headache is more prevalent in men and typically begins between 20 and 40 years of age. Treatment focuses on avoiding triggers and includes abortive therapies, prophylaxis during the cluster period, and long-term treatment in patients with chronic cluster headache. Evidence supports the use of supplemental oxygen, sumatriptan, and zolmitriptan for acute treatment of episodic cluster headache. Verapamil is first-line prophylactic therapy and can also be used to treat chronic cluster headache. More invasive treatments, including nerve stimulation and surgery, may be helpful in refractory cases. PMID:23939643

The majority of current distributed mutual exclusion algorithms are not suited for parallel or distributed applications on a Grid as they do not consider the heterogeneity of latency on Grids. We propose two distributed mutual exclusion algorithms, based on Naimi–Trehel's token-based algorithm, which take into account latency gaps, especially those between local and remote clusters of machines. Our first algorithm

Marin Bertier; Luciana Bezerra Arantes; Pierre Sens

Using a Lagrangian-based approach, we present a more elegant derivation of the equations necessary for the variational optimization of the molecular orbitals (MOs) for the coupled-cluster doubles (CCD) method and second-order Møller-Plesset perturbation theory (MP2). These orbital-optimized theories are referred to as OO-CCD and OO-MP2 (or simply "OD" and "OMP2" for short), respectively. We also present an improved algorithm for orbital optimization in these methods. Explicit equations for response density matrices, the MO gradient, and the MO Hessian are reported both in spin-orbital and closed-shell spin-adapted forms. The Newton-Raphson algorithm is used for the optimization procedure using the MO gradient and Hessian. Further, orbital stability analyses are also carried out at correlated levels. The OD and OMP2 approaches are compared with the standard MP2, CCD, CCSD, and CCSD(T) methods. All these methods are applied to H2O, three diatomics, and the O4+ molecule. Results demonstrate that the CCSD and OD methods give nearly identical results for H2O and the diatomics; however, in symmetry-breaking problems, as exemplified by O4+, the OD method provides better results for vibrational frequencies. The OD method has further advantages over CCSD: its analytic gradients are easier to compute since there is no need to solve the coupled-perturbed equations for the orbital response, the computation of one-electron properties is easier because there is no response contribution to the particle density matrices, the variationally optimized orbitals can be readily extended to allow inactive orbitals, it avoids spurious second-order poles in its response function, and its transition dipole moments are gauge invariant. OMP2 has these same advantages over canonical MP2, making it promising for excited-state properties via linear response theory.
The quadratically convergent orbital-optimization procedure converges quickly for OMP2 and provides molecular properties that are somewhat different from those of MP2 for most of the test cases considered (although they are similar for H2O). Bond lengths are somewhat longer, and vibrational frequencies somewhat smaller, for OMP2 compared to MP2. In the difficult case of O4+, results for several vibrational frequencies are significantly improved in going from MP2 to OMP2. PMID:21932872

Bozkaya, Uğur; Turney, Justin M; Yamaguchi, Yukio; Schaefer, Henry F; Sherrill, C David

In this chapter we describe the basics of Genetic Algorithms and how they can be used to train Artificial Neural Networks. Supervised training of Multilayer Perceptrons for classification problems is considered. We also explain how the Genetic Algorithm can be hybridized with other algorithms and present two hybrids between it and two classical algorithms for neural network training: Backpropagation

Algorithm Engineering is concerned with the design, analysis, implementation, tuning, debugging and experimental evaluation of computer programs for solving algorithmic problems. It provides methodologies and tools for developing and engineering efficient algorithmic codes and aims at integrating and reinforcing traditional theoretical approaches for the design and analysis of algorithms and data structures.

Camil Demetrescu; Irene Finocchi; Giuseppe F. Italiano

Clustering very large databases is a challenge for traditional pattern recognition algorithms, e.g. the expectation-maximization (EM) algorithm for fitting mixture models, because of high memory and iteration requirements. Over large databases, the cost of the numerous scans required to converge and the large memory requirement of the algorithm become prohibitive. We present a decomposition of the EM algorithm requiring a small

Segmentation of satellite images is an important step for the success of object detection and recognition in image processing. Segmentation is the process of dividing an image into disjoint homogeneous regions. There are many segmentation methods and approaches; the most popular are clustering approaches such as Fuzzy C-Means (FCM) and K-means. The success of clustering methods

Pure and mixed clusters of fullerenes (C60 and C70) as well as of nucleobases have been produced within a cluster aggregation source and have been multiply ionised in collisions with highly charged ions. Multiply charged clusters and the corresponding appearance sizes have been identified for charge states up to q=5. In the fullerene case, the dominant fragmentation channel leads to the emission of singly charged fullerenes. Furthermore, it is concluded that the fullerene clusters, which in their neutral state are insulators, bound only by weak van der Waals forces, become conducting as soon as being multiply charged. Thus, the excess charges turn out to be delocalised. This phenomenon is explained by a rapid charge transfer to neighboured fullerene molecules well in agreement with predictions of the conducting sphere model. In the case of biomolecular clusters, it is found that the aggregation probability varies strongly for different nucleobases. In some cases the cluster distributions show strong variations due to possible shell effects. In the case of mixed biomolecular clusters a strong enhancement is observed for those clusters containing a so called Watson-Crick pair, for example a dimer of thymine and adenine.

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractably sized disjoint subsets. The data may be distributed at different sites, for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster-ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label-vector approach or to clustering all the data at once, while providing very large speedups.
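A minimal numpy illustration of the centroid-based merging idea (a sketch under assumed details, not the authors' algorithms): each disjoint subset is clustered independently, and the pooled subset centroids are then themselves clustered to produce the final centers.

```python
import numpy as np

def kmeans(X, k, iters=30, seed=0):
    """Plain Lloyd's k-means; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return C

def merge_by_centroids(X, k, n_chunks=4):
    # Cluster each disjoint chunk independently (cheap, parallelizable),
    # then merge by clustering the pooled chunk centroids.
    chunks = np.array_split(X, n_chunks)
    pooled = np.vstack([kmeans(ch, k, seed=i) for i, ch in enumerate(chunks)])
    return kmeans(pooled, k, seed=0)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(6, 0.3, (100, 2))])
rng.shuffle(X)                      # so every chunk contains both clusters
centers = merge_by_centroids(X, k=2)
```

Only the chunk centroids, not the full data, are held in memory during the merge step, which is the source of the scalability the abstract describes.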

Hore, Prodip; Hall, Lawrence O.; Goldgof, Dmitry B.

In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.

To help users find valuable information amid an abundance of information, recommender systems came into being. In an e-commerce platform, the recommender system plays the role of sales staff, recommending products to users and helping them find products. Collaborative filtering is one of the earliest and most successful techniques applied in recommender systems. However, with the

A Radial Basis Function (RBF) Iterative Construction Algorithm (RICA) is presented that autonomously determines the size of the network architecture needed to perform classification on a given data set. The algorithm uses a combination of a Gaussian goodness-of-fit measure and Mahalanobis distance clustering to calculate the number of hidden nodes needed and to estimate the parameters of the hidden node basis functions. An iterative minimum squared error reduction method is used to optimize the output layer weights. RICA is compared to several neural network algorithms, including a fixed architecture multilayer perceptron (MLP), a fixed architecture RBF, and an adaptive architecture MLP, using optical character recognition and infrared image data.

Wilson, Terry A.; Rogers, Steven K.; Oxley, Mark E.; Rathbun, Thomas F.; Desimio, Martin P.; Kabrisky, Matthew

Content prepared for the Supercomputing 2002 session on "Using Clustering Technologies in the Classroom". Contains a series of exercises for teaching parallel computing concepts through kinesthetic activities.

Genetic algorithm simulations, using the Buckingham potential to represent the anion-anion and cation-anion short-range interactions, were performed in order to predict the equilibrium positions of the Al and O ions in AlnOm clusters. In order to find the equilibrium structures of the compounds, a self-organizing genetic algorithm was constructed. The calculations were carried out for several AlnOm clusters with different numbers of aluminium and oxygen atoms.

We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdf). We show that the UK-means algorithm, which generalises the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (ED) between objects and cluster representatives. For arbitrary pdf's, expected distances are
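For intuition (a hypothetical sketch, not the paper's pruning techniques): if the distance used is the squared Euclidean distance, the expected distance from an uncertain object to a center has a closed form, so the expensive numerical integration over each pdf can be avoided entirely.

```python
import numpy as np

# Model an uncertain object by its mean vector mu and covariance Sigma.
# For the squared Euclidean distance the expected distance is
#   E||X - c||^2 = ||mu - c||^2 + trace(Sigma),
# and the trace term is a per-object constant: it cannot change which
# cluster center is nearest, so assignment reduces to ordinary k-means
# on the means.
def expected_sq_dist(mu, sigma, c):
    return ((mu - c) ** 2).sum() + np.trace(sigma)

mu = np.array([1.0, 2.0])
sigma = np.diag([0.5, 0.25])
c1, c2 = np.array([0.0, 0.0]), np.array([1.0, 3.0])
ed1 = expected_sq_dist(mu, sigma, c1)   # 5.0 + 0.75 = 5.75
ed2 = expected_sq_dist(mu, sigma, c2)   # 1.0 + 0.75 = 1.75
```

For other distance measures no such closed form exists in general, which is why computing EDs for arbitrary pdf's is the bottleneck the abstract refers to.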

Ben Kao; Sau Dan Lee; David W. Cheung; Wai-shing Ho; K. F. Chan

Background Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that any individual clustering method will not always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster. Results The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and Quality Partition Index method (QPI) were used to evaluate