Note: This page contains sample records for the topic c-means clustering algorithm from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results.
Last update: August 15, 2014.
1

Efficient Implementation of the Fuzzy c-Means Clustering Algorithms.  

PubMed

This paper reports the results of a numerical comparison of two versions of the fuzzy c-means (FCM) clustering algorithms. In particular, we propose and exemplify an approximate fuzzy c-means (AFCM) implementation based upon replacing the necessary "exact" variates in the FCM equation with integer-valued or real-valued estimates. This approximation enables AFCM to exploit a lookup table approach for computing Euclidean distances and for exponentiation. The net effect of the proposed implementation is that CPU time during each iteration is reduced to approximately one sixth of the time required for a literal implementation of the algorithm, while apparently preserving the overall quality of terminal clusters produced. The two implementations are tested numerically on a nine-band digital image, and a pseudocode subroutine is given for the convenience of applications-oriented readers. Our results suggest that AFCM may be used to accelerate FCM processing whenever the feature space is comprised of tuples having a finite number of integer-valued coordinates. PMID:21869343

Cannon, R L; Dave, J V; Bezdek, J C

1986-02-01
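
For orientation, the "literal" FCM iteration that the abstract above uses as its baseline can be sketched as follows. This is the textbook alternating update of centers and memberships, not the authors' pseudocode and not the AFCM lookup-table variant; the function name `fcm`, its defaults, and the random initialization are assumptions made for illustration.

```python
import random

def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[k][i] is the degree to
    which point k belongs to cluster i. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the data weighted by u_ik ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][j] for k in range(n)) / sw
                for j in range(dim)))
        # Membership update from squared Euclidean distances.
        new_u = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
            if any(d == 0.0 for d in d2):
                # Point coincides with a center: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in d2]
            else:
                row = [d ** (-1.0 / (m - 1.0)) for d in d2]
            s = sum(row)
            new_u.append([v / s for v in row])
        change = max(abs(a - b)
                     for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u
```

The distance and exponentiation steps in the membership update are the operations that the abstract says AFCM replaces with table lookups when the coordinates are integer-valued.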

2

The new image segmentation algorithm using adaptive evolutionary programming and fuzzy c-means clustering  

NASA Astrophysics Data System (ADS)

Image segmentation remains one of the major challenges in image analysis and computer vision. Fuzzy clustering, as a soft segmentation method, has been widely studied and successfully applied in image clustering and segmentation. The fuzzy c-means (FCM) algorithm is the most popular method used in image segmentation. However, most clustering algorithms, such as the k-means and FCM clustering algorithms, search for the final cluster values based on predetermined initial centers. The FCM clustering algorithm does not consider the spatial information of pixels and is sensitive to noise. This paper presents a new fuzzy c-means (FCM) algorithm with adaptive evolutionary programming for image clustering. The features of this algorithm are as follows. First, it does not need predetermined initial centers: evolutionary programming helps FCM search for better centers and escape bad centers at local minima. Second, both the spatial distance and the Euclidean distance are considered in the FCM clustering, so the algorithm is more robust to noise. Third, adaptive evolutionary programming is proposed, in which the mutation rule is adaptively changed by learning useful knowledge during the evolving process. Experimental results show that the new image segmentation algorithm is effective and robust to noisy images.

Liu, Fang

2011-05-01

3

A New Validity Measure for a Correlation-Based Fuzzy C-means Clustering Algorithm  

PubMed Central

One of the major challenges in unsupervised clustering is the lack of consistent means for assessing the quality of clusters. In this paper, we evaluate several validity measures in fuzzy clustering and develop a new measure for a fuzzy c-means algorithm which uses a Pearson correlation in its distance metrics. The measure is designed with the within-cluster sum of squares, and makes use of fuzzy memberships. Compared with the existing fuzzy partition coefficient and a fuzzy validity index, this new measure performs consistently across six microarray datasets. The newly developed measure could be used to assess the validity of fuzzy clusters produced by a correlation-based fuzzy c-means clustering algorithm.

Zhang, Mingrui; Zhang, Wei; Sicotte, Hugues; Yang, Ping

2009-01-01
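
The abstract above says only that the algorithm "uses a Pearson correlation in its distance metrics" and does not give the exact form. A minimal sketch of one common choice, the distance 1 - r, is shown below; the function name `pearson_distance` and the choice of 1 - r are assumptions, not necessarily the authors' definition.

```python
import math

def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y.

    Ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated).
    Illustrative choice; the abstract does not specify the exact form.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - sxy / math.sqrt(sxx * syy)
```

With this distance, two expression profiles that rise and fall together are close even when their magnitudes differ, which is the usual motivation for correlation-based clustering of microarray data.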

4

A new validity measure for a correlation-based fuzzy c-means clustering algorithm.  

PubMed

One of the major challenges in unsupervised clustering is the lack of consistent means for assessing the quality of clusters. In this paper, we evaluate several validity measures in fuzzy clustering and develop a new measure for a fuzzy c-means algorithm which uses a Pearson correlation in its distance metrics. The measure is designed with the within-cluster sum of squares, and makes use of fuzzy memberships. Compared with the existing fuzzy partition coefficient and a fuzzy validity index, this new measure performs consistently across six microarray datasets. The newly developed measure could be used to assess the validity of fuzzy clusters produced by a correlation-based fuzzy c-means clustering algorithm. PMID:19963601

Zhang, Mingrui; Zhang, Wei; Sicotte, Hugues; Yang, Ping

2009-01-01

5

An improved fuzzy c-means algorithm for unbalanced sized clusters  

NASA Astrophysics Data System (ADS)

In this paper, we propose an improved fuzzy c-means (FCM) algorithm based on cluster height information to deal with the sensitivity of FCM to clusters of unbalanced size. Cluster size sensitivity is a major drawback of FCM: the algorithm tends to balance the cluster sizes during iteration, so the center of a smaller cluster may be drawn toward an adjacent larger one, which leads to poor classification. To overcome this problem, cluster height information is introduced into the distance function to adjust the conventional Euclidean distance and thus control the effect of cluster size differences on classification. Experimental results demonstrate that our algorithm can obtain good clustering results in spite of large size differences, while traditional FCM does not work well in such cases. The improved FCM has shown its potential for extracting small clusters, especially in medical image segmentation.

Gu, Shuguo; Liu, Jingjing; Xie, Qingguo; Wang, Luyao

2012-02-01

6

Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.  

PubMed

The fuzziness index m has important influence on the clustering result of fuzzy clustering algorithms, and it should not be forced to fix at the usual value m = 2. In view of its distinctive features in applications and its limitation in having m = 2 only, a recent advance of fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM for more effective clustering is proposed. By introducing a novel membership constraint function, a new objective function is constructed, and furthermore, GIFP-FCM clustering is derived. Meanwhile, from the viewpoints of L(p) norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results including its application to noisy image texture segmentation are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities. PMID:19174354

Zhu, Lin; Chung, Fu-Lai; Wang, Shitong

2009-06-01

7

Segmentation of M-FISH Images for improved classification of chromosomes with an adaptive fuzzy c-means clustering algorithm  

Microsoft Academic Search

An adaptive fuzzy c-means (AFCM) clustering based algorithm was developed and applied to the segmentation and classification of multi-color fluorescence in situ hybridization (M-FISH) images, which can be used to detect chromosomal abnormalities for cancer and genetic disease diagnosis. The algorithm improves the classical fuzzy c-means (FCM) clustering algorithm by introducing a gain field, which models and corrects intensity inhomogeneities

Hongbao Cao; Yu-Ping Wang

2011-01-01

8

Sequential Competitive Learning and the Fuzzy c-Means Clustering Algorithms.  

PubMed

Several recent papers have described sequential competitive learning algorithms that are curious hybrids of algorithms used to optimize the fuzzy c-means (FCM) and learning vector quantization (LVQ) models. First, we show that these hybrids do not optimize the FCM functional. Then we show that the gradient descent conditions they use are not necessary conditions for optimization of a sequential version of the FCM functional. We give a numerical example that demonstrates some weaknesses of the sequential scheme proposed by Chung and Lee. And finally, we explain why these algorithms may work at times, by exhibiting the stochastic approximation problem that they unknowingly attempt to solve. Copyright 1996 Published by Elsevier Science Ltd PMID:12662563

Hathaway, Richard J.; Bezdek, James C.; Pal, Nikhil R.

1996-07-01

9

Alternative Adaptive Fuzzy C-Means Clustering  

Microsoft Academic Search

Fuzzy C-Means (FCM) clustering algorithm is used in a variety of application domains. Fundamentally, it cannot be used for the subsequent data (adaptive data). A complete dataset has to be static prior to implementing the algorithm. This paper presents an alternative adaptive FCM which is able to cope with this limitation. The adaptive FCM using Euclidean and Mahalanobis distances were

SOMCHAI CHAMPATHONG; SARTRA WONGTHANAVASU; KHAMRON SUNAT

2006-01-01

10

Segmentation of M-FISH Images for Improved Classification of Chromosomes With an Adaptive Fuzzy C-means Clustering Algorithm  

Microsoft Academic Search

An adaptive fuzzy c-means algorithm was developed and applied to the segmentation and classification of multicolor fluorescence in situ hybridization (M-FISH) images, which can be used to detect chromosomal abnormalities for cancer and genetic disease diagnosis. The algorithm improves the classical fuzzy c-means algorithm (FCM) by the use of a gain field, which models and corrects intensity inhomogeneities caused by

Hongbao Cao; Hong-Wen Deng; Yu-Ping Wang

2012-01-01

11

A Fuzzy-C-means clustering algorithm for a volumetric analysis of paranasal sinus and nasal cavity cancers.  

PubMed

In this paper, a semi-automatic segmentation algorithm for volumetric analysis of paranasal sinus and nasal cavity cancers is presented and validated. The algorithm, based on a semi-supervised Fuzzy-C-means method, was applied to Magnetic Resonance data sets (each of them composed of T1-weighted, Contrast Enhanced T1-weighted and T2-weighted images) for a total of 64 tumor-containing slices. Method performance is tested by both a numerical and a clinical validation. Results show that the proposed method has a higher accuracy in quantifying lesion area than a region growing algorithm and that it can be applied in the evaluation of tumor response to therapy. PMID:17945753

Passera, K; Potepan, P; Setti, E; Vergnaghi, D; Sarti, A; Mainardi, L; Cerutti, S

2006-01-01

12

Cluster Validity for the Fuzzy c-Means Clustering Algorithm.  

PubMed

The uniform data function is a function which assigns to the output of the fuzzy c-means (Fc-M) or fuzzy isodata algorithm a number which measures the quality or validity of the clustering produced by the algorithm. For the preselected number of clusters c, the Fc-M algorithm produces c vectors in the space in which the data lie, called cluster centers, which represent points about which the data are concentrated. It also produces for each data point c-membership values, numbers between zero and one which measure the similarity of the data points to each of the cluster centers. It is these membership values which indicate how the point is classified. They also indicate how well the point has been classified, in that values close to one indicate that the point is close to a particular center, but uniformly low memberships indicate that the point has not been classified clearly. The uniform data functional (UDF) combines the memberships in such a way as to indicate how well the data have been classified and is computed as follows. For each data point compute the ratio of its smallest membership to its largest and then compute the probability that one could obtain a smaller ratio (indicating better classification) from a clustering of a standard data set in which there is no cluster structure. These probabilities are then averaged over the data set to obtain the values of the UDF. PMID:21869049

Windham, M P

1982-04-01
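
The first step of the UDF computation described above, the ratio of each point's smallest membership to its largest, can be sketched as follows. The second step (the probability of obtaining a smaller ratio from a structureless reference data set) is deliberately left out, because the abstract does not specify that reference distribution; the function name `membership_ratios` is an assumption.

```python
def membership_ratios(u):
    """For each membership row, return min(row) / max(row).

    A ratio near 0 means one cluster clearly dominates for that point;
    a ratio near 1 means the memberships are nearly uniform.
    """
    return [min(row) / max(row) for row in u]
```

For example, a point with memberships (0.9, 0.1) gives a ratio of about 0.11, while an ambiguous point with memberships (0.5, 0.5) gives exactly 1.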

13

Computerized segmentation and characterization of breast lesions in dynamic contrast-enhanced MR images using fuzzy c-means clustering and snake algorithm.  

PubMed

This paper presents a novel two-step approach that incorporates fuzzy c-means (FCMs) clustering and gradient vector flow (GVF) snake algorithm for lesions contour segmentation on breast magnetic resonance imaging (BMRI). Manual delineation of the lesions by expert MR radiologists was taken as a reference standard in evaluating the computerized segmentation approach. The proposed algorithm was also compared with the FCMs clustering based method. With a database of 60 mass-like lesions (22 benign and 38 malignant cases), the proposed method demonstrated sufficiently good segmentation performance. The morphological and texture features were extracted and used to classify the benign and malignant lesions based on the proposed computerized segmentation contour and radiologists' delineation, respectively. Features extracted by the computerized characterization method were employed to differentiate the lesions with an area under the receiver-operating characteristic curve (AUC) of 0.968, in comparison with an AUC of 0.914 based on the features extracted from radiologists' delineation. The proposed method in current study can assist radiologists to delineate and characterize BMRI lesion, such as quantifying morphological and texture features and improving the objectivity and efficiency of BMRI interpretation with a certain clinical value. PMID:22952558

Pang, Yachun; Li, Li; Hu, Wenyong; Peng, Yanxia; Liu, Lizhi; Shao, Yuanzhi

2012-01-01

14

Computerized Segmentation and Characterization of Breast Lesions in Dynamic Contrast-Enhanced MR Images Using Fuzzy c-Means Clustering and Snake Algorithm  

PubMed Central

This paper presents a novel two-step approach that incorporates fuzzy c-means (FCMs) clustering and gradient vector flow (GVF) snake algorithm for lesions contour segmentation on breast magnetic resonance imaging (BMRI). Manual delineation of the lesions by expert MR radiologists was taken as a reference standard in evaluating the computerized segmentation approach. The proposed algorithm was also compared with the FCMs clustering based method. With a database of 60 mass-like lesions (22 benign and 38 malignant cases), the proposed method demonstrated sufficiently good segmentation performance. The morphological and texture features were extracted and used to classify the benign and malignant lesions based on the proposed computerized segmentation contour and radiologists' delineation, respectively. Features extracted by the computerized characterization method were employed to differentiate the lesions with an area under the receiver-operating characteristic curve (AUC) of 0.968, in comparison with an AUC of 0.914 based on the features extracted from radiologists' delineation. The proposed method in current study can assist radiologists to delineate and characterize BMRI lesion, such as quantifying morphological and texture features and improving the objectivity and efficiency of BMRI interpretation with a certain clinical value.

Pang, Yachun; Li, Li; Hu, Wenyong; Peng, Yanxia; Liu, Lizhi; Shao, Yuanzhi

2012-01-01

15

On cluster validity for the fuzzy c-means model  

Microsoft Academic Search

Many functionals have been proposed for validation of partitions of object data produced by the fuzzy c-means (FCM) clustering algorithm. We examine the role a subtle but important parameter, the weighting exponent m of the FCM model, plays in determining the validity of FCM partitions. The functionals considered are the partition coefficient and entropy indexes of Bezdek, the Xie-Beni (1991), and extended

N. R. Pal; J. C. Bezdek

1995-01-01
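
Two of the functionals named above, Bezdek's partition coefficient and partition entropy, have simple closed forms and can be sketched as below. The function names are assumptions; the Xie-Beni index and the extended indexes mentioned in the truncated abstract are not covered here.

```python
import math

def partition_coefficient(u):
    """Mean over data points of the sum of squared memberships.

    Equals 1 for a crisp partition and 1/c for uniform memberships.
    """
    return sum(v * v for row in u for v in row) / len(u)

def partition_entropy(u):
    """Mean over data points of -sum(u * ln u), with 0 * ln 0 taken as 0.

    Equals 0 for a crisp partition and ln(c) for uniform memberships.
    """
    return -sum(v * math.log(v) for row in u for v in row if v > 0.0) / len(u)
```

Because the memberships flatten toward 1/c as the weighting exponent m grows, both indexes drift toward their "fuzziest" values with increasing m, which is one reason the choice of m matters when comparing partitions.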

16

Fuzzy c-means clustering with prior biological knowledge  

PubMed Central

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.

Tari, Luis; Baral, Chitta; Kim, Seungchan

2009-01-01

17

Fuzzy c-means clustering with prior biological knowledge.  

PubMed

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/. PMID:18595779

Tari, Luis; Baral, Chitta; Kim, Seungchan

2009-02-01

18

Efficient inhomogeneity compensation using fuzzy c-means clustering models.  

PubMed

Intensity inhomogeneity or intensity non-uniformity (INU) is an undesired phenomenon that represents the main obstacle for magnetic resonance (MR) image segmentation and registration methods. Various techniques have been proposed to eliminate or compensate for the INU, most of which are embedded into classification or clustering algorithms. They generally have difficulties when INU reaches high amplitudes and usually suffer from high computational load. This study reformulates the design of c-means clustering based INU compensation techniques by identifying and separating those globally working, computationally costly operations that can be applied to gray intensity levels instead of individual pixels. The theoretical assumptions are demonstrated using the fuzzy c-means algorithm, but the proposed modification is compatible with a wide range of c-means clustering based INU compensation and MR image segmentation algorithms. Experiments carried out using synthetic phantoms and real MR images indicate that the proposed approach produces practically the same segmentation accuracy as the conventional formulation, but 20-30 times faster. PMID:22405524

Szilágyi, László; Szilágyi, Sándor M; Benyó, Balázs

2012-10-01
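
The general idea of working on gray intensity levels instead of individual pixels can be illustrated with a histogram-weighted FCM on scalar intensities. This is only a sketch of that idea under simplifying assumptions: it omits INU compensation entirely and is not the authors' formulation; the function name `histogram_fcm` and its parameters are hypothetical.

```python
from collections import Counter

def histogram_fcm(pixels, centers, m=2.0, iters=50):
    """1-D fuzzy c-means over distinct gray levels, weighted by pixel count.

    Each iteration costs O(levels * clusters) instead of
    O(pixels * clusters). Returns the final cluster centers.
    """
    hist = Counter(pixels)  # gray level -> number of pixels at that level
    centers = [float(v) for v in centers]
    for _ in range(iters):
        num = [0.0] * len(centers)
        den = [0.0] * len(centers)
        for g, count in hist.items():
            d2 = [(g - v) ** 2 for v in centers]
            if 0.0 in d2:
                # Level coincides with a center: share membership among
                # the coinciding centers only.
                u = [1.0 if d == 0.0 else 0.0 for d in d2]
            else:
                u = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(u)
            for i, ui in enumerate(u):
                w = count * (ui / total) ** m
                num[i] += w * g
                den[i] += w
        # Keep the old center if a cluster received no weight at all.
        centers = [a / b if b > 0.0 else v
                   for a, b, v in zip(num, den, centers)]
    return centers
```

For an 8-bit image the inner loop visits at most 256 gray levels per iteration regardless of image size, which is where the saving in this simplified setting comes from.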

19

Convergence and Consistency of Fuzzy c-means/ISODATA Algorithms.  

PubMed

The fuzzy c-means/ISODATA algorithm is usually described in terms of clustering a finite data set. An equivalent point of view is that the algorithm clusters the support points of a finite-support probability distribution. Motivated by recent work on the hard version of the algorithm, this paper extends the definition to arbitrary distributions and considers asymptotic properties. It is shown that fixed points of the algorithm are stationary points of the fuzzy objective functional, and vice versa. When the algorithm is iteratively applied to an initial prototype set, the sequence of prototype sets produced approaches the set of fixed points. If an unknown distribution is approximated by the empirical distribution of stationary, ergodic observations, then as the number of observations grows large, fixed points of the algorithm based on the empirical distribution approach fixed points of the algorithm based on the true distribution. Furthermore, with respect to minimizing the fuzzy objective functional, the algorithm based on the empirical distribution is asymptotically at least as good as the algorithm based on the true distribution. PMID:21869424

Sabin, M J

1987-05-01

20

Fuzzy C-Mean Algorithm Based on Mahalanobis Distance and New Separable Criterion  

Microsoft Academic Search

The well-known fuzzy partition clustering algorithms are mostly based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters, but both, being based on a semi-supervised Mahalanobis distance, need additional prior information. An improved fuzzy C-mean algorithm based on

Hsiang-Chuan Liu; Jeng-Ming Yih; Der-Bang Wu; Chin-Chun Chen

2007-01-01

21

Fuzzy c-means clustering of incomplete data.  

PubMed

The problem of clustering a real s-dimensional data set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^s$ is considered. Usually, each observation (or datum) consists of numerical values for all s features (such as height, length, etc.), but sometimes data sets can contain vectors that are missing one or more of the feature values. For example, a particular datum $x_k$ might be incomplete, having the form $x_k = (254.3, ?, 333.2, 47.45, ?)^T$, where the second and fifth feature values are missing. The fuzzy c-means (FCM) algorithm is a useful tool for clustering real s-dimensional data, but it is not directly applicable to the case of incomplete data. Four strategies for doing FCM clustering of incomplete data sets are given, three of which involve modified versions of the FCM algorithm. Numerical convergence properties of the new algorithms are discussed, and all approaches are tested using real and artificially generated incomplete data sets. PMID:18244838

Hathaway, R J; Bezdek, J C

2001-01-01
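
The abstract above does not name its four strategies. One commonly described way to adapt a distance-based method to missing features is a partial distance: sum the squared differences over the observed features only and rescale by s divided by the number of observed features. The sketch below assumes that form and marks missing values with `None`; the function name `partial_distance_sq` is hypothetical.

```python
def partial_distance_sq(x, v):
    """Squared Euclidean distance over the observed features of x.

    Missing features in x are given as None. The sum is rescaled by
    s / (number of observed features) so that data with different
    numbers of missing values remain comparable.
    """
    observed = [(a, b) for a, b in zip(x, v) if a is not None]
    if not observed:
        raise ValueError("datum has no observed features")
    s = len(x)
    return s / len(observed) * sum((a - b) ** 2 for a, b in observed)
```

Applied to the example datum from the abstract, `partial_distance_sq((254.3, None, 333.2, 47.45, None), v)` uses only the first, third, and fourth coordinates of the prototype `v` and scales the result by 5/3.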

22

Fuzzy C-mean algorithm based on “complete” Mahalanobis distances  

Microsoft Academic Search

The well-known fuzzy partition clustering algorithms are mostly based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters, but both are based on a semi-supervised Mahalanobis distance, and these two algorithms fail to consider the relationships between cluster centers

Hsiang-Chuan Liu; Jeng-Ming Yih; Der-Bang Wu; Shin-Wu Liu

2008-01-01

23

Field-scale prediction of soil moisture patterns by means of a fuzzy c-means clustering algorithm, digital elevation data, and sparse TDR measurements  

NASA Astrophysics Data System (ADS)

Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into runoff and infiltration components and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale with survey areas up to a few square kilometres, there are numerous new and innovative ground-based and remote sensing technologies available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns, optimal measurement locations were identified to conduct in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based, hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.

Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute

2014-05-01

24

Gaussian Mixture PDF Approximation and Fuzzy c-Means Clustering with Entropy Regularization  

Microsoft Academic Search

The EM algorithm is a popular density estimation method that uses the likelihood function as the measure of fit. Fuzzy c-Means (FCM) is also a popular clustering algorithm based on distance-based objective function methods. This paper discusses the similarity between the Gaussian mixture model with the EM algorithm and the FCM based on the Mahalanobis distance with the entropy

Hidetomo ICHIHASHI; Katsuhiro HONDA; Naoki TANI

2000-01-01

25

Rough set based generalized fuzzy c-means algorithm and quantitative indices.  

PubMed

A generalized hybrid unsupervised learning algorithm, which is termed as rough-fuzzy possibilistic c-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy c-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of c-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed c-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets. PMID:18179071

Maji, Pradipta; Pal, Sankar K

2007-12-01

26

Pulse shape classification in liquid scintillators using the fuzzy c-means algorithm  

Microsoft Academic Search

A new approach to pulse shape classification for neutron detectors of type BC501A has been investigated. The method is based on the fuzzy c-means (FCM) algorithm, which allows finding clusters of similar shapes in a set of digitized detector pulses. The aim of the method is to provide principal pulse shapes, which further can be used to apply a pulse

D. Savran; B. Löher; M. Miklavec; M. Vencelj

2010-01-01

27

Identification of overlapping community structure in complex networks using fuzzy c-means clustering  

Microsoft Academic Search

Identification of (overlapping) communities/clusters in a complex network is a general problem in data mining of network data sets. In this paper, we devise a novel algorithm to identify overlapping communities in complex networks by the combination of a new modularity function based on generalizing NG's Q function, an approximation mapping of network nodes into Euclidean space and fuzzy c-means

Shihua Zhang; Rui-Sheng Wang; Xiang-Sun Zhang

2007-01-01

28

Integration and generalization of LVQ and c-means clustering (Invited Paper)  

NASA Astrophysics Data System (ADS)

This paper discusses the relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. We also discuss the impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often lends itself to clustering algorithms. Then we present two generalizations of LVQ that are explicitly designed as clustering algorithms: we refer to these algorithms as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. We use Anderson's IRIS data to compare the performance of GLVQ/FLVQ with a standard version of LVQ. Experiments show that the final centroids produced by GLVQ are independent of node initialization and learning coefficients. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically.

Bezdek, James C.

1992-11-01

29

Particle swarm optimization of kernel-based fuzzy c-means for hyperspectral data clustering  

NASA Astrophysics Data System (ADS)

Hyperspectral data classification using supervised approaches, in general, and the statistical algorithms, in particular, need high quantity and quality training data. However, these limitations, and the high dimensionality of these data, are the most important problems for using the supervised algorithms. As a solution, unsupervised or clustering algorithms can be considered to overcome these problems. One of the emerging clustering algorithms that can be used for this purpose is the kernel-based fuzzy c-means (KFCM), which has been developed by kernelizing the FCM algorithm. Nevertheless, there are some parameters that affect the efficiency of KFCM clustering of hyperspectral data. These parameters include kernel parameters, initial cluster centers, and the number of spectral bands. To address these problems, two new algorithms are developed. In these algorithms, the particle swarm optimization method is employed to optimize the KFCM with respect to these parameters. The first algorithm is designed to optimize the KFCM with respect to kernel parameters and initial cluster centers, while the second one selects the optimum discriminative subset of bands and the former parameters as well. The evaluations of the results of experiments show that the proposed algorithms are more efficient than the standard k-means and FCM algorithms for clustering hyperspectral remotely sensed data.

Niazmardi, Saeid; Naeini, Amin Alizadeh; Homayouni, Saeid; Safari, Abdolreza; Samadzadegan, Farhad

2012-01-01

30

Generalized rough fuzzy c-means algorithm for brain MR image segmentation.  

PubMed

Fuzzy sets and rough sets have been widely used in many clustering algorithms for medical image segmentation, and have recently been combined to better deal with the uncertainty implied in observed image data. Despite their widespread applications, traditional hybrid approaches are sensitive to the empirical weighting parameters and random initialization, and hence may produce less accurate results. In this paper, a novel hybrid clustering approach, namely the generalized rough fuzzy c-means (GRFCM) algorithm, is proposed for brain MR image segmentation. In this algorithm, each cluster is characterized by three automatically determined rough-fuzzy regions, and accordingly the membership of each pixel is estimated with respect to the region in which it lies. The importance of each region is balanced by a weighting parameter, and the bias field in MR images is modeled by a linear combination of orthogonal polynomials. The weighting parameter estimation and bias field correction have been incorporated into the iterative clustering process. Our algorithm has been compared to the existing rough c-means and hybrid clustering algorithms in both synthetic and clinical brain MR images. Experimental results demonstrate that the proposed algorithm is more robust to the initialization, noise, and bias field, and can produce more accurate and reliable segmentations. PMID:22088865

Ji, Zexuan; Sun, Quansen; Xia, Yong; Chen, Qiang; Xia, Deshen; Feng, Dagan

2012-11-01

31

A study on fuzzy C-means clustering-based systems in automatic spike detection.  

PubMed

In this study, different systems based on the fuzzy C-means (FCM) clustering algorithm are utilized for the detection of epileptic spikes in electroencephalogram (EEG) records. The systems are constructed as either single or two-stages. In contrast to single-stage systems, the two-stage system comprises a pre-classifier stage realized by a neural network. The FCM based two-stage system is also compared to a similar system implemented using the K-means clustering algorithm. The results imply that an FCM based two-stage system should be preferred as the spike detection system. PMID:17145054

Inan, Z Hilal; Kuntalp, Mehmet

2007-08-01

32

[A soft discretization method of celestial spectrum characteristic line based on fuzzy C-means clustering].  

PubMed

Discretization of continuous numerical attributes is an important task in the preprocessing of celestial spectrum data. For the characteristic lines of celestial spectra, a soft discretization algorithm based on improved fuzzy C-means clustering is presented. First, candidate fuzzy cluster centers of the characteristic line are chosen using density values of the sample data, which improves noise resistance. Second, the parameters of the fuzzy clustering are dynamically adjusted, taking the compatibility of the decision table as the criterion, so that an optimal discretization of the characteristic line is achieved. Finally, experiments on three SDSS celestial spectrum data sets (high-redshift quasars, late-type stars, and quasars) show that the algorithm achieves a higher correct recognition rate. PMID:22827108

Zhang, Ji-fu; Li, Xin; Yang, Hai-feng

2012-05-01

33

Fuzzy C-Means Clustering-Based Speaker Verification  

Microsoft Academic Search

In speaker verification, a claimed speaker’s score is computed to accept or reject the speaker claim. Most of the current normalisation methods compute the score as the ratio of the claimed speaker’s and the impostors’ likelihood functions. Based on an analysis of the false acceptance errors produced by the current methods, we propose a fuzzy c-means clustering-based normalisation method to find a better

Dat Tran; Michael Wagner

2002-01-01

34

Adaptive multi-cluster fuzzy C-means segmentation of breast parenchymal tissue in digital mammography.  

PubMed

The relative fibroglandular tissue content in the breast, commonly referred to as breast density, has been shown to be the most significant risk factor for breast cancer after age. Currently, the most common approaches to quantify density are based on either semi-automated methods or visual assessment, both of which are highly subjective. This work presents a novel multi-class fuzzy c-means (FCM) algorithm for fully-automated identification and quantification of breast density, optimized for the imaging characteristics of digital mammography. The proposed algorithm involves adaptive FCM clustering based on an optimal number of clusters derived by the tissue properties of the specific mammogram, followed by generation of a final segmentation through cluster agglomeration using linear discriminant analysis. When evaluated on 80 bilateral screening digital mammograms, a strong correlation was observed between algorithm-estimated PD% and radiological ground-truth of r=0.83 (p<0.001) and an average Jaccard spatial similarity coefficient of 0.62. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner. PMID:22003744

Keller, Brad; Nathan, Diane; Wang, Yan; Zheng, Yuanjie; Gee, James; Conant, Emily; Kontos, Despina

2011-01-01

35

Application of fuzzy c-means clustering in data analysis of metabolomics.  

PubMed

Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to cluster metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix which represents the degree of association the samples have with each cluster. The method is parametrized with the number of clusters (C) and the fuzziness coefficient (m), which denotes the degree of fuzziness in the algorithm. Both have been optimized by combining FCM with partial least-squares (PLS) using the membership matrix as the Y matrix in the PLS model. The quality parameters R²Y and Q² of the PLS model have been used to monitor and optimize C and m. Data of metabolic profiles from three gene types of Escherichia coli were used to demonstrate the method above. Different multivariable analysis methods have been compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded results with overfitting. On the basis of the optimized parameters, the FCM was able to reveal main phenotype changes and individual characters of three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation. PMID:19408956

Li, Xiang; Lu, Xin; Tian, Jing; Gao, Peng; Kong, Hongwei; Xu, Guowang

2009-06-01
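
Illustrative note on the record above: the abstract describes FCM as operating directly on a data matrix to produce a membership matrix, controlled by the number of clusters C and the fuzziness coefficient m. The sketch below shows the standard alternating FCM updates in NumPy under those two parameters. It is not the authors' implementation, it omits the PLS-based tuning of C and m, and the function name, the random initialization, and the stopping tolerance are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM sketch. X has shape (n_samples, n_features).

    Returns (centers, U), where U[i, k] is the membership of sample i
    in cluster k. Initialization and tolerance are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means of the data weighted by U**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Values of m close to 1 push the memberships toward hard (0 or 1) assignments, while larger m makes them more uniform, which is why the abstract treats m as a parameter to be optimized together with C.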

36

Fuzzy c-means clustering with spatial information for image segmentation.  

PubMed

A conventional FCM algorithm does not fully utilize the spatial information in the image. In this paper, we present a fuzzy c-means (FCM) algorithm that incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership function in the neighborhood of each pixel under consideration. The advantages of the new method are the following: (1) it yields regions more homogeneous than those of other methods, (2) it reduces the spurious blobs, (3) it removes noisy spots, and (4) it is less sensitive to noise than other techniques. This technique is a powerful method for noisy image segmentation and works for both single and multiple-feature data with spatial information. PMID:16361080

Chuang, Keh-Shih; Tzeng, Hong-Long; Chen, Sharon; Wu, Jay; Chen, Tzong-Jer

2006-01-01
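
Illustrative note on the record above: the abstract defines the spatial function as the summation of the membership function over the neighborhood of each pixel, and says this function is incorporated into the membership. The sketch below computes that neighborhood sum and folds it back into the memberships by a weighted product followed by renormalization. The exponents `p` and `q`, the square window given by `radius`, and the edge padding are assumptions, since the abstract does not specify them.

```python
import numpy as np

def spatial_membership(U, p=1, q=1, radius=1):
    """Fold neighborhood information into FCM memberships.

    U has shape (H, W, C): one membership vector per pixel.
    p, q, radius and the edge padding are illustrative assumptions.
    """
    H, W, _ = U.shape
    padded = np.pad(U, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Spatial function h: sum of memberships over the (2r+1) x (2r+1) window.
    h = np.zeros_like(U, dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            h += padded[dy:dy + H, dx:dx + W, :]
    # Combine the original membership with the spatial function, then renormalize.
    new_U = (U ** p) * (h ** q)
    return new_U / new_U.sum(axis=2, keepdims=True)
```

A pixel whose membership disagrees with most of its neighbors is pulled toward the neighborhood's dominant cluster, which corresponds to the abstract's claims about removing noisy spots and reducing spurious blobs.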

37

A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation.  

PubMed

One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation. PMID:24174990

Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A B

2013-01-01

38

Query by example video based on fuzzy c-means initialized by fixed clustering center  

NASA Astrophysics Data System (ADS)

Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on the compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots, each shot can be represented by a key frame, and then we used video processing techniques to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to the initializations, here we initialized the cluster center by the shots of query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of query video was considered a benchmark point in the aforesaid cluster, and each shot in the database possessed a class label. The similarity between the shots in the database with the same class label and benchmark point can be transformed into the distance between them. Finally, the similarity between the query video and the video in database was transformed into the number of similar shots. Our experimental results demonstrated the performance of this proposed approach.

Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar

2012-04-01

39

Analysis of active feature selection in optic nerve data using labeled fuzzy C-means clustering  

Microsoft Academic Search

Describes an iterative analysis technique that aids in the process of searching for an optimal set of features for classification, and its application to detection of early glaucoma from optic nerve data in an evolving data acquisition system. The selection and evaluation of features were done using fuzzy C-means clustering and support vector machines. The clustering method was updated using

Jong-Min Park; Hyae-Duk Yae

2002-01-01

40

Fuzzy C-means method with empirical mode decomposition for clustering microarray data.  

PubMed

Microarray techniques have revolutionised genomic research by making it possible to monitor the expression of thousands of genes in parallel. The Fuzzy C-Means (FCM) method is an efficient clustering approach devised for microarray data analysis. However, microarray data contains noise, which would affect clustering results. In this paper, we propose to combine the FCM method with the Empirical Mode Decomposition (EMD) for clustering microarray data to reduce the effect of the noise. The results suggest the clustering structures of denoised microarray data are more reasonable and genes have tighter association with their clusters than those using FCM only. PMID:23777170

Wang, Yan-Fei; Yu, Zu-Guo; Anh, Vo

2013-01-01

41

Generalized fuzzy c-means clustering strategies using Lp norm distances  

Microsoft Academic Search

Fuzzy c-means (FCM) is a useful clustering technique. Modifications of FCM using L1 norm distances increase robustness to outliers. Object and relational data versions of FCM clustering are defined for the more general case where the Lp norm (p⩾1) or semi-norm (0

Richard J. Hathaway; James C. Bezdek; Yingkang Hu

2000-01-01

42

Recognition of olfactory signals based on supervised fuzzy C-means and k-NN algorithms  

Microsoft Academic Search

In this paper we present a novel method for odour recognition based on a supervised fuzzy C-means (SFCM) algorithm and a k-nearest neighbour (k-NN) algorithm. The method is applied to experimental data collected from a sensor array composed of metal oxide sensors (MOSs). The sensors are exposed to odourants and the relative resistance values are used for classification. SFCM selects

Francesco Marcelloni

2001-01-01

43

Image watermarking using a dynamically weighted fuzzy c-means algorithm  

NASA Astrophysics Data System (ADS)

Digital watermarking has received extensive attention as a new method of protecting multimedia content from unauthorized copying. In this paper, we present a nonblind watermarking system using a proposed dynamically weighted fuzzy c-means (DWFCM) technique combined with discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) techniques for copyright protection. The proposed scheme efficiently selects blocks in which the watermark is embedded using new membership values of DWFCM as the embedding strength. We evaluated the proposed algorithm in terms of robustness against various watermarking attacks and imperceptibility compared to other algorithms [DWT-DCT-based and DCT-fuzzy c-means (FCM)-based algorithms]. Experimental results indicate that the proposed algorithm outperforms other algorithms in terms of robustness against several types of attacks, such as noise addition (Gaussian noise, salt and pepper noise), rotation, Gaussian low-pass filtering, mean filtering, median filtering, Gaussian blur, image sharpening, histogram equalization, and JPEG compression. In addition, the proposed algorithm achieves higher values of peak signal-to-noise ratio (approximately 49 dB) and lower values of measure-singular value decomposition (5.8 to 6.6) than other algorithms.

Kang, Myeongsu; Ho, Linh Tran; Kim, Yongmin; Kim, Cheol Hong; Kim, Jong-Myon

2011-10-01

44

A novel kernelized fuzzy C-means algorithm with application in medical image segmentation.  

PubMed

Image segmentation plays a crucial role in many medical imaging applications. In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data. The algorithm is realized by modifying the objective function in the conventional fuzzy C-means (FCM) algorithm using a kernel-induced distance metric and a spatial penalty on the membership functions. First, the original Euclidean distance in the FCM is replaced by a kernel-induced distance; the resulting algorithm, called the kernelized fuzzy C-means (KFCM) algorithm, is shown to be more robust than FCM. Then a spatial penalty is added to the objective function in KFCM to compensate for the intensity inhomogeneities of MR images and to allow the labeling of a pixel to be influenced by its neighbors in the image. The penalty term acts as a regularizer and has a coefficient ranging from zero to one. Experimental results on both synthetic and real MR images show that the proposed algorithms have better performance when noise and other artifacts are present than the standard algorithms. PMID:15350623

Zhang, Dao-Qiang; Chen, Song-Can

2004-09-01
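
Illustrative note on the record above: the abstract replaces the Euclidean distance in FCM with a kernel-induced distance. The sketch below shows what such a distance looks like for a Gaussian kernel, using the identity ||phi(x) - phi(v)||^2 = K(x,x) - 2K(x,v) + K(v,v), which reduces to 2(1 - K(x,v)) when K(x,x) = 1. The choice of a Gaussian kernel, the parameter `sigma`, and the function names are assumptions, and the spatial penalty term is not shown.

```python
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2) (assumed form)."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance between x and v in the kernel feature space.

    Since K(x, x) = K(v, v) = 1 for the Gaussian kernel, the general
    expression K(x,x) - 2K(x,v) + K(v,v) simplifies to 2(1 - K(x,v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))
```

Unlike the squared Euclidean distance, this quantity is bounded above by 2, so a single distant outlier contributes only a limited amount to the objective. This is one common explanation for the robustness that the abstract reports.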

45

Deterring Password Sharing: User Authentication via Fuzzy c-Means Clustering Applied to Keystroke Biometric Data  

Microsoft Academic Search

This work describes a clustering-based system to enhance user authentication by applying fuzzy techniques to biometric data in order to deter password sharing. Fuzzy c-means is used to train personal, per-keyboard profiles based on the keystroke dynamics of users when entering passwords on a keyboard. These profiles use DES encryption taking the actual passwords as key and are read at

Salvador Mandujano; Rogelio Soto

2004-01-01

46

Self-organization and clustering algorithms  

NASA Technical Reports Server (NTRS)

Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.

Bezdek, James C.

1991-01-01

47

[Research on oil atomic spectrometric data semi-supervised fuzzy C-means clustering based on Parzen window].  

PubMed

A Parzen window based semi-supervised fuzzy c-means (PSFCM) clustering algorithm was presented. The initial clustering centers of fuzzy c-means (FCM) were determined with training samples. The membership iteration of FCM was redefined after the membership degrees of the testing samples relative to each state were calculated using the Parzen window. Two typical gearbox faults were simulated on a gearbox test bed in order to acquire lubricant samples. The concentrations of Fe, Si and B, the representative elements, were selected as three-dimensional feature vectors to be analyzed with the FCM and PSFCM clustering methods. The correct ratio of FCM was 48.9%, while that of PSFCM was 97.4% because it integrates supervised information. Experimental results also indicated that introducing PSFCM into oil atomic spectrometric analysis can reduce the dependence on experience and on large amounts of fault data. It was of great help in improving the wear fault diagnosis ratio. PMID:20939333

Xu, Chao; Zhang, Pei-lin; Ren, Guo-quan; Wu, Ding-hai

2010-08-01

48

A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data.  

PubMed

In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that can produce errors with conventional intensity-based classification. Our algorithm is formulated by modifying the objective function of the standard fuzzy c-means (FCM) algorithm to compensate for such inhomogeneities and to allow the labeling of a pixel (voxel) to be influenced by the labels in its immediate neighborhood. The neighborhood effect acts as a regularizer and biases the solution toward piecewise-homogeneous labelings. Such a regularization is useful in segmenting scans corrupted by salt and pepper noise. Experimental results on both synthetic images and MR data are given to demonstrate the effectiveness and efficiency of the proposed algorithm. PMID:11989844

Ahmed, Mohamed N; Yamany, Sameh M; Mohamed, Nevin; Farag, Aly A; Moriarty, Thomas

2002-03-01

49

Recursive fuzzy c-means clustering for recursive fuzzy identification of time-varying processes.  

PubMed

In this paper we propose a new approach to on-line Takagi-Sugeno fuzzy model identification. It combines a recursive fuzzy c-means algorithm and recursive least squares. First the method is derived and then it is tested and compared on a benchmark problem of the Mackey-Glass time series with other established on-line identification methods. We showed that the developed algorithm gives a comparable degree of accuracy to other algorithms. The proposed algorithm can be used in a number of fields, including adaptive nonlinear control, model predictive control, fault detection, diagnostics and robotics. An example of identification based on real data from a waste-water treatment process is also presented. PMID:21292263

Dovžan, Dejan; Skrjanc, Igor

2011-04-01

50

Fuzzy C-means clustering with local information and kernel metric for image segmentation.  

PubMed

In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its objective function. The new algorithm adaptively determines the kernel parameter by using a fast bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore, the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is relatively independent of this type of noise. PMID:23008257

Gong, Maoguo; Liang, Yan; Shi, Jiao; Ma, Wenping; Ma, Jingjing

2013-02-01

51

Comparison of K-means and fuzzy c-means algorithm performance for automated determination of the arterial input function.  

PubMed

The arterial input function (AIF) plays a crucial role in the quantification of cerebral perfusion parameters. The traditional method for AIF detection is based on manual operation, which is time-consuming and subjective. Two automatic methods have been reported that are based on two frequently used clustering algorithms: fuzzy c-means (FCM) and K-means. However, it is still not clear which is better for AIF detection. Hence, we compared the performance of these two clustering methods using both simulated and clinical data. The results demonstrate that K-means analysis can yield more accurate and robust AIF results, although it takes longer to execute than the FCM method. We consider that this longer execution time is trivial relative to the total time required for image manipulation in a PACS setting, and is acceptable if an ideal AIF is obtained. Therefore, the K-means method is preferable to FCM in AIF detection. PMID:24503700

Yin, Jiandong; Sun, Hongzan; Yang, Jiawen; Guo, Qiyong

2014-01-01

52

A multiple-kernel fuzzy C-means algorithm for image segmentation.  

PubMed

In this paper, a generalized multiple-kernel fuzzy C-means (FCM) (MKFCM) methodology is introduced as a framework for image-segmentation problems. In the framework, aside from the fact that the composite kernels are used in the kernel FCM (KFCM), a linear combination of multiple kernels is proposed and the updating rules for the linear coefficients of the composite kernel are derived as well. The proposed MKFCM algorithm provides us a new flexible vehicle to fuse different pixel information in image-segmentation problems. That is, different pixel information represented by different kernels is combined in the kernel space to produce a new kernel. It is shown that two successful enhanced KFCM-based image-segmentation algorithms are special cases of MKFCM. Several new segmentation algorithms are also derived from the proposed MKFCM framework. Simulations on the segmentation of synthetic and medical images demonstrate the flexibility and advantages of MKFCM-based approaches. PMID:21803693

Chen, Long; Chen, C L Philip; Lu, Mingzhu

2011-10-01

53

Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm  

NASA Astrophysics Data System (ADS)

Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one of the most important applications of image processing, the MRI segmentation of pomegranate, is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. Having a high-quality product in hand is a critical factor in its marketing. The internal quality of the product is very important in the sorting process. The qualitative features cannot be determined manually. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in the presence of noise. The fuzzy c-means (FCM) algorithm is noise-sensitive, and noisy pixels are misclassified. As a solution, in this paper, the spatial FCM (SFCM) algorithm is proposed for the segmentation of pomegranate MR images. The algorithm incorporates spatial neighborhood information into FCM and modifies the fuzzy membership function for each class. The segmentation results on the original pomegranate MR images and on images corrupted by Gaussian, salt-and-pepper and speckle noise show that the SFCM algorithm performs significantly better than the FCM algorithm. Also, after several steps of qualitative and quantitative analysis, we have concluded that the SFCM algorithm with a 5×5 window performs better than with the other window sizes.

Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.

2011-10-01

54

Using Mahout for Clustering Wikipedia's Latest Articles: A Comparison between K-means and Fuzzy C-means in the Cloud  

Microsoft Academic Search

This paper compares k-means and fuzzy c-means for clustering a noisy, realistic and big dataset. We made the comparison using a free cloud computing solution, Apache Mahout/Hadoop, and Wikipedia's latest articles. In the past the usage of these two algorithms was restricted to small datasets. Consequently, studies were based on artificial datasets that do not represent a real

Rui Máximo Esteves; Chunming Rong

2011-01-01

55

Analysis of density based and fuzzy c-means clustering methods on lesion border extraction in dermoscopy images  

PubMed Central

Background: Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer variations in human interpretation. In this study, we compare two approaches for automatic border detection in dermoscopy images: density based clustering (DBSCAN) and Fuzzy C-Means (FCM) clustering algorithms. In the first approach, if there is enough density (more than a certain number of points) around a point, then either a new cluster is formed around the point or an existing cluster grows by including the point and its neighbors. In the second approach FCM clustering is used. This approach has the ability to assign one data point to more than one cluster. Results: Each approach is examined on a set of 100 dermoscopy images whose borders, manually drawn by a dermatologist, are used as the ground truth. Error rates (false positives and false negatives), along with true positives and true negatives, are quantified by comparing results with the borders manually determined by a dermatologist. The assessments obtained from both methods are quantitatively analyzed over three accuracy measures: border error, precision, and recall. Conclusion: In addition to low border error and high precision and recall, the visual outcome showed that DBSCAN effectively delineated the targeted lesions and is promising; FCM, however, performed poorly, especially on the border error metric.

2010-01-01

56

Clustering: Algorithms and Applications  

Microsoft Academic Search

In this paper, we describe algorithms that perform fuzzy clustering and feature weighting simultaneously and in an unsupervised manner. These algorithms are conceptually and computationally simple, and learn a different set of feature weights for each identified cluster. The cluster dependent feature weights offer two advantages. First, they guide the clustering process to partition the data into more meaningful clusters.

H. Frigui

2008-01-01

57

Volumetric analysis of liver metastases in computed tomography with the fuzzy C-means algorithm.  

PubMed

Tumor size is often determined from computed tomography (CT) images to assess disease progression. A study was conducted to demonstrate the advantages of the fuzzy C-means (FCM) algorithm for volumetric analysis of colorectal liver metastases in comparison with manual contouring. Intra- and interobserver variability was assessed for manual contouring and the FCM algorithm in a study involving contrast-enhanced helical CT images of 43 hypoattenuating liver lesions from 15 patients with a history of colorectal cancer. Measurement accuracy and interscan variability of the FCM and manual methods were assessed in a phantom study using paraffin pseudotumors. In the clinical imaging study, intra- and interobserver variability was reduced using the FCM algorithm as compared with manual contouring (P = 0.0070 and P = 0.0019, respectively). Accuracy of the measurement of the pseudotumor volume was improved using the FCM method as compared with the manual method (P = 0.047). Interscan variability of the pseudotumor volumes was measured using the FCM method as compared with the manual method (P = 0.04). The FCM algorithm volume was highly correlated with the manual contouring volume (r = 0.9997). Finally, the shorter time spent in calculating tumor volume using the FCM method versus the manual contouring method was marginally statistically significant (P = 0.080). These results suggest that the FCM algorithm has substantial advantages over manual contouring for volumetric measurement of colorectal liver metastases from CT. PMID:16628034

Yim, Peter J; Vora, Amit V; Raghavan, Deepak; Prasad, Ravi; McAullife, Matthew; Ohman-Strickland, Pamela; Nosher, John L

2006-01-01

58

Unsupervised change detection in satellite images using fuzzy c-means clustering and principal component analysis  

NASA Astrophysics Data System (ADS)

Change detection analysis is the process of identifying changes in nature or in the state of objects from observations made at different times, or of quantifying temporal effects using multitemporal data sets. Many change detection techniques appear in the literature. They can be grouped under two main headings: supervised and unsupervised change detection. In this study, the aim is to identify the land cover changes occurring in a specific area of Kayseri with unsupervised change detection techniques, using remotely sensed Landsat satellite images from different years. First, image-to-image registration is performed so that the images are referenced to each other. After image enhancement, the image differencing method is applied to the images. The resulting gray-scale difference image is partitioned into 3 × 3 nonoverlapping blocks. Principal component analysis is then used to obtain the eigenvector space, from which the principal components are derived. Finally, the feature vector space consisting of the principal components is partitioned into two clusters (changed and unchanged areas) using fuzzy c-means clustering, which completes the change detection process.

Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.

2013-10-01
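
The block-partitioning step named in the abstract above can be illustrated with a minimal sketch. The function names `difference_image` and `block_vectors` are hypothetical, the absolute pixel difference is an assumption (the abstract only says "image differencing"), and dropping partial edge blocks is also an assumption. PCA and fuzzy c-means would then operate on the returned 9-element vectors.

```python
def difference_image(img_a, img_b):
    """Pixel-wise absolute difference of two equally sized gray scale images.

    Images are plain lists of rows. Using the absolute value is an
    assumption; the abstract does not say how the difference is formed.
    """
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def block_vectors(image, size=3):
    """Split an image into non-overlapping size x size blocks.

    Returns one flattened (row-major) feature vector per block.
    Rows and columns that do not fill a whole block are dropped
    (an assumption made for this sketch).
    """
    rows = len(image) - len(image) % size
    cols = len(image[0]) - len(image[0]) % size
    vectors = []
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            vectors.append([image[r + i][c + j]
                            for i in range(size)
                            for j in range(size)])
    return vectors
```

For a 6 × 6 difference image this yields four vectors of length nine, one per 3 × 3 block.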

59

BP network identification technology of infrared polarization based on fuzzy c-means clustering  

NASA Astrophysics Data System (ADS)

Infrared detection systems are frequently employed in surveillance operations and reconnaissance missions to detect particular targets of interest in both civilian and military communities. By incorporating the polarization of light as supplementary information, target discrimination performance can be enhanced. This paper therefore proposes an infrared target identification method based on fuzzy theory and a neural network, using the polarization properties of targets. The paper uses polarization degree and light intensity to advance the unsupervised KFCM (kernel fuzzy C-means) clustering method and establishes a database of the polarization properties of different materials. In the built network, the system can output the probability distribution over the corresponding material types for any input polarized degree such as 10°, 15°, 20°, 25°, 30°. KFCM, which is more robust and accurate than FCM, introduces the kernel idea and gives noise points and invalid values different but intuitively reasonable weights. Because of differences in the characterization of material properties, there will be some conflicts in the classification results, so D-S evidence theory is used to combine the polarization and intensity information. The results show that the clustering precision and operation rate of KFCM are higher than those of the FCM clustering method. The artificial neural network method realizes material identification, which reasonably solves the problems of complexity in the environmental information of infrared polarization and of improper background knowledge and inference rules. This method of polarization identification is fast, self-adaptive and high in resolution.

Zeng, Haifang; Gu, Guohua; He, Weiji; Chen, Qian; Yang, Wei

2011-06-01

60

T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation  

NASA Astrophysics Data System (ADS)

The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.

Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried

2010-03-01

61

Brain tissue classification based on DTI using an improved fuzzy C-means algorithm with spatial constraints.  

PubMed

We present an effective method for brain tissue classification based on diffusion tensor imaging (DTI) data. The method accounts for two main DTI segmentation obstacles: random noise and magnetic field inhomogeneities. In the proposed method, DTI parametric maps were used to resolve intensity inhomogeneities of brain tissue segmentation because they could provide complementary information for tissues and define accurate tissue maps. An improved fuzzy c-means with spatial constraints proposal was used to enhance the noise and artifact robustness of DTI segmentation. Fuzzy c-means clustering with spatial constraints (FCM_S) could effectively segment images corrupted by noise, outliers, and other imaging artifacts. Its effectiveness contributes not only to the introduction of fuzziness for belongingness of each pixel but also to the exploitation of spatial contextual information. We proposed an improved FCM_S applied on DTI parametric maps, which explores the mean and covariance of the feature spatial information for automated segmentation of DTI. The experiments on synthetic images and real-world datasets showed that our proposed algorithms, especially with new spatial constraints, were more effective. PMID:23891435

Wen, Ying; He, Lianghua; von Deneen, Karen M; Lu, Yue

2013-11-01

62

Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering.  

PubMed

Disease diagnosis based on ultrasound imaging is popular because of its non-invasive nature. However, ultrasound imaging systems produce low quality images due to the presence of speckle noise and wave interferences. This shortcoming requires considerable effort from experts to diagnose a disease from carotid artery ultrasound images. Image segmentation is one of the techniques that can help in diagnosing a disease from carotid artery ultrasound images efficiently. Most of the pixels in an image are highly correlated. Considering the spatial information of surrounding pixels in the process of image segmentation may further improve the results. When data is highly correlated, one pixel may belong to more than one cluster with different degrees of membership. In this paper, we present an image segmentation technique, namely improved spatial fuzzy c-means, and an ensemble clustering approach for carotid artery ultrasound images to identify the presence of plaque. Spatial, wavelet and gray level co-occurrence matrix (GLCM) features are extracted from carotid artery ultrasound images. Redundant and less important features are removed from the feature set using a genetic search process. Finally, the segmentation process is performed on the optimal or reduced features. Ensemble clustering with the reduced feature set performs better with respect to segmentation time as well as clustering accuracy. Intima-media thickness (IMT) is measured from the images segmented by the proposed approach. Based on the measured IMT values, a Multi-Layer Back-Propagation Neural Network (MLBPNN) is used to classify the images as normal or abnormal. Experimental results show the learning capability of the MLBPNN classifier and validate the effectiveness of our proposed technique. The proposed approach of segmentation and classification of carotid artery ultrasound images seems to be very useful for detection of plaque in the carotid artery. PMID:22981822

Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Kim, Jin Young

2012-12-01

63

Dynamic fuzzy c-means (dFCM) clustering and its application to calorimetric data reconstruction in high-energy physics  

NASA Astrophysics Data System (ADS)

In high-energy physics experiments, calorimetric data reconstruction requires a suitable clustering technique in order to obtain accurate information about the shower characteristics such as the position of the shower and energy deposition. Fuzzy clustering techniques have high potential in this regard, as they assign data points to more than one cluster, thereby acting as a tool to distinguish between overlapping clusters. Fuzzy c-means (FCM) is one such clustering technique that can be applied to calorimetric data reconstruction. However, it has a drawback: it cannot easily identify and distinguish clusters that are not uniformly spread. A version of the FCM algorithm called dynamic fuzzy c-means (dFCM) allows clusters to be generated and eliminated as required, with the ability to resolve non-uniformly distributed clusters. Both the FCM and dFCM algorithms have been studied and successfully applied to simulated data of a sampling tungsten-silicon calorimeter. It is seen that the FCM technique works reasonably well, and at the same time, the use of the dFCM technique improves the performance.

Sandhir, Radha Pyari; Muhuri, Sanjib; Nayak, Tapan K.

2012-07-01
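
The soft assignment that the abstract above relies on, where a point can belong to more than one cluster, can be sketched with the standard fuzzy c-means updates. This is a minimal illustration in plain Python, not the authors' implementation and not dFCM; the function name `fcm`, the random initialization, and the defaults (fuzzifier `m = 2`, tolerance, seed) are assumptions.

```python
import random


def fcm(points, c=2, m=2.0, iters=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, u), where u[k][i] is the membership of point k
    in cluster i and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])

        # Membership update from squared Euclidean distances.
        shift = 0.0
        for k in range(n):
            d2 = [sum((points[k][d] - centers[i][d]) ** 2 for d in range(dim))
                  for i in range(c)]
            if min(d2) == 0.0:
                # Point coincides with a center: give it full membership there.
                new = [1.0 if x == 0.0 else 0.0 for x in d2]
                total = sum(new)
                new = [v / total for v in new]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[k])))
            u[k] = new

        # Stop once no membership changes by more than tol.
        if shift < tol:
            break
    return centers, u
```

A point lying halfway between two well-separated groups ends up with roughly equal membership in both clusters, which is the property that helps with overlapping clusters.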

64

Fuzzy C-means clustering and principal component analysis of time series from near-infrared imaging of forearm ischemia.  

PubMed

Fuzzy C-means clustering and principal components analysis were used to analyze a temporal series of near-IR images taken of a human forearm during periods of venous outflow restriction and complete forearm ischemia. The principal component eigen-time course analysis provided no useful information and the principal component eigen-image analysis gave results that correlated poorly with anatomical features. The fuzzy C-means clustering analysis, on the other hand, showed distinct regional differences in the hemodynamic response and scattering properties of the tissue, which correlated well with the anatomical features of the forearm. PMID:9475436

Mansfield, J R; Sowa, M G; Scarth, G B; Somorjai, R L; Mantsch, H H

1997-01-01

65

Tissue viability by multispectral near infrared imaging: a fuzzy C-means clustering analysis.  

PubMed

Clinically, skin color, temperature, and capillary perfusion are used to assess tissue viability following microvascular tissue transfer. However, clinical signs that arise as a consequence of poor perfusion become evident only after several hours of compromised perfusion. This study demonstrates the potential usefulness of optical/infrared multispectral imaging in the prognosis of tissue viability immediately post-surgery. Multispectral images of a skin flap model acquired within 1 h of surgical elevation are analyzed in comparison to the final 72-h clinical outcome with a high degree of correlation. Regional changes in tissue perfusion and oxygenation present immediately following surgery are differentiated using fuzzy clustering and image processing algorithms. These methodologies reduce the intersubject variability inherent in infrared imaging methods such that the changes in perfusion are reproducible and clearly distinguishable across all subjects. Clinically, an early prognostic indicator of viability such as this would allow for a more timely intervention following surgery in the event of compromised microvasculature. PMID:10048858

Mansfield, J R; Sowa, M G; Payette, J R; Abdulrauf, B; Stranc, M F; Mantsch, H H

1998-12-01

66

Neutron/Gamma Discrimination Utilizing Fuzzy C-Means Clustering of the Signal from the Liquid Scintillator  

Microsoft Academic Search

The fuzzy c-means (FCM) clustering method was applied to the neutron/gamma discrimination of the pulses from the liquid scintillator. An experimental setup termed the portable real-time n/γ discriminator with a BC-501A liquid scintillator detector was used to collect waveforms with a 500 MS/s, 12 bit sampling ADC. The FCM clustering and PGA were applied to the same pulses dataset respectively

Xiaoliang Luo; Guofu Liu; Jun Yang

2010-01-01

67

Automatic breast masses boundary extraction in digital mammography using spatial fuzzy c-means clustering and active contour models  

Microsoft Academic Search

In this paper, we propose a novel approach for the automatic breast boundary segmentation using spatial fuzzy c-means clustering and active contours models. We will evaluate the performance of the approach on screen film mammographic images digitized by specific scanner devices and full-field digital mammographic images at different spatial and pixel resolutions. Expert radiologists have supplied the reference boundary

Arianna Mencattini; Marcello Salmeri; Paola Casti; Grazia Raguso; Samuela L'Abbate; Loredana Chieppa; Antonietta Ancona; Fabio Mangieri; Maria Luisa Pepe

2011-01-01

68

An improved Kernel-based Fuzzy C-means Algorithm with spatial information for brain MR image segmentation  

Microsoft Academic Search

In this paper, we propose an improved Kernel-based Fuzzy C-means Algorithm (iKFCM) with spatial information to reduce the effect of noise for brain MR image segmentation. We use k-nearest neighbour model and a neighbourhood controlling factor by estimating image contextual constraints to optimize the objective function of conventional KFCM method. Conventional KFCM algorithms classify each pixel in image only by

Rong Xu; Jun Ohya

2010-01-01

69

Survey of clustering algorithms  

Microsoft Academic Search

Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics,

Rui Xu; Donald Wunsch II

2005-01-01

70

Classification of Parkinson's disease using feature weighting method on the basis of fuzzy C-means clustering  

Microsoft Academic Search

This study presents the application of fuzzy c-means (FCM) clustering-based feature weighting (FCMFW) for the detection of Parkinson's disease (PD). In the classification of PD dataset taken from University of California – Irvine machine learning database, practical values of the existing traditional and non-standard measures for distinguishing healthy people from people with PD by detecting dysphonia were applied to the

Kemal Polat

2012-01-01

71

Classification of Parkinson's disease using feature weighting method on the basis of fuzzy C-means clustering  

Microsoft Academic Search

This study presents the application of fuzzy c-means (FCM) clustering-based feature weighting (FCMFW) for the detection of Parkinson's disease (PD). In the classification of PD dataset taken from University of California – Irvine machine learning database, practical values of the existing traditional and non-standard measures for distinguishing healthy people from people with PD by detecting dysphonia were applied to the

Kemal Polat

2011-01-01

72

Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data  

Microsoft Academic Search

BACKGROUND: Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature

Junbai Wang; Trond Hellem Bø; Inge Jonassen; Ola Myklebost; Eivind Hovig

2003-01-01

73

Algorithm for Merging Hyperellipsoidal Clusters.  

National Technical Information Service (NTIS)

This report discusses an algorithm for merging hyperellipsoidal clusters. The effective merging radius between two clusters is introduced, and this measure is used to determine the order in which clusters are combined. We continue to merge clusters until ...

P. M. Kelly

1994-01-01

74

Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means.  

PubMed

Conventional model-based or statistical analysis methods for functional MRI (fMRI) suffer from the limitation of the assumed paradigm and biased results. Temporal clustering methods, such as fuzzy clustering, can eliminate these problems but have difficulty finding activation occupying a small area, are sensitive to noise and initial values, and are computationally demanding. To overcome these difficulties, a cascade clustering method combining a Kohonen clustering network and fuzzy c-means is developed. Receiver operating characteristic (ROC) analysis is used to compare this method with correlation coefficient analysis and the t test on a series of testing phantoms. Results show that this method can efficiently and stably identify the actual functional response with typical signal change to noise ratio, from a small activation area occupying only 0.2% of head size, with phase delay, and from other noise sources such as head motion. With the ability to find activities of small sizes stably, this method can not only identify the functional responses and the active regions more precisely, but also discriminate responses from different signal sources, such as large venous vessels or different types of activation patterns in human studies involving motor cortex activation. Even when the experimental paradigm is unknown in a blind test such that model-based methods are inapplicable, this method can identify the activation patterns and regions correctly. PMID:10695525

Chuang, K H; Chiu, M J; Lin, C C; Chen, J H

1999-12-01

75

Hybrid multiresolution Slantlet transform and fuzzy c-means clustering approach for normal-pathological brain MR image segregation.  

PubMed

The paper presents a new approach for automated segregation of brain MR images, using an improved orthogonal discrete wavelet transform (DWT), known as the Slantlet transform (ST), and a fuzzy c-means (FCM) clustering approach. ST has excellent time-frequency resolution characteristics and these can be achieved with shorter supports for the filter, compared to DWT employed for identical situations. FCM clustering, on the other hand, can provide efficient classification results, if it is implemented for well-processed input feature vectors. Thus, by combining both the ST and the FCM clustering approaches, a hybrid scheme has been developed that can segregate brain MR images. This automated tool when developed can infer whether the input image is that of a normal brain or a pathological brain. The proposed technique has been applied to several benchmark brain MR images and the results reveal excellent accuracy in characterizing human brain MR imaging. PMID:17698397

Maitra, Madhubanti; Chatterjee, Amitava

2008-06-01

76

A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data  

Microsoft Academic Search

In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that

Mohamed N. Ahmed; Sameh M. Yamany; Nevin Mohamed; Aly A. Farag; Thomas Moriarty

2002-01-01

77

A New Fuzzy Possibility Clustering Algorithms Based on Unsupervised Mahalanobis Distances  

Microsoft Academic Search

The well-known Fuzzy Possibility C-Mean algorithm can mitigate the outlier and noise problems of fuzzy c-means, but it is based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The Gustafson-Kessel and Gath-Geva clustering algorithms were developed to detect non-spherical structural clusters, but both of them are based on semi-supervised Mahalanobis distance, these

Hsiang-Chuan Liu; Jeng-Ming Yih; Tian-Wei Sheu; Shin-Wu Liu

2007-01-01

78

Robust information gain based fuzzy c-means clustering and classification of carotid artery ultrasound images.  

PubMed

In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus inevitable for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state of the art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques both on noisy and noise free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed for making decision about the presence of plaque in carotid artery using probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, effect of the proposed segmentation technique has also been investigated on classification of carotid artery ultrasound images. PMID:24239296

Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Iftikhar, M Aksam

2014-02-01

79

Automatic Exudate Detection from Non-dilated Diabetic Retinopathy Retinal Images Using Fuzzy C-means Clustering  

PubMed Central

Exudates are the primary sign of Diabetic Retinopathy. Early detection can potentially reduce the risk of blindness. An automatic method to detect exudates from low-contrast digital images of retinopathy patients with non-dilated pupils using a Fuzzy C-Means (FCM) clustering is proposed. Contrast enhancement preprocessing is applied before four features, namely intensity, standard deviation on intensity, hue and a number of edge pixels, are extracted to supply as input parameters to coarse segmentation using FCM clustering method. The first result is then fine-tuned with morphological techniques. The detection results are validated by comparing with expert ophthalmologists’ hand-drawn ground-truths. Sensitivity, specificity, positive predictive value (PPV), positive likelihood ratio (PLR) and accuracy are used to evaluate overall performance. It is found that the proposed method detects exudates successfully with sensitivity, specificity, PPV, PLR and accuracy of 87.28%, 99.24%, 42.77%, 224.26 and 99.11%, respectively.

Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah

2009-01-01
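
The five evaluation measures named in the abstract above have standard definitions in terms of true/false positive and negative counts. The sketch below shows those textbook formulas; the function name `detection_metrics` and the example counts are hypothetical, and the abstract does not state at what level (pixel or image) the authors computed or averaged their figures.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard detection measures from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Raises ZeroDivisionError
    if a denominator is zero (for example, specificity of exactly 1
    makes the positive likelihood ratio undefined).
    """
    sensitivity = tp / (tp + fn)             # true positive rate
    specificity = tn / (tn + fp)             # true negative rate
    ppv = tp / (tp + fp)                     # positive predictive value
    plr = sensitivity / (1.0 - specificity)  # positive likelihood ratio
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "plr": plr,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only.
example = detection_metrics(tp=80, fp=100, tn=900, fn=20)
```

With many more negatives than positives, accuracy and specificity can be high while PPV stays modest, which is consistent with the pattern of values reported in the abstract.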

80

A cluster algorithm for graphs  

Microsoft Academic Search

A cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The algorithm provides basically an interface to an algebraic process defined on stochastic matrices, called the MCL process. The graphs may be both weighted (with nonnegative weight) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with

S. Van Dongen

2000-01-01

81

A relational Fuzzy C-Means algorithm for detecting protein spots in two-dimensional gel images.  

PubMed

Two-dimensional polyacrylamide gel electrophoresis of proteins is a robust and reproducible technique. It is the most widely used separation tool in proteomics. Current efforts in the field are directed at the development of tools for expanding the range of proteins accessible with two-dimensional gels. Proteomics was built around the two-dimensional gel. The idea that multiple proteins can be analyzed in parallel grew from two-dimensional gel maps. Proteomics researchers have needed to identify protein spots of interest by examining the gel, which is time consuming, labor intensive and error prone. It is desirable for the computer to analyze the proteins automatically by first detecting, then quantifying the protein spots in the 2D gel images. This paper focuses on protein spot detection and segmentation of 2D gel electrophoresis images. We present a new technique for segmentation of 2D gel images using the Fuzzy C-Means (FCM) algorithm and matching spots using the notion of fuzzy relations. In the experimental results, the new algorithm was found to detect protein spots more accurately than the currently known algorithms. PMID:20865504

Rashwan, Shaheera; Faheem, Talaat; Sarhan, Amany; Youssef, Bayumy A B

2010-01-01

82

A meteor cluster detection algorithm  

NASA Astrophysics Data System (ADS)

We present an algorithm to identify groups of meteors within all-sky meteor network observations that are clustered in radiant, velocity, and time. These meteor clusters may reveal new minor meteor showers or uncover false negatives for known shower association. Sporadic meteoroid sources and established meteor showers exhibiting spatiotemporal proximity to identified clusters are reported by the algorithm for end-user reference, as well as the orbital similarity of cluster members quantified using the Drummond D-criterion. This algorithm will be integrated into the existing data-processing pipeline at the NASA Meteoroid Environments Office to alert staff in near-real time of clustered meteor events.

Burt, Joshua B.; Moorhead, Althea V.; Cooke, William J.

2014-02-01

83

A method of face recognition based on fuzzy c-means clustering and associated sub-NNs.  

PubMed

The face is a complex multidimensional visual model and developing a computational model for face recognition is difficult. In this paper, we present a method for face recognition based on parallel neural networks. Neural networks (NNs) have been widely used in various fields. However, the computing efficiency decreases rapidly if the scale of the NN increases. In this paper, a new method of face recognition based on fuzzy clustering and parallel NNs is proposed. The face patterns are divided into several small-scale neural networks based on fuzzy clustering and they are combined to obtain the recognition result. In particular, the proposed method achieved a 98.75% recognition accuracy for 240 patterns of 20 registrants and a 99.58% rejection rate for 240 patterns of 20 nonregistrants. Experimental results show that the performance of our new face-recognition method is better than those of the backpropagation NN (BPNN) system, the hard c-means (HCM) and parallel NNs system, and the pattern-matching system. PMID:17278469

Lu, Jianming; Yuan, Xue; Yahagi, Takashi

2007-01-01

84

Automatic Exudate Detection from Non-dilated Diabetic Retinopathy Retinal Images Using Fuzzy C-means Clustering.  

PubMed

Exudates are the primary sign of Diabetic Retinopathy. Early detection can potentially reduce the risk of blindness. An automatic method to detect exudates from low-contrast digital images of retinopathy patients with non-dilated pupils using a Fuzzy C-Means (FCM) clustering is proposed. Contrast enhancement preprocessing is applied before four features, namely intensity, standard deviation on intensity, hue and a number of edge pixels, are extracted to supply as input parameters to coarse segmentation using FCM clustering method. The first result is then fine-tuned with morphological techniques. The detection results are validated by comparing with expert ophthalmologists' hand-drawn ground-truths. Sensitivity, specificity, positive predictive value (PPV), positive likelihood ratio (PLR) and accuracy are used to evaluate overall performance. It is found that the proposed method detects exudates successfully with sensitivity, specificity, PPV, PLR and accuracy of 87.28%, 99.24%, 42.77%, 224.26 and 99.11%, respectively. PMID:22574005

Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah

2009-01-01

85

Hybrid Algorithm to Data Clustering  

Microsoft Academic Search

In this research an N-Dimensional clustering algorithm based on the ACE algorithm for large datasets is described. Each part of the algorithm will be explained and experimental results obtained from applying this algorithm are discussed. The research is focused on the fast and accurate clustering using real databases as workspace instead of directly loaded data into memory since this is very

Miguel Gil; Alberto Ochoa; Antonio Zamarrón; Juan Carpio

2009-01-01

86

Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data  

PubMed Central

Background: Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant). Results: The proposed models were tested on four published datasets: (1) Leukemia (2) Colon cancer (3) Brain tumors and (4) NCI cancer cell lines. The models gave class prediction with markedly reduced error rates compared to other class prediction approaches, and the importance of feature selection on microarray data analysis was also emphasized. Conclusions: Our models identify marker genes with predictive potential, often better than other available methods in the literature. The models are potentially useful for medical diagnostics and may reveal some insights into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data related to the biology underlying the data, in terms of (1) the class size of data, and (2) the internal structure of classes. These limitations are not specific for the classification models used.

Wang, Junbai; Bø, Trond Hellem; Jonassen, Inge; Myklebost, Ola; Hovig, Eivind

2003-01-01

87

Characterization of carotid atherosclerosis based on motion and texture features and clustering using fuzzy c-means.  

PubMed

Analysis of B-mode ultrasound images of the carotid atheromatous plaque includes the estimation of texture from static images and the estimation of motion from image sequences. The combination of these two types of information may be valuable for accurate diagnosis of vascular disease. The purpose of this paper was to study texture and motion patterns of carotid atherosclerosis and select the optimal combination of features that can characterize plaque. B-mode ultrasound images of 10 symptomatic and 9 asymptomatic plaques were interrogated. A total of 99 texture features were estimated using first-order statistics, second-order statistics, Laws texture energy and the fractal dimension. Only five texture features were significantly different between the two groups. In the same subjects, the motion of selected plaque regions was estimated using region tracking and block-matching and expressed through: a/maximal surface velocity (MSV), and b/maximal relative surface velocity (MRSV). MSV and MRSV were significantly lower in asymptomatic plaques suggesting more homogeneous motion patterns. Clustering using fuzzy c-means correctly classified 74% of plaques based on texture features only, and 79% of plaques based on motion features only. Classification performance reached 84% when a combination of motion and texture features was used. PMID:17271957

Stoitsis, J; Golemati, S; Nikita, K S; Nicolaides, A N

2004-01-01

88

Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation  

PubMed Central

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.

Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina

2012-01-01
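
To illustrate the clustering step described in the abstract above, the following minimal sketch runs standard fuzzy c-means on a handful of gray-level values. It is not the authors' adaptive multiclass method: the function name `fcm_1d`, the fuzzifier m = 2, the evenly spaced initial centers, and the toy intensities in the usage note are all assumptions made for illustration.

```python
def fcm_1d(values, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on scalar values (illustrative sketch only)."""
    lo, hi = min(values), max(values)
    # Assumption: start with centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        memberships = []
        for x in values:
            dists = [abs(x - v) for v in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if k == hit else 0.0 for k in range(c)]
            else:
                row = [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total > 0.0:
                new_centers.append(sum(w * x for w, x in zip(weights, values)) / total)
            else:
                new_centers.append(centers[k])  # keep a center that attracts no weight
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Memberships were computed from the centers of the previous pass.
    return centers, memberships
```

Calling `fcm_1d([10, 12, 11, 200, 205, 198], 2)` on these made-up intensities separates a dark group from a bright group; each membership row sums to 1, which is the soft assignment that distinguishes FCM from hard k-means.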

89

Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation  

SciTech Connect

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.

Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)]

2012-08-15
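
The categorical agreement statistic quoted in the abstract above, weighted Cohen's kappa, can be sketched as follows. The abstract does not state which weighting scheme was used, so linear disagreement weights are an assumption here, as are the function name `weighted_kappa` and the integer category coding 0..n-1.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for ordinal labels 0..n_categories-1."""
    n = len(rater_a)
    # Contingency table of paired ratings.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j)  # assumption: linear disagreement weights
            disagree_obs += weight * observed[i][j]
            # Count expected under independent raters with the same marginals.
            disagree_exp += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if both raters use a single category.
    return 1.0 - disagree_obs / disagree_exp
```

With identical rating lists the function returns 1.0, and with ratings that agree only as often as chance predicts it returns 0.0, so a value near 0.79 indicates agreement well above chance.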

90

Cluster update algorithm and recognition  

Microsoft Academic Search

We present a fast and robust cluster update algorithm that is especially efficient in implementing the task of image segmentation using the method of superparamagnetic clustering. We apply it to a Potts model with spin interactions that are defined by gray-scale differences within the image. Motivated by biological systems, we introduce the concept of neural inhibition to the Potts

C. von Ferber; F. Wörgötter

2000-01-01

91

Matrix-based algorithms for document clustering  

Microsoft Academic Search

The clustering problem is the task of assigning each document in a collection to clusters of similar documents. The clustering process does not begin with pre-specified categories; rather it is the purpose of the clustering algorithm to discover natural categories in the collection of documents that it processes. We assume that the initial data for the clustering algorithms consists of

S. Oliveira; S. C. Seok

92

Clustering of detected changes in satellite imagery using fuzzy c-means algorithm  

Microsoft Academic Search

GeoCDX (Geospatial Change Detection and eXploitation) is an integrated system for detecting change between multi-temporal, high-resolution satellite or airborne images. Overlapping images are organized into 256×256 meter tiles in a global grid system. A tile change score measures the amount of change in the tile which is the aggregation of pixel-level change score. The tiles are initially ranked by these

Ozy Sjahputera; Grant J. Scott; Matthew K. Klaric; Brian C. Claywell; Nicholas J. Hudson; James M. Keller; Curt H. Davis

2010-01-01

93

CLAG: an unsupervised non hierarchical clustering algorithm handling biological data  

PubMed Central

Background: Searching for similarities in a set of biological data is intrinsically difficult due to possible data points that should not be clustered, or that should group within several clusters. Under these hypotheses, hierarchical agglomerative clustering is not appropriate. Moreover, if the dataset is not sufficiently well characterized, as is often the case, supervised classification is not appropriate either. Results: CLAG (for CLusters AGgregation) is an unsupervised non-hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require all data points to cluster, and it converges, yielding the same result at each run. Its simplicity and speed allow it to run on reasonably large datasets. Conclusions: CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It proved to be more informative and accurate than several known clustering methods, such as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering, and affinity propagation clustering, and it does not suffer from the convergence problem of the latter.

2012-01-01

94

DAU StatRefresher: Clustering Algorithms  

NSDL National Science Digital Library

This interactive module helps students to understand the definition of and uses for clustering algorithms. Students will learn to categorize the types of clustering algorithms, to use the minimal spanning tree and the k-means clustering algorithm, and to solve exercise problems using clustering algorithms. Each component has a detailed explanation along with quiz questions. A series of questions is presented at the end to test the students' understanding of the lesson's entire concept.

2009-01-22

95

Introduction to Cluster Monte Carlo Algorithms  

Microsoft Academic Search

This chapter provides an introduction to cluster Monte Carlo algorithms for classical statistical-mechanical systems. A brief review of the conventional Metropolis algorithm is given, followed by a detailed discussion of the lattice cluster algorithm developed by Swendsen and Wang and the single-cluster variant introduced by Wolff. For continuum systems, the geometric cluster algorithm of Dress and Krauth is described. It

E. Luijten; Frederick Seitz

2006-01-01

96

On Spectral Clustering: Analysis and an algorithm  

Microsoft Academic Search

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the distances between the points), there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple

Andrew Y. Ng; Michael I. Jordan; Yair Weiss

2001-01-01

97

Scaling Clustering Algorithms to Large Databases  

Microsoft Academic Search

Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering. We require at most one scan of the database. In this work, the framework is instantiated and numerically justified with the popular K-Means clustering algorithm. The method is

Paul S. Bradley; Usama M. Fayyad; Cory Reina

1998-01-01

98

An Efficient Clustering Algorithm for Irregularly Shaped Clusters  

NASA Astrophysics Data System (ADS)

To detect the natural clusters for irregularly shaped data distribution is a difficult task in pattern recognition. In this study, we propose an efficient clustering algorithm for irregularly shaped clusters based on the advantages of spectral clustering and Affinity Propagation (AP) algorithm. We give a new similarity measure based on neighborhood dispersion analysis. The proposed algorithm is a simple but effective method. The experimental results on several data sets show that the algorithm can detect the natural clusters of input data sets, and the clustering results agree well with that of human judgment.

Tang, Dongming; Zhu, Qingxin; Cao, Yong; Yang, Fan

99

Research on VRP of Optimizing Based on Fuzzy C-Means Clustering and Iga Under Electronic Commerce  

Microsoft Academic Search

The logistic distribution under electronic commerce has the characteristic of dispersive customer positions, large order forms, little batches and many repeated routes. The traditional optimizing vehicle routing problems meet with diversified problems in different extents and are difficult to play their roles. Therefore, according to the particularity of logistic distribution under electronic commerce, the improved two-phase algorithm needs to be

Chun-Yu Ren; Xiao-Bo Wang; Jin-Ying Sun

2006-01-01

100

New unsupervised clustering algorithm for large datasets  

Microsoft Academic Search

A fast and accurate unsupervised clustering algorithm has been developed for clustering very large datasets. Though designed for very large volumes of geospatial data, the algorithm is general enough to be used in a wide variety of domain applications. The number of computations the algorithm requires is ~ O(N), and thus faster than hierarchical algorithms. Unlike the popular K-means heuristic,

William Peter; John Chiochetti; Clare Giardina

2003-01-01

101

Supervisory control of wastewater treatment plants by combining principal component analysis and fuzzy c-means clustering.  

PubMed

In this paper a methodology for integrated multivariate monitoring and control of biological wastewater treatment plants during extreme events is presented. To monitor the process, on-line dynamic principal component analysis (PCA) is performed on the process data to extract the principal components that represent the underlying mechanisms of the process. Fuzzy c-means (FCM) clustering is used to classify the operational state. Performing clustering on scores from PCA solves computational problems as well as increases robustness due to noise attenuation. The class-membership information from FCM is used to derive adequate control set points for the local control loops. The methodology is illustrated by a simulation study of a biological wastewater treatment plant, on which disturbances of various types are imposed. The results show that the methodology can be used to determine and co-ordinate control actions in order to shift the control objective and improve the effluent quality. PMID:11385841

Rosen, C; Yuan, Z

2001-01-01

102

Sampling and clustering algorithm for determining the number of clusters based on the rosette pattern  

NASA Astrophysics Data System (ADS)

Clustering is one of the image-processing methods used in non-destructive testing (NDT). As one of the initializing parameters, most clustering algorithms, like fuzzy C means (FCM), Iterative self-organization data analysis (ISODATA), K-means, and their derivatives, require the number of clusters. This paper proposes an algorithm for clustering the pixels in C-scan images without any initializing parameters. In this state-of-the-art method, an image is sampled based on the rosette pattern and according to the pattern characteristics, and extracted samples are clustered and then the number of clusters is determined. The centroids of the classes are computed by means of a method used to calculate the distribution function. Based on different data sets, the results show that the algorithm improves the clustering capability by 92.93% and 91.93% in comparison with FCM and K-means algorithms, respectively. Moreover, when dealing with high-resolution data sets, the efficiency of the algorithm in terms of cluster detection and run time improves considerably.

Sadr, Ali; Momtaz, Amirkeyvan

2012-01-01

103

Parallelization Of K-Means Clustering Algorithm  

Microsoft Academic Search

Clustering algorithms are used in various applications like image analysis, image classification, performance analysis, financial modeling and so on. The K-Means algorithm provides an efficient way of clustering multidimensional data into K groups by minimizing the intra-cluster distance and maximizing the inter-cluster distance (1). One of the main restrictions of the k-means algorithm is that the

Amithash Prasad
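
For reference, the serial algorithm that this record sets out to parallelize can be sketched as a plain Lloyd iteration. This is a generic textbook version, not the author's parallel implementation; the function name `kmeans`, the caller-supplied initial centroids, and the squared Euclidean distance are assumptions for illustration.

```python
def kmeans(points, centers, iters=50):
    """Serial Lloyd iteration; `centers` holds caller-supplied initial centroids."""
    dim = len(points[0])
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels.append(d2.index(min(d2)))
        # Update step: each centroid moves to the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(old)  # keep the centroid of an empty cluster
        if new_centers == centers:
            break  # converged: centroids no longer move
        centers = new_centers
    return centers, labels
```

The assignment step treats every point independently, which is why it is the usual target for parallelization: points can be split across workers and only the per-cluster sums need to be combined.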

104

Classification of some active HIV-1 protease inhibitors and their inactive analogues using some uncorrelated three-dimensional molecular descriptors and a fuzzy c-means algorithm.  

PubMed

A fuzzy c-means algorithm was used to classify some 3D convex hull descriptors computed for 345 active HIV-1 protease inhibitors collected from literature and 437 inactive analogues searched from the MDL/ISIS database. The number of descriptors used to represent each compound was from 4 to 8, and they were uncorrelated using the principal component analysis. These uncorrelated descriptors were then divided into two groups and classified by the fuzzy c-means algorithm. The classification produced a clear-cut switch in membership functions computed for each uncorrelated descriptor at the group boundary. Compounds with nonswitching membership functions computed were treated as outliers, and they were counted for estimating the accuracy of the classification. The averaged accuracy of classification for the active inhibitor set was about 80% which was better than that directly classified by a linear discriminant function on the original 3D convex hull descriptors. The whole classification scheme was also applied to several sets of some conventional descriptors computed for each compound, but the averaged accuracy was around 58%. Further classification using some 3D convex hull descriptors searched from comparing the distribution of these descriptors was performed on a new data set composed of 289 outliers-deducted active inhibitors and 63 outliers identified from the inactive analogues through previous classification. This final classification identified 19 inactive analogues which were similar in structural and topological features to those of some highly active inhibitors classified together with them. PMID:12444748

Lin, Thy-Hou; Wang, Ging-Ming; Hsu, Yao-Hua

2002-01-01

105

Are approximation algorithms for consensus clustering worthwhile?  

Microsoft Academic Search

Consensus clustering has emerged as one of the principal clustering problems in the data mining community. In recent years the theoretical computer science community has generated a number of approximation algorithms for consensus clustering and similar problems. These algorithms run in polynomial time, with performance guaranteed to be at most a certain factor worse than optimal. We investigate the feasibility

Michael Bertolacci; Anthony Wirth

106

Hierarchical Clustering Algorithms for Document Datasets  

Microsoft Academic Search

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, clustering algorithms that build meaningful hierarchies out of large document collections are ideal tools for their interactive visualization and exploration as they provide data-views that are consistent, predictable,

Ying Zhao; George Karypis; Usama M. Fayyad

2005-01-01

107

Fuzzy c-means cluster analysis of early diagenetic effects on natural remanent magnetisation acquisition in a 1.1 Myr piston core from the Central Mediterranean  

NASA Astrophysics Data System (ADS)

The influence of early diagenesis on the natural remanent magnetisation (NRM) in sediments from the Calabrian ridge (Central Mediterranean) is analysed with the help of fuzzy c-means (FCM) cluster analysis and non-linear mapping (NLM). The sediments are variably coloured: white, beige, purplish, greenish and grey layers occur with occasionally intercalated sapropels. The NRM acquired depends on both depositional conditions and diagenetic processes. To describe these, FCM was performed with χin, ARM, CaCO3, Ba, Mn and S as variables. An eight-cluster model was derived with the clusters belonging to two categories: one expressing mainly diagenetic processes, i.e. dissolution and precipitation, and the other expressing mainly depositional conditions. The impact of diagenesis on NRM acquisition is profound and not restricted to the close vicinity of the anoxic sapropelitic layers. As a consequence, the influence of diagenetic processes on the NRM should be thoroughly assessed when selecting samples, e.g. for the determination of the relative palaeointensity of the geomagnetic field. Application of multivariate classification techniques appears to be useful because it links rock magnetic parameters to the geochemical environment. In the present piston core, three short reversed geomagnetic events in the Brunhes chron are preserved and, indeed, occur in clusters expressing no or minor diagenesis. The recording of the Blake event, however, has been prevented by later precipitation of magnetite in the corresponding interval.

Dekkers, M. J.; Langereis, C. G.; Vriend, S. P.; van Santvoort, P. J. M.; de Lange, G. J.

1994-08-01

108

Semi-Supervised Clustering Using Genetic Algorithms  

Microsoft Academic Search

A semi-supervised clustering algorithm is proposed that combines the benefits of supervised and unsupervised learning methods. Data are segmented/clustered using an unsupervised learning technique that is biased toward producing segments or clusters as pure as possible in terms of class distribution. These clusters can then be used to predict the class of future points. For example in database marketing, the technique can be used to

Ayhan Demiriz; Kristin Bennett

1999-01-01

109

Semi-Supervised Clustering Using Genetic Algorithms  

Microsoft Academic Search

A semi-supervised clustering algorithm is proposed that combines the benefits of supervised and unsupervised learning methods. Data are segmented/clustered using an unsupervised learning technique that is biased toward producing segments or clusters as pure as possible in terms of class distribution. These clusters can then be used to predict the class of future points. For example in database marketing,

Ayhan Demiriz; Kristin P. Bennett; Mark J. Embrechts

110

A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering  

Microsoft Academic Search

This paper presents new algorithms (fuzzy c-medoids (FCMdd) and fuzzy c trimmed medoids (FCTMdd)) for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total dissimilarity within each cluster is minimized. A comparison of FCMdd with the relational fuzzy c-means algorithm shows that

Raghu Krishnapuram; Anupam Joshi; Liyu Yi

1999-01-01

111

FUZZY C-MEANS WITH VARIABLE COMPACTNESS  

PubMed Central

Fuzzy c-means (FCM) clustering has been extensively studied and widely applied in the tissue classification of biomedical images. Previous enhancements to FCM have accounted for intensity shading, membership smoothness, and variable cluster sizes. In this paper, we introduce a new parameter called “compactness” which captures additional information of the underlying clusters. We then propose a new classification algorithm, FCM with variable compactness (FCMVC), to classify three major tissues in brain MRIs by incorporating the compactness terms into a previously reported improvement to FCM. Experiments on both simulated phantoms and real magnetic resonance brain images show that the new method improves the repeatability of the tissue classification for the same subject with different acquisition protocols.

Roy, Snehashis; Agarwal, Harsh; Carass, Aaron; Bai, Ying; Pham, Dzung L.; Prince, Jerry L.

2009-01-01

112

FUZZY C-MEANS WITH VARIABLE COMPACTNESS.  

PubMed

Fuzzy c-means (FCM) clustering has been extensively studied and widely applied in the tissue classification of biomedical images. Previous enhancements to FCM have accounted for intensity shading, membership smoothness, and variable cluster sizes. In this paper, we introduce a new parameter called "compactness" which captures additional information of the underlying clusters. We then propose a new classification algorithm, FCM with variable compactness (FCMVC), to classify three major tissues in brain MRIs by incorporating the compactness terms into a previously reported improvement to FCM. Experiments on both simulated phantoms and real magnetic resonance brain images show that the new method improves the repeatability of the tissue classification for the same subject with different acquisition protocols. PMID:20126427

Roy, Snehashis; Agarwal, Harsh; Carass, Aaron; Bai, Ying; Pham, Dzung L; Prince, Jerry L

2008-01-01

113

Introduction to Cluster Monte Carlo Algorithms  

NASA Astrophysics Data System (ADS)

This chapter provides an introduction to cluster Monte Carlo algorithms for classical statistical-mechanical systems. A brief review of the conventional Metropolis algorithm is given, followed by a detailed discussion of the lattice cluster algorithm developed by Swendsen and Wang and the single-cluster variant introduced by Wolff. For continuum systems, the geometric cluster algorithm of Dress and Krauth is described. It is shown how their geometric approach can be generalized to incorporate particle interactions beyond hardcore repulsions, thus forging a connection between the lattice and continuum approaches. Several illustrative examples are discussed.

Luijten, E.
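
As a concrete instance of the single-cluster variant introduced by Wolff, the sketch below performs one cluster flip on a two-dimensional Ising lattice. The abstract speaks only of classical statistical-mechanical systems in general, so the Ising model, the periodic square lattice, the coupling J = 1 (with beta in units of 1/J), and the function name `wolff_step` are all assumptions chosen for illustration.

```python
import math
import random


def wolff_step(spins, size, beta, rng):
    """One Wolff single-cluster update on a size x size periodic Ising lattice (J = 1)."""
    p_add = 1.0 - math.exp(-2.0 * beta)  # probability of activating an aligned bond
    start = (rng.randrange(size), rng.randrange(size))
    seed_spin = spins[start[0]][start[1]]
    cluster = {start}
    frontier = [start]
    while frontier:
        x, y = frontier.pop()
        neighbors = [((x + 1) % size, y), ((x - 1) % size, y),
                     (x, (y + 1) % size), (x, (y - 1) % size)]
        for nx, ny in neighbors:
            # Grow the cluster through aligned neighbors with probability p_add.
            if ((nx, ny) not in cluster and spins[nx][ny] == seed_spin
                    and rng.random() < p_add):
                cluster.add((nx, ny))
                frontier.append((nx, ny))
    # Flip the whole cluster at once.
    for x, y in cluster:
        spins[x][y] = -seed_spin
    return len(cluster)
```

At beta = 0 the bond probability is zero and the update reduces to a single-spin flip, while at large beta an ordered lattice is flipped as one cluster; this collective move is what distinguishes cluster algorithms from the single-spin Metropolis update.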

114

Treatment response assessment of breast masses on dynamic contrast-enhanced magnetic resonance scans using fuzzy c-means clustering and level set segmentation.  

PubMed

The goal of this study was to develop an automated method to segment breast masses on dynamic contrast-enhanced (DCE) magnetic resonance (MR) scans and to evaluate its potential for estimating tumor volume on pre- and postchemotherapy images and tumor change in response to treatment. A radiologist experienced in interpreting breast MR scans defined a cuboid volume of interest (VOI) enclosing the mass in the MR volume at one time point within the sequence of DCE-MR scans. The corresponding VOIs over the entire time sequence were then automatically extracted. A new 3D VOI representing the local pharmacokinetic activities in the VOI was generated from the 4D VOI sequence by summarizing the temporal intensity enhancement curve of each voxel with its standard deviation. The method then used the fuzzy c-means (FCM) clustering algorithm followed by morphological filtering for initial mass segmentation. The initial segmentation was refined by the 3D level set (LS) method. The velocity field of the LS method was formulated in terms of the mean curvature which guaranteed the smoothness of the surface, the Sobel edge information which attracted the zero LS to the desired mass margin, and the FCM membership function which improved segmentation accuracy. The method was evaluated on 50 DCE-MR scans of 25 patients who underwent neoadjuvant chemotherapy. Each patient had pre- and postchemotherapy DCE-MR scans on a 1.5 T magnet. The in-plane pixel size ranged from 0.546 to 0.703 mm and the slice thickness ranged from 2.5 to 4.5 mm. The flip angle was 15 degrees, repetition time ranged from 5.98 to 6.7 ms, and echo time ranged from 1.2 to 1.3 ms. Computer segmentation was applied to the coronal T1-weighted images. For comparison, the same radiologist who marked the VOI also manually segmented the mass on each slice. 
The performance of the automated method was quantified using an overlap measure, defined as the ratio of the intersection of the computer and the manual segmentation volumes to the manual segmentation volume. Pre- and postchemotherapy masses had overlap measures of 0.81 +/- 0.13 (mean +/- s.d.) and 0.71 +/- 0.22, respectively. The percentage volume reduction (PVR) estimated by computer and the radiologist were 55.5 +/- 43.0% (mean +/- s.d.) and 57.8 +/- 51.3%, respectively. Paired Student's t test indicated that the difference between the mean PVRs estimated by computer and the radiologist did not reach statistical significance (p = 0.641). The automated mass segmentation method may have the potential to assist physicians in monitoring volume change in breast masses in response to treatment. PMID:19994516

Shi, Jiazheng; Sahiner, Berkman; Chan, Heang-Ping; Paramagul, Chintana; Hadjiiski, Lubomir M; Helvie, Mark; Chenevert, Thomas

2009-11-01
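
The overlap measure defined in the abstract above (intersection of computer and manual segmentation volumes divided by the manual volume) is simple to state in code. In this minimal sketch the voxel-set representation and both function names are assumptions, and the percentage volume reduction formula is the usual relative-change definition, which the abstract does not spell out.

```python
def overlap_measure(computer_voxels, manual_voxels):
    """Size of (computer AND manual) divided by size of manual, per the abstract."""
    return len(computer_voxels & manual_voxels) / len(manual_voxels)


def percent_volume_reduction(pre_volume, post_volume):
    """Assumed formula: relative shrinkage from pre- to post-treatment, in percent."""
    return 100.0 * (pre_volume - post_volume) / pre_volume
```

Because the denominator is the manual volume alone, the measure rewards covering the reference segmentation but does not penalize extra voxels outside it; a computer segmentation that encloses the whole manual mask scores 1.0 regardless of how much larger it is.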

115

Treatment response assessment of breast masses on dynamic contrast-enhanced magnetic resonance scans using fuzzy c-means clustering and level set segmentation  

PubMed Central

The goal of this study was to develop an automated method to segment breast masses on dynamic contrast-enhanced (DCE) magnetic resonance (MR) scans and to evaluate its potential for estimating tumor volume on pre- and postchemotherapy images and tumor change in response to treatment. A radiologist experienced in interpreting breast MR scans defined a cuboid volume of interest (VOI) enclosing the mass in the MR volume at one time point within the sequence of DCE-MR scans. The corresponding VOIs over the entire time sequence were then automatically extracted. A new 3D VOI representing the local pharmacokinetic activities in the VOI was generated from the 4D VOI sequence by summarizing the temporal intensity enhancement curve of each voxel with its standard deviation. The method then used the fuzzy c-means (FCM) clustering algorithm followed by morphological filtering for initial mass segmentation. The initial segmentation was refined by the 3D level set (LS) method. The velocity field of the LS method was formulated in terms of the mean curvature which guaranteed the smoothness of the surface, the Sobel edge information which attracted the zero LS to the desired mass margin, and the FCM membership function which improved segmentation accuracy. The method was evaluated on 50 DCE-MR scans of 25 patients who underwent neoadjuvant chemotherapy. Each patient had pre- and postchemotherapy DCE-MR scans on a 1.5 T magnet. The in-plane pixel size ranged from 0.546 to 0.703 mm and the slice thickness ranged from 2.5 to 4.5 mm. The flip angle was 15°, repetition time ranged from 5.98 to 6.7 ms, and echo time ranged from 1.2 to 1.3 ms. Computer segmentation was applied to the coronal T1-weighted images. For comparison, the same radiologist who marked the VOI also manually segmented the mass on each slice. 
The performance of the automated method was quantified using an overlap measure, defined as the ratio of the intersection of the computer and the manual segmentation volumes to the manual segmentation volume. Pre- and postchemotherapy masses had overlap measures of 0.81±0.13 (mean±s.d.) and 0.71±0.22, respectively. The percentage volume reduction (PVR) estimated by computer and the radiologist were 55.5±43.0% (mean±s.d.) and 57.8±51.3%, respectively. Paired Student’s t test indicated that the difference between the mean PVRs estimated by computer and the radiologist did not reach statistical significance (p=0.641). The automated mass segmentation method may have the potential to assist physicians in monitoring volume change in breast masses in response to treatment.

Shi, Jiazheng; Sahiner, Berkman; Chan, Heang-Ping; Paramagul, Chintana; Hadjiiski, Lubomir M.; Helvie, Mark; Chenevert, Thomas

2009-01-01

116

Classification of breast mass lesions using model-based analysis of the characteristic kinetic curve derived from fuzzy c-means clustering.  

PubMed

The purpose of this study is to evaluate the diagnostic efficacy of the representative characteristic kinetic curve of dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) extracted by fuzzy c-means (FCM) clustering for the discrimination of benign and malignant breast tumors using a novel computer-aided diagnosis (CAD) system. About the research data set, DCE-MRIs of 132 solid breast masses with definite histopathologic diagnosis (63 benign and 69 malignant) were used in this study. At first, the tumor region was automatically segmented using the region growing method based on the integrated color map formed by the combination of kinetic and area under curve color map. Then, the FCM clustering was used to identify the time-signal curve with the larger initial enhancement inside the segmented region as the representative kinetic curve, and then the parameters of the Tofts pharmacokinetic model for the representative kinetic curve were compared with conventional curve analysis (maximal enhancement, time to peak, uptake rate and washout rate) for each mass. The results were analyzed with a receiver operating characteristic curve and Student's t test to evaluate the classification performance. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value of the combined model-based parameters of the extracted kinetic curve from FCM clustering were 86.36% (114/132), 85.51% (59/69), 87.30% (55/63), 88.06% (59/67) and 84.62% (55/65), better than those from a conventional curve analysis. The A(Z) value was 0.9154 for Tofts model-based parametric features, better than that for conventional curve analysis (0.8673), for discriminating malignant and benign lesions. In conclusion, model-based analysis of the characteristic kinetic curve of breast mass derived from FCM clustering provides effective lesion classification. This approach has potential in the development of a CAD system for DCE breast MRI. PMID:22245697

Chang, Yeun-Chung; Huang, Yan-Hao; Huang, Chiun-Sheng; Chang, Pei-Kang; Chen, Jeon-Hor; Chang, Ruey-Feng

2012-04-01
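
The five percentages quoted in the abstract above can be reproduced from the fractions it gives. The sketch below is only an arithmetic check: the function name `diagnostic_metrics` is hypothetical, and the counts (59 true positives, 10 false negatives, 55 true negatives, 8 false positives) are inferred from the quoted fractions, with malignant taken as the positive class.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Counts inferred from the abstract: 59 of 69 malignant and
# 55 of 63 benign masses were classified correctly.
metrics = diagnostic_metrics(tp=59, fn=10, tn=55, fp=8)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these counts the printed values match the abstract: 86.36%, 85.51%, 87.30%, 88.06% and 84.62%.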

117

An efficient fractal dimension based clustering algorithm  

NASA Astrophysics Data System (ADS)

Clustering plays an important role in data mining. It helps to reveal intrinsic structure in data sets with little or no prior knowledge. Clustering approaches have received great attention in recent years. However, many published algorithms perform poorly at determining the number of clusters, finding clusters of arbitrary shape, or identifying the presence of noise. In this paper we present an efficient clustering algorithm, FDC, which employs grid, density and fractal theory to partition points into clusters with minimum change of fractal dimension while maximizing the self-similarity within the clusters. We show via experiments that FDC can quickly process large multidimensional data sets, identify the number of clusters, recognize clusters of arbitrary shape, and furthermore extract some qualitative information from data sets.

Xiong, Xiao; Zhang, Jie; Shi, Qingwei

2007-09-01

118

An object-oriented cluster search algorithm  

SciTech Connect

In this work we describe two object-oriented cluster search algorithms that can be applied to a network of arbitrary structure. The first algorithm calculates all connected clusters, whereas the second finds a path with the minimal number of connections. We estimate the complexity of the algorithm and infer that the number of operations grows linearly with the size of the network.

Silin, Dmitry; Patzek, Tad

2003-01-24
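
The abstract above does not give the authors' object-oriented design, so the following is only a minimal sketch of the two tasks it names, using plain breadth-first search as a stand-in technique. The adjacency-dictionary representation and the function names `connected_clusters` and `shortest_path` are assumptions made for illustration. Each node is queued once and each connection is examined a bounded number of times, which is consistent with the linear growth mentioned in the abstract.

```python
from collections import deque


def connected_clusters(adjacency):
    """Return every connected cluster of an undirected network.

    `adjacency` maps each node to an iterable of its neighbours.
    """
    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters


def shortest_path(adjacency, source, target):
    """Return a path with the fewest connections, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in adjacency[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical six-node network with three clusters.
network = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
print(connected_clusters(network))   # [[0, 1, 2], [3, 4], [5]]
print(shortest_path(network, 0, 2))  # [0, 1, 2]
```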

119

Y-Means: An Autonomous Clustering Algorithm  

NASA Astrophysics Data System (ADS)

This paper proposes an unsupervised clustering technique for data classification based on the K-means algorithm. The K-means algorithm is well known for its simplicity and low time complexity. However, the algorithm has three main drawbacks: dependency on the initial centroids, dependency on the number of clusters, and degeneracy. Our solution accommodates these three issues, by proposing an approach to automatically detect a semi-optimal number of clusters according to the statistical nature of the data. As a side effect, the method also makes choices of the initial centroid-seeds not critical to the clustering results. The experimental results show the robustness of the Y-means algorithm as well as its good performance against a set of other well known unsupervised clustering techniques. Furthermore, we study the performance of our proposed solution against different distance and outlier-detection functions and recommend the best combinations.

Ghorbani, Ali A.; Onut, Iosif-Viorel

120

Cluster algorithms for anisotropic quantum spin models  

SciTech Connect

We present cluster Monte Carlo algorithms for the XYZ quantum spin models. In the special case of S=1/2, the new algorithm can be viewed as a cluster algorithm for the 8-vertex model. As an example, we study the S=1/2 XY model in two dimensions with a representation in which the quantization axis lies in the easy plane. We find that the numerical autocorrelation time for the cluster algorithm remains of the order of unity and does not show any significant dependence on the temperature, the system size, or the Trotter number. On the other hand, the autocorrelation time for the conventional algorithm strongly depends on these parameters and can be very large. The use of improved estimators for thermodynamic averages further enhances the efficiency of the new algorithms.

Kawashima, Naoki [Los Alamos National Lab., NM (United States)]|[Univ. of Tokyo (Japan)

1996-01-01

121

Error Evaluation for Stemming Algorithms as Clustering Algorithms.  

National Technical Information Service (NTIS)

The report develops mathematical evaluation measures to characterize the effect of known erroneous performance by stemming routines, then generalizes these procedures to other types of clustering algorithms. Various methods are presented for exact or appr...

J. B. Lovins

1969-01-01

122

Optimization of automated segmentation of monkeypox virus-induced lung lesions from normal lung CT images using hard C-means algorithm  

NASA Astrophysics Data System (ADS)

Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using the PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used the hard C-means algorithm in conjunction with landmark-based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.

Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie

2013-03-01

123

Improved arterial spin labeling after myocardial infarction in mice using cardiac and respiratory gated Look-Locker imaging with fuzzy C-means clustering  

PubMed Central

Experimental myocardial infarction (MI) in mice is an important disease model in part due to the ability to study genetic manipulations. MRI has been used to assess cardiac structural and functional changes after MI in mice, but changes in myocardial perfusion after acute MI have not previously been examined. Arterial spin labeling (ASL) non-invasively measures perfusion, but is sensitive to respiratory motion and heart rate variability, and is difficult to apply after acute MI in mice. To account for these factors, a cardio-respiratory gated (CRG) ASL sequence using a fuzzy C-means algorithm to retrospectively reconstruct images was developed. Using this method, myocardial perfusion was measured in remote and infarcted regions at 1, 7, 14, and 28 days post-MI. Baseline perfusion was 4.9 ± 0.5 (ml/g·min) and one day post-MI decreased to 0.9 ± 0.8 (ml/g·min) in infarcted myocardium (P<0.05 vs. baseline) while remaining at 5.2 ± 0.8 (ml/g·min) in remote myocardium. During the subsequent 28 days, perfusion in the remote zone remained unchanged, while a partial recovery of perfusion in the infarct zone was seen. This technique, when applied to genetically-engineered mice, will allow for the investigation of the roles of specific genes in myocardial perfusion during infarct healing.

Vandsburger, Moriel H; Janiczek, Robert L; Xu, Yaqin; French, Brent A; Meyer, Craig H; Kramer, Christopher M; Epstein, Frederick H

2010-01-01

124

Improved arterial spin labeling after myocardial infarction in mice using cardiac and respiratory gated look-locker imaging with fuzzy C-means clustering.  

PubMed

Experimental myocardial infarction (MI) in mice is an important disease model, in part due to the ability to study genetic manipulations. MRI has been used to assess cardiac structural and functional changes after MI in mice, but changes in myocardial perfusion after acute MI have not previously been examined. Arterial spin labeling noninvasively measures perfusion but is sensitive to respiratory motion and heart rate variability and is difficult to apply after acute MI in mice. To account for these factors, a cardiorespiratory-gated arterial spin labeling sequence using a fuzzy C-means algorithm to retrospectively reconstruct images was developed. Using this method, myocardial perfusion was measured in remote and infarcted regions at 1, 7, 14, and 28 days post-MI. Baseline perfusion was 4.9 +/- 0.5 mL/g min and 1 day post-MI decreased to 0.9 +/- 0.8 mL/g min in infarcted myocardium (P < 0.05 versus baseline) while remaining at 5.2 +/- 0.8 mL/g min in remote myocardium. During the subsequent 28 days, perfusion in the remote zone remained unchanged, while a partial recovery of perfusion in the infarct zone was seen. This technique, when applied to genetically engineered mice, will allow for the investigation of the roles of specific genes in myocardial perfusion during infarct healing. PMID:20187175

Vandsburger, Moriel H; Janiczek, Robert L; Xu, Yaqin; French, Brent A; Meyer, Craig H; Kramer, Christopher M; Epstein, Frederick H

2010-03-01

125

Bayesian Network Combined Fuzzy C-means Methodology for Turbine Blades Fatigue Performance Evaluation  

Microsoft Academic Search

In this paper, a fatigue performance evaluation model for steam turbine blades based on a Bayesian network combined with the fuzzy c-means algorithm is proposed. The Bayesian network is used as a classification technique to evaluate fatigue performance. The fuzzy c-means algorithm is applied to perform cluster analysis of fatigue performance values and to discretize them. Low-cycle fatigue tests on certain kind of steam turbine

Jihong Yan; Xingman Xiong; Shifeng Zhu

2010-01-01

126

An algorithm for spatial hierarchy clustering  

NASA Technical Reports Server (NTRS)

A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.

Dejesusparada, N. (principal investigator); Velasco, F. R. D.

1981-01-01

127

Image Segmentation of Thermal Waving Inspection based on Particle Swarm Optimization Fuzzy Clustering Algorithm  

NASA Astrophysics Data System (ADS)

The Fuzzy C-Means (FCM) clustering algorithm is an effective image segmentation algorithm that combines unsupervised clustering with the idea of fuzzy sets. It is widely applied to image segmentation, but it has several problems: a large amount of computation, sensitivity to initial values and to noise in images, and a tendency to become trapped in local optima. To overcome these problems, a fuzzy clustering algorithm based on Particle Swarm Optimization (PSO) is proposed. The method first uses the strong global search capability of PSO to optimize the FCM centers and then uses these centers to partition the image, which increases segmentation speed and improves segmentation accuracy. Experimental results show that the PSO-FCM algorithm effectively avoids the disadvantages of FCM, runs faster, and yields a better image segmentation result.

Guofeng, Jin; Wei, Zhang; Zhengwei, Yang; Zhiyong, Huang; Yuanjia, Song; Dongdong, Wang; Gan, Tian

2012-12-01
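
For orientation, the standard FCM iteration that the abstract above builds on can be sketched as follows. This is a plain one-dimensional FCM on gray levels, not the authors' PSO-FCM: the PSO stage is not implemented, and the `initial_centers` argument only marks the point where externally optimized centers would be supplied. The function names and the sample data are hypothetical.

```python
def fcm_memberships(values, centers, m=2.0):
    """Membership of every value in every cluster; each row sums to 1.

    `m` is the fuzziness exponent and must be greater than 1.
    """
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        if 0.0 in dists:
            # A value that coincides with a center belongs fully to it.
            hit = dists.index(0.0)
            rows.append([1.0 if i == hit else 0.0 for i in range(len(centers))])
        else:
            rows.append([1.0 / sum((d_i / d_j) ** power for d_j in dists)
                         for d_i in dists])
    return rows


def fuzzy_c_means(values, initial_centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        u = fcm_memberships(values, centers, m)
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            # Keep the old center if no value has any membership in cluster i.
            if total > 0:
                new_centers.append(sum(w * x for w, x in zip(weights, values)) / total)
            else:
                new_centers.append(old)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(values, centers, m)


# Hypothetical gray levels forming two well-separated groups.
gray_levels = [0.8, 1.0, 1.2, 8.8, 9.0, 9.2]
centers, u = fuzzy_c_means(gray_levels, initial_centers=[0.0, 10.0])
print([round(c, 2) for c in centers])
```

Because the result depends on the starting centers, a global search over `initial_centers` (the role the abstract assigns to PSO) is one way to reduce the sensitivity to initialization.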

128

Genetic Algorithm for Finding Cluster Hierarchies  

Microsoft Academic Search

Hierarchical clustering algorithms have been studied extensively in recent years. However, existing approaches for hierarchical clustering suffer from several drawbacks. The representation of the results is often hard to interpret even for large datasets. Many approaches are not robust to noise objects or overcome these limitations only by difficult parameter settings. As many approaches heavily depend on their initialization,

Christian Böhm; Annahita Oswald; Christian Richter; Bianca Wackersreuther; Peter Wackersreuther

129

CLU: A new algorithm for EST clustering  

PubMed Central

Background: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices have proven to be crucial in research areas from gene discovery to regulation of gene expression. Results: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. Conclusion: CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded from

Ptitsyn, Andrey; Hide, Winston

2005-01-01

130

A fusion method of Gabor wavelet transform and unsupervised clustering algorithms for tissue edge detection.  

PubMed

This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590

Ergen, Burhan

2014-01-01
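
The binarization step described in the abstract above (clustering gray levels into two groups) can be illustrated with a small two-cluster k-means on intensities. This is a hedged sketch only: the Gabor wavelet stage is omitted, the function name `binarize_with_kmeans` and the sample image are hypothetical, and the code relies on the fact that, in one dimension with two centers, nearest-center assignment is the same as thresholding at the midpoint between the centers.

```python
def binarize_with_kmeans(pixels, max_iter=100):
    """Split gray levels into two clusters and return a 0/1 image."""
    flat = [p for row in pixels for p in row]
    low, high = float(min(flat)), float(max(flat))  # initial cluster centers
    for _ in range(max_iter):
        threshold = (low + high) / 2.0
        dark = [p for p in flat if p <= threshold]
        bright = [p for p in flat if p > threshold]
        if not bright:  # constant image: nothing to separate
            break
        new_low = sum(dark) / len(dark)
        new_high = sum(bright) / len(bright)
        if new_low == low and new_high == high:  # centers stopped moving
            break
        low, high = new_low, new_high
    threshold = (low + high) / 2.0
    return [[1 if p > threshold else 0 for p in row] for row in pixels]


# Hypothetical 3x3 gray-level image with a bright region on the right.
image = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 13, 198],
]
print(binarize_with_kmeans(image))  # [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
```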

131

A Fusion Method of Gabor Wavelet Transform and Unsupervised Clustering Algorithms for Tissue Edge Detection  

PubMed Central

This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases.

Ergen, Burhan

2014-01-01

132

Adaptive K-means clustering algorithm  

NASA Astrophysics Data System (ADS)

Clustering is a fundamental problem in a great variety of fields such as pattern recognition and computer vision. A popular technique for clustering is based on K-means. However, it suffers from four main disadvantages. First, it is slow and its running time scales poorly. Second, it is often impractical to expect a user to specify the number of clusters. Third, it may converge to poor local optima. Last, its performance depends heavily on the initial cluster centers. To overcome these four disadvantages simultaneously, an adaptive K-means clustering algorithm (AKM) is proposed in this paper. The AKM estimates the correct number of clusters and obtains the initial centers by segmenting the norm histogram in the linear normed space consisting of the data set, and then performs a local improvement heuristic for K-means clustering in order to avoid local optima. Moreover, a kd-tree is used to store the data set to improve speed. The AKM was tested on synthetic data sets and real images. The experimental results demonstrate that the AKM outperforms existing methods.

Chen, Hailin; Wu, Xiuqing; Hu, Junhua

2007-11-01

133

Fuzzy c-means with variable compactness  

Microsoft Academic Search

Fuzzy c-means (FCM) clustering has been extensively studied and widely applied in the tissue classification of biomedical images. Previous enhancements to FCM have accounted for intensity shading, membership smoothness, and variable cluster sizes. In this paper, we introduce a new parameter called

Snehashis Roy; Harsh K. Agarwal; Aaron Carass; Ying Bai; Dzung L. Pham; Jerry L. Prince

2008-01-01

134

CURE: An Efficient Clustering Algorithm for Large Databases  

Microsoft Academic Search

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

1998-01-01

135

Clustering algorithms for wireless ad hoc networks  

Microsoft Academic Search

Efficient clustering algorithms play a very important role in the fast connection establishment of ad hoc networks. In this paper, we describe a communication model that is derived directly from that of Bluetooth, an emerging technology for pervasive computing; this technology is expected to play a major role in future personal area network applications. We further propose two new distributed

Lakshmi Ramachandran; Manika Kapoor; Abhinanda Sarkar; Alok Aggarwal

2000-01-01

136

Genetic algorithm optimization of atomic clusters  

SciTech Connect

The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. This is a problem where local minima are easy to locate, but barriers between the many minima are large, and the number of minima prohibits a systematic search. They use a novel mating algorithm that preserves some of the geometrical relationship between atoms, in order to ensure that the resultant structures are likely to inherit the best features of the parent clusters. Using this approach, they have been able to find lower energy structures than had been previously obtained. Most recently, they have been able to turn around the building block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and (conversely) that such information can be introduced back into the algorithm to assist in the search process.

Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E. [Ames Lab., IA (United States)]|[Iowa State Univ., Ames, IA (United States). Dept. of Physics

1996-12-31

137

Classification of posture maintenance data with fuzzy clustering algorithms  

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.

Bezdek, James C.

1991-01-01

138

A systematic method of adaptive fuzzy logic modeling, using an improved fuzzy c-means clustering algorithm for rule generation  

Microsoft Academic Search

Complex dynamical systems that are difficult to model mathematically can be described by a fuzzy model. This paper attempts to improve systematic fuzzy-logic modeling and to address its problems by introducing the following concepts: 1) an effective theoretically based method to identify the optimum fuzziness parameter (weighting exponent) m instead of the heuristic selection method mainly reported

Meysar Zeinali; Leila Notash

2005-01-01

139

Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models.  

National Technical Information Service (NTIS)

We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several 'parent' clusters. For each data vector the algorithm makes a discrete decision among these alterna...

E. Mjoisness; R. Castano; A. Gray

1999-01-01

140

A flocking based algorithm for document clustering analysis  

Microsoft Academic Search

Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm

Xiaohui Cui; Jinzhu Gao; Thomas E. Potok

2006-01-01

141

An evolutionary clustering algorithm for gene expression microarray data analysis  

Microsoft Academic Search

Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms

Patrick C. H. Ma; Keith C. C. Chan; Xin Yao; David K. Y. Chiu

2006-01-01

142

Cluster compression algorithm: A joint clustering/data compression concept  

NASA Technical Reports Server (NTRS)

The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to represent the information in the image data more efficiently. The format of the preprocessed data enables simple look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.

Hilbert, E. E.

1977-01-01
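
The feature map and look-up table decoding mentioned in the abstract above can be illustrated with a very small sketch. It assumes, purely for illustration, that the extracted features are cluster centroids and that each picture element is replaced by the index of its nearest centroid; the clustering itself and the source encoding of the feature map are not shown, and the names `encode`, `decode` and the sample values are hypothetical.

```python
def encode(pixels, centroids):
    """Build a feature map: the index of the nearest centroid for each pixel."""
    def nearest(pixel):
        return min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centroids[i])),
        )
    return [nearest(pixel) for pixel in pixels]


def decode(feature_map, centroids):
    """Reconstruct approximate pixels by table look-up of the centroids."""
    return [centroids[index] for index in feature_map]


# Hypothetical two-band pixels and two cluster centroids.
centroids = [(10, 20), (200, 180)]
pixels = [(12, 19), (198, 185), (9, 22)]

feature_map = encode(pixels, centroids)
print(feature_map)                     # [0, 1, 0]
print(decode(feature_map, centroids))  # [(10, 20), (200, 180), (10, 20)]
```

In this toy form, the compressed representation is the short centroid table plus one small integer per picture element, and decoding needs no arithmetic beyond indexing.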

143

Strategies for Parallelizing KMeans Data Clustering Algorithm  

Microsoft Academic Search

Data Clustering is a descriptive data mining task of finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups [5]. The motivation behind this research paper is to explore KMeans partitioning algorithm in the currently available parallel architecture using parallel

S. Mohanavalli; S. M. Jaisakthi; C. Aravindan

144

Dimensionality Reduction Particle Swarm Algorithm for High Dimensional Clustering  

SciTech Connect

The Particle Swarm Optimization (PSO) clustering algorithm can generate more compact clustering results than the traditional K-means clustering algorithm. However, when clustering high dimensional datasets, the PSO clustering algorithm is notoriously slow because its computation cost increases exponentially with the size of the dataset dimension. Dimensionality reduction techniques offer solutions that both significantly improve the computation time, and yield reasonably accurate clustering results in high dimensional data analysis. In this paper, we introduce research that combines different dimensionality reduction techniques with the PSO clustering algorithm in order to reduce the complexity of high dimensional datasets and speed up the PSO clustering process. We report significant improvements in total runtime. Moreover, the clustering accuracy of the dimensionality reduction PSO clustering algorithm is comparable to the one that uses full dimension space.

Cui, Xiaohui [ORNL; ST Charles, Jesse Lee [ORNL; Potok, Thomas E [ORNL; Beaver, Justin M [ORNL

2008-01-01

145

Efficient fuzzy C-means architecture for image segmentation.  

PubMed

This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process to accelerate computation. Experimental results show that the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate. PMID:22163980

Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen

2011-01-01

146

Efficient Fuzzy C-Means Architecture for Image Segmentation  

PubMed Central

This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process to accelerate computation. Experimental results show that the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate.

Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen

2011-01-01

147

Kernelized fuzzy c-means method in fast segmentation of demyelination plaques in multiple sclerosis.  

PubMed

The fuzzy c-means (FCM) method is a popular tool for fuzzy data processing. In the current study, an FCM-based method of fuzzy clustering in a kernel space has been implemented. First, a "kernel trick" is applied to the fuzzy c-means algorithm. Then, the new method is employed for fast automated segmentation of demyelination plaques in Multiple Sclerosis (MS). The clusters in a Gaussian kernel space are analysed in the histogram context and used during the initial classification of the brain tissue. The resulting classification masks are then used to detect the region of interest, eliminate false positives and label MS lesions. PMID:18003286

Kawa, Jacek; Pietka, Ewa

2007-01-01
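
As background to the "kernel trick" named in the abstract above, the identity below shows how a distance in a Gaussian kernel space can be evaluated without an explicit feature map. It is a general property of the Gaussian kernel, not a statement of the authors' exact formulation: the symbols $x_k$ (data point), $v_i$ (cluster prototype, assumed here to live in the input space), $\phi$ (implicit feature map) and $\sigma$ (kernel width) are notation introduced for this illustration.

```latex
\[
K(x, v) = \exp\!\left(-\frac{\lVert x - v\rVert^{2}}{\sigma^{2}}\right),
\qquad
\lVert \phi(x_k) - \phi(v_i) \rVert^{2}
  = K(x_k, x_k) - 2K(x_k, v_i) + K(v_i, v_i)
  = 2\bigl(1 - K(x_k, v_i)\bigr).
\]
```

The last step uses $K(x, x) = 1$ for the Gaussian kernel, so in this kind of variant the squared Euclidean distance in the FCM updates can be replaced by $2(1 - K(x_k, v_i))$.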

148

An Efficient Hybrid Algorithm for Data Clustering Using Improved Genetic Algorithm and Nelder Mead Simplex Search  

Microsoft Academic Search

Data clustering is a process of putting similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is larger than among groups. This paper presents data clustering using improved genetic algorithm (IGA) and the popular Nelder-Mead (NM) Simplex search. To improve the accuracy of data clustering, an improved GA

Suresh Chandra Satapathy; J. V. R. Murthy; P. V. G. D. Prasada Reddy

2007-01-01

149

Comparison of Clustering Algorithms in the Context of Software Evolution  

Microsoft Academic Search

To aid software analysis and maintenance tasks, a number of software clustering algorithms have been proposed to automatically partition a software system into meaningful subsystems or clusters. However, it is unknown whether these algorithms produce similar meaningful clusterings for similar versions of a real-life software system under continual change and growth. This paper describes a comparative study of six software clustering algorithms. We

Jingwei Wu; Ahmed E. Hassan; Richard C. Holt

2005-01-01

150

SEGMENTATION OF RETINAL BLOOD VESSELS USING A NOVEL CLUSTERING ALGORITHM  

Microsoft Academic Search

In this paper, segmentation of blood vessels from colour retinal images using a novel clustering algorithm and scale-space features is proposed. The proposed clustering algorithm, which we call Nearest Neighbour Clustering Algorithm (NNCA), uses the same concept as the K-nearest neighbour (KNN) classifier with the advantage that the algorithm needs no training set and it

Sameh A. Salem; Nancy M. Salem; Asoke K. Nandi

2006-01-01

151

ROCK: A Robust Clustering Algorithm for Categorical Attributes  

Microsoft Academic Search

Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, we study clustering algorithms for data with boolean and

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

2000-01-01

152

A Heuristically Weighting K-Means algorithm for subspace clustering  

Microsoft Academic Search

Soft subspace clustering algorithms have recently received wide interest because of their scalability and flexibility in handling high dimensional sparse data. A disadvantage of existing algorithms is that their clustering results are strongly affected by the quality of the initial centroids selected by random initialization. In this paper, we propose a heuristically weighting K-means algorithm and a corresponding initialization method for

Boyang Li; Qingshan Jiang; Lifei Chen

2008-01-01

153

A cluster refinement algorithm for motif discovery.  

PubMed

Finding Transcription Factor Binding Sites, i.e., motif discovery, is crucial for understanding the gene regulatory relationship. Motifs are weakly conserved and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model allowing a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find complete and good starting candidate motifs from the DNA sequences. CRMD then employs an effective greedy refinement to search for optimal motifs from the candidate motifs. The refinement is fast, and it changes the number of motif instances based on the adaptive thresholds. The performance of CRMD is further enhanced if the problem has one occurrence of motif instance per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real data sets. The experimental results verify that CRMD usually outperforms four other state-of-the-art algorithms in terms of the qualities of the solutions with competitive computing time. It finds a good balance between finding true motif instances and screening false motif instances, and is robust on problems of various levels of difficulty. PMID:21030733

Li, Gang; Chan, Tak-Ming; Leung, Kwong-Sak; Lee, Kin-Hong

2010-01-01

154

Comparing geometric and kinetic cluster algorithms for molecular simulation data.  

PubMed

The identification of metastable states of a molecule plays an important role in the interpretation of molecular simulation data because the free-energy surface, the relative populations in this landscape, and ultimately also the dynamics of the molecule under study can be described in terms of these states. We compare the results of three different geometric cluster algorithms (neighbor algorithm, K-medoids algorithm, and common-nearest-neighbor algorithm) among each other and to the results of a kinetic cluster algorithm. First, we demonstrate the characteristics of each of the geometric cluster algorithms using five two-dimensional data sets. Second, we analyze the molecular dynamics data of a beta-heptapeptide in methanol--a molecule that exhibits a distinct folded state, a structurally diverse unfolded state, and a fast folding/unfolding equilibrium--using both geometric and kinetic cluster algorithms. We find that geometric clustering strongly depends on the algorithm used and that the density based common-nearest-neighbor algorithm is the most robust of the three geometric cluster algorithms with respect to variations in the input parameters and the distance metric. When comparing the geometric cluster results to the metastable states of the beta-heptapeptide as identified by kinetic clustering, we find that in most cases the folded state is identified correctly but the overlap of geometric clusters with further metastable states is often at best approximate. PMID:20170218

Keller, Bettina; Daura, Xavier; van Gunsteren, Wilfred F

2010-02-21

155

Incremental clustering algorithm based on phrase-semantic similarity histogram  

Microsoft Academic Search

Incremental document clustering is an important key in organizing, searching, and browsing large datasets. Although many incremental document clustering methods have been proposed, they do not focus on linguistic and semantic properties of the text. Incremental clustering algorithms are preferred to traditional clustering techniques with the advent of online publishing in the World Wide Web. In this paper, an incremental

Walaa K. Gad; Mohamed S. Kamel

2010-01-01

156

Evaluation of Hierarchical Clustering Algorithms for Document Datasets.  

National Technical Information Service (NTIS)

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical cluste...

Y. Zhao; G. Karypis

2002-01-01

157

Comparison of Agglomerative and Partitional Document Clustering Algorithms.  

National Technical Information Service (NTIS)

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters, and in greatly improving the retri...

Y. Zhao; G. Karypis

2002-01-01

158

Effective fuzzy c-means based kernel function in segmenting medical images.  

PubMed

The objective of this paper is to develop an effective, robust fuzzy c-means method for segmenting breast and brain magnetic resonance images. The widely used conventional fuzzy c-means for medical image segmentation is limited by its squared-norm distance, which measures the similarity between centers and data objects of medical images that are corrupted by heavy noise, outliers, and other imaging artifacts. To overcome these limitations, this paper develops a novel objective function, based on the standard objective function of fuzzy c-means, that incorporates a robust kernel-induced distance for clustering the corrupted datasets of breast and brain medical images. By minimizing the novel objective function, this paper obtains an effective equation for the optimal cluster centers and an equation for the optimal membership grades for partitioning the given dataset. To address the dependence of clustering performance on the initial cluster centers, this paper introduces a specialized center initialization method for executing the proposed algorithm in segmenting medical images. Experiments are performed with synthetic images and with real breast and brain images to assess the performance of the proposed method. Further, the validity of the clustering results is assessed using the silhouette method, and the results are compared with those of other recently reported fuzzy c-means methods. The experimental results show the superiority of the proposed clustering results. PMID:20444444

Kannan, S R; Ramathilagam, S; Sathya, A; Pandiyarajan, R

2010-06-01
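
The abstract above contrasts the squared-norm distance of conventional fuzzy c-means with a kernel-induced distance. The sketch below illustrates the general idea only; it assumes a Gaussian kernel, which the abstract does not specify, and the function names and the `sigma` parameter are hypothetical. With a Gaussian kernel the squared feature-space distance reduces to 2(1 - K(x, v)), which is bounded, so a far outlier cannot dominate the objective the way it does under the squared Euclidean norm.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in kernel feature space: K(x,x) - 2K(x,v) + K(v,v).

    For the Gaussian kernel K(x,x) = K(v,v) = 1, so this is 2 * (1 - K(x,v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


center = (0.0, 0.0)
inlier = (1.0, 0.0)
outlier = (100.0, 0.0)

# The squared Euclidean distance of the outlier is 10000; the kernel-induced
# distance saturates at 2, so the outlier pulls on the center far less.
print(kernel_distance_sq(inlier, center))
print(kernel_distance_sq(outlier, center))
```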

159

Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models  

NASA Technical Reports Server (NTRS)

We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

Mjolsness, Eric; Castano, Rebecca; Gray, Alexander

1999-01-01

160

Color sorting algorithm based on K-means clustering algorithm  

NASA Astrophysics Data System (ADS)

In raisin production, a variety of color impurities must be removed effectively. A new, efficient raisin color-sorting algorithm is presented here. First, threshold-based image processing was applied for image pre-processing, and the gray-scale distribution characteristic of the raisin image was found. To obtain the chromatic aberration image and reduce disturbance, we performed frame image subtraction, subtracting the background image data from the target image data. Second, a Haar wavelet filter was used to obtain a smoothed image of the raisins. Based on the different colors, mildew, spots, and other external features, the characteristics of the images were computed so that they fully reflect the quality differences between raisins of different types. After this processing, the images were analyzed by K-means clustering, which achieves adaptive extraction of the statistical features; according to these features, the image data were divided into different categories, so that the categories of abnormal colors were distinct. Using this algorithm, raisins of abnormal colors and raisins with mottles were eliminated. The sorting rate was up to 98.6%, and the ratio of normal raisins to sorted grains was less than one eighth.

Zhang, Baofeng; Huang, Qian

2009-11-01

161

Robust Fuzzy C-Means and Bilateral Point Clouds Denoising  

Microsoft Academic Search

A point cloud denoising method is presented that combines fuzzy c-means clustering with a bilateral filtering approach. Surfaces are reconstructed from unorganized point sets with large-scale noise. First, we delete large-scale noise and partly smooth small-scale noise with an improved fuzzy c-means clustering method. The cluster centers are regarded as the new points. After acquiring new point sets being less noisy, we

Lihui Wang; Baozong Yuan; Jing Chen

2006-01-01

162

Document Clustering into an Unknown Number of Clusters Using a Genetic Algorithm  

Microsoft Academic Search

We present a genetic algorithm that deals with document clustering. This algorithm calculates an approximation of the optimum k value, and solves the best grouping of the documents into these k clusters. We have evaluated this algorithm with sets of documents that are the output of a query in a search engine. The experiments show that, most of the times,

Arantza Casillas; Mayte Teresa González De Lena; Raquel Martínez

2003-01-01

163

A Flocking Based algorithm for Document Clustering Analysis  

SciTech Connect

Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithm for real document clustering.

Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

2006-01-01

164

CSIM: a document clustering algorithm based on swarm intelligence  

Microsoft Academic Search

This paper presents a document clustering algorithm based on swarm intelligence and k-means: CSIM. First, a document clustering algorithm based on swarm intelligence is employed. It is derived from a basic model interpreting ant colony organization of cemeteries. Swarm intelligence for flexibility, self-organization and robustness has been applied in a variety of areas. Taking advantage of these traits, good initial

Wu Bin; Zheng Yi; Liu Shaohui; Shi Zhongzhi

2002-01-01

165

Commonality analysis: A linear cell clustering algorithm for group technology  

Microsoft Academic Search

Numerous researchers have suggested methods for clustering machines into manufacturing cells in a Group Technology environment. Many of these methods are numerically complex. This paper presents a new linear clustering algorithm that is fast, simple and quite flexible. The algorithm is based on the calculation of a commonality score which indicates the similarity in the way two machines are used

JERRY C. WEI; GARY M. KERN

1989-01-01

166

Scaling Clustering Algorithms for Massive Data Sets using Data Streams  

Microsoft Academic Search

Computing data mining algorithms such as clustering techniques on massive geospatial data sets is still not feasible nor efficient today. Massive data sets are continuously produced with a data rate of over several TB/day. In the case of compressing such data sets, we demonstrate the necessity for clustering algorithms that are highly scalable with regard to data size

Silvia Nittel; Kelvin T. Leung; Amy Braverman

2004-01-01

167

Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters  

Microsoft Academic Search

Summary form only given. We propose a new distribution scheme for a parallel Strassen's matrix multiplication algorithm on heterogeneous clusters. In the heterogeneous clustering environment, appropriate data distribution is the most important factor for achieving maximum overall performance. However, Strassen's algorithm reduces the total operation count to about 7/8 times per one recursion and, hence, the recursion level has an

Yuhsuke Ohtaki; Daisuke Takahashi; Taisuke Boku; Mitsuhisa Sato

2004-01-01
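
The 7/8 factor quoted in the abstract above comes from Strassen's scheme replacing the 8 block products of a 2x2 block multiplication with 7. The sketch below is a rough illustration that counts scalar multiplications only; it ignores the extra block additions that Strassen's method introduces, and the function names are hypothetical. It assumes the naive cubic method is used once recursion stops.

```python
def naive_multiplications(n):
    """Scalar multiplications for the textbook n x n matrix product."""
    return n ** 3


def strassen_multiplications(n, levels):
    """Scalar multiplications when Strassen's scheme is applied for `levels`
    recursion levels and the naive method handles the remaining blocks.

    Each level turns one product into 7 products of half-size blocks.
    """
    if n % (2 ** levels) != 0:
        raise ValueError("n must be divisible by 2**levels")
    block = n // (2 ** levels)
    return (7 ** levels) * naive_multiplications(block)


n = 1024
for levels in range(4):
    ratio = strassen_multiplications(n, levels) / naive_multiplications(n)
    # ratio equals (7/8) ** levels
    print(levels, ratio)
```

Because each extra level multiplies the count by the same 7/8 factor, the choice of recursion depth directly changes how much work there is to distribute.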

168

A Local Density Based Spatial Clustering Algorithm with Noise  

Microsoft Academic Search

Density-based clustering algorithms are attractive for the task of class identification in spatial database. However, in many cases, very different local-density clusters exist in different regions of data space, therefore, DBSCAN [Ester, M. et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proc. 2nd Int.

Lian Duan; Deyi Xiong; Jun Lee; Feng Guo

2006-01-01

169

Grid Clustering Algorithm with Simple Leaping Search Technique  

Microsoft Academic Search

Data mining is a critical data analysis technique for extracting hidden information from large databases for business or industrial applications. As the size of organizational databases increases, finding information and knowledge efficiently is essential. In the past, numerous clustering algorithms based on grid-clustering schemes have been proposed. This study proposes simple-leaping search (SLS), a new grid-based clustering algorithm that partitions

Cheng-Fa Tsai; Jun-Hao Zhang

2012-01-01

170

Incremental Clustering Algorithm For Earth Science Data Mining  

SciTech Connect

Remote sensing data plays a key role in understanding complex geographic phenomena. Clustering is a useful tool for discovering interesting patterns and structures within multivariate geospatial data. One of the key issues in clustering is the specification of the appropriate number of clusters, which is not obvious in many practical situations. In this paper we provide an extension of the G-means algorithm which automatically learns the number of clusters present in the data and avoids overestimation of the number of clusters. Experimental evaluation on simulated and remotely sensed image data shows the effectiveness of our algorithm.

Vatsavai, Raju [ORNL

2009-01-01

171

Evaluation of hierarchical clustering algorithms for document datasets  

Microsoft Academic Search

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering solutions provide a view of the data at different levels of granularity, making them ideal for people to visualize and interactively explore large document collections. In this

Ying Zhao; George Karypis

2002-01-01

172

A systematic comparison of genome-scale clustering algorithms  

Microsoft Academic Search

BACKGROUND: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and

Jeremy J Jay; John D Eblen; Yun Zhang; Mikael Benson; Andy D Perkins; Arnold M Saxton; Brynn H Voy; Elissa J Chesler; Michael A Langston

2012-01-01

173

A Systematic Comparison of Genome Scale Clustering Algorithms - (Extended Abstract)  

Microsoft Academic Search

A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad array of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation.

Jeremy J. Jay; John D. Eblen; Yun Zhang; Mikael Benson; Andy D. Perkins; Arnold M. Saxton; Brynn H. Voy; Elissa J. Chesler; Michael A. Langston

2011-01-01

174

A Fuzzy Clustering and Fuzzy Merging Algorithm  

Microsoft Academic Search

Some major problems in clustering are: i) find the optimal number K of clusters; ii) assess the validity of a given clustering; iii) permit the classes to form natural shapes rather than forcing them into normed balls of the distance function; iv) prevent the order in which the feature vectors are read in from affecting the clustering; and v) prevent

Carl G. Looney

1999-01-01

175

Cuckoo Search Clustering Algorithm: A novel strategy of biomimicry  

Microsoft Academic Search

A novel, nature inspired, unsupervised classification method, based on the most recent metaheuristic algorithm, stirred by the breeding strategy of the parasitic bird, the cuckoo, is introduced in this paper. The proposed Cuckoo Search Clustering Algorithm (CSCA) yields good results on benchmark dataset. Inspired by the results, the proposed algorithm is validated on two real time remote sensing satellite-image

Samiksha Goel; Arpita Sharma; Punam Bedi

2011-01-01

176

Comparison of Agglomerative and partitional document clustering algorithms  

Microsoft Academic Search

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters, and in greatly improving the retrieval performance either via cluster-driven dimensionality reduction, term-weighting, or query expansion. This ever-increasing importance of document clustering and the expanded range of its applications led

Ying Zhao; George Karypis

2002-01-01

177

Combining multiple clusterings of chemical structures using cluster-based similarity partitioning algorithm.  

PubMed

Many types of clustering techniques for chemical structures have been used in the literature, but it is known that no single method will always give the best results for all types of applications. Recent work on consensus clustering methods is motivated by the successes of combining multiple classifiers in many areas and by the ability of consensus clustering to improve the robustness, novelty, consistency and stability of individual clusterings. In this paper, the Cluster-based Similarity Partitioning Algorithm (CSPA) was examined for improving the quality of chemical structures clustering. The effectiveness of clustering was evaluated based on the ability to separate active from inactive molecules in each cluster, and the results were compared with Ward's clustering method. The MDL Drug Data Report (MDDR) chemical database was used for experiments. The results, obtained by combining multiple clusterings, showed that the consensus clustering method can improve the robustness, novelty and stability of chemical structures clustering. PMID:24429501

Saeed, Faisal; Salim, Naomie; Abdo, Ammar

2014-01-01

178

A fuzzy clustering algorithm to detect planar and quadric shapes  

NASA Technical Reports Server (NTRS)

In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.

Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa

1992-01-01

179

Empirical Comparison of Fast Clustering Algorithms for Large Data Sets  

Microsoft Academic Search

Several fast algorithms for clustering very large data sets have been proposed in the literature. CLARA is a combination of a sampling procedure and the classical PAM algorithm, while CLARANS adopts a serial randomized search strategy to find the optimal set of medoids. GAC-R3 and GAC-RARw exploit genetic search heuristics for solving clustering problems. In this research, we conducted an

Chih-ping Wei; Yen-hsien Lee; Che-ming Hsu

2000-01-01

180

A Fast Implementation of the ISODATA Clustering Algorithm  

NASA Technical Reports Server (NTRS)

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2005-01-01

181

Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval.  

ERIC Educational Resources Information Center

Describes a computerized information retrieval system that uses three agglomerative hierarchic clustering algorithms--single link, complete link, and group average link--and explains their implementations. It is noted that these implementations have been used to cluster a collection of 12,000 documents. (LRW)

Voorhees, Ellen M.

1986-01-01
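
The three linkage criteria named in the abstract above differ only in how the distance between two clusters is derived from pairwise point distances. The sketch below is a naive illustration, not the implementations described in the record: it recomputes every cluster-to-cluster distance at each merge, assumes Euclidean distance, and all function names are hypothetical.

```python
import math
from itertools import product


def single_link(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(a, b))


def group_average_link(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def agglomerate(points, linkage, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(agglomerate(points, single_link, 2))
```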

182

Web user clustering analysis based on KMeans algorithm  

Microsoft Academic Search

As one of the most important tasks of Web Usage Mining (WUM), web user clustering, which establishes groups of users exhibiting similar browsing patterns, provides useful knowledge to personalized web services. In this paper, we cluster web users with KMeans algorithm based on web user log data. Given a set of web users and their associated historical web usage data,

JinHua Xu; Hong Liu

2010-01-01

183

Automatic Clustering Using an Improved Differential Evolution Algorithm  

Microsoft Academic Search

Differential evolution (DE) has emerged as one of the fast, robust, and efficient global search heuristics of current interest. This paper describes an application of DE to the automatic clustering of large unlabeled data sets. In contrast to most of the existing clustering techniques, the proposed algorithm requires no prior knowledge of the data to be classified. Rather, it

Swagatam Das; Ajith Abraham; Amit Konar

2008-01-01

184

An Efficient Hybrid Evolutionary Algorithm for Cluster Analysis  

Microsoft Academic Search

Clustering problems appear in a wide range of unsupervised classification applications such as pattern recognition, vector quantization, data mining and knowledge discovery. The k-means algorithm is one of the most widely used clustering techniques. Unfortunately, k-means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite

Taher Niknam; Bahman Bahmani Firouzi; Majid Nayeripour

2008-01-01
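
The sensitivity to initial centers mentioned in the abstract above can be reproduced on four points. The sketch below uses plain Lloyd iterations (standard k-means, not the hybrid evolutionary algorithm of the record); the data and function names are made up for illustration. Starting from centers placed between the two natural groups, the iteration is already at a fixed point with a much larger sum of squared errors than the solution reached from a better start.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, centers, max_iter=100):
    """Standard k-means iterations from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Two tight vertical pairs, far apart horizontally.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

poor_centers, poor_sse = lloyd_kmeans(points, [(2, 0), (2, 1)])
good_centers, good_sse = lloyd_kmeans(points, [(0, 0), (4, 1)])
print(poor_sse, good_sse)  # 16.0 1.0
```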

185

A study of clustering applied to multiple target tracking algorithm  

Microsoft Academic Search

In this paper the effectiveness of two Data Association algorithms for Multiple Target Tracking (MTT) based on the Global Nearest Neighbor approach is compared. As the time for solving the assignment problem increases nonlinearly with the problem size, it is useful to divide the whole scenario into small groups of targets called clusters. For each cluster the assignment problem is solved

Pavlina Konstantinova; Milen Nikolov; Tzvetan Semerdjiev

2004-01-01

186

Fusion and clustering algorithms for spatial data  

Microsoft Academic Search

Spatial clustering is an approach for discovering groups of related data points in spatial data. Spatial clustering has attracted a lot of research attention due to various applications where it is needed. It holds practical importance in application domains such as geographic knowledge discovery, sensors, rare disease discovery, astronomy, remote sensing, and so on. The motivation for this work stems

Pavani Kuntala

2006-01-01

187

A novel clustering approach: Artificial Bee Colony (ABC) algorithm  

Microsoft Academic Search

The Artificial Bee Colony (ABC) algorithm, which is one of the most recently introduced optimization algorithms, simulates the intelligent foraging behavior of a honey bee swarm. Clustering analysis, used in many disciplines and applications, is an important tool and a descriptive task seeking to identify homogeneous groups of objects based on the values of their attributes. In this work, ABC is

Dervis Karaboga; Celal Ozturk

2011-01-01

188

Clustering of Hadronic Showers with a Structural Algorithm  

SciTech Connect

The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.

Charles, M.J.; /SLAC

2005-12-13

189

Measuring Constraint-Set Utility for Partitional Clustering Algorithms  

NASA Technical Reports Server (NTRS)

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.

Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

2006-01-01

190

[Multispectral image compression algorithm based on clustering and wavelet transform].  

PubMed

To address the high time-space complexity and inadequate use of spectral characteristics in existing multispectral image compression algorithms, an inter-spectrum sparse equivalent representation of multispectral images and ways of realizing it by clustering were studied. A new multispectral image compression algorithm based on spectral adaptive clustering and wavelet transform was also designed. Affinity propagation clustering was used to generate the inter-spectrum sparse equivalent representation, which removes inter-spectrum redundancy at low complexity; a two-dimensional wavelet transform was used to remove spatial redundancy; and set partitioning in hierarchical trees (SPIHT) was used for encoding. The quality of the reconstructed images was improved by an error compensation mechanism. Experimental results show that the proposed approach achieves good performance in time-space complexity, that its peak signal-to-noise ratio (PSNR) is significantly higher than that of similar compression algorithms at the same compression ratio, and that it is a generic and effective algorithm. PMID:24409728

Liang, Wei; Zeng, Ping; Zhang, Hua; Luo, Xue-Mei

2013-10-01

191

K-Distributions: A New Algorithm for Clustering Categorical Data  

NASA Astrophysics Data System (ADS)

Clustering is one of the most important tasks in data mining. The K-means algorithm is the most popular one for achieving this task because of its efficiency. However, it works only on numeric values although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm is presented to extend the K-means algorithm to categorical domains. Unfortunately, it suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the well known 36 UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.

Cai, Zhihua; Wang, Dianhong; Jiang, Liangxiao

192

Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.  

PubMed

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression. PMID:22042156

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

193

Cascade of clusters - from metaphor to algorithm?  

NASA Astrophysics Data System (ADS)

Models and observations provide growing evidence that an earthquake of magnitude M and source dimension L(M) culminates a cascade of isolated clusters in a lower magnitude range and in an area proportional to L(M). These clusters, usually hidden in the background seismicity, form most, if not all, premonitory seismicity patterns. A similar phenomenon is observed, less clearly, in non-linear systems of different origin.

Keilis-Borok, V.; Gabrielov, A.; Turcotte, D.; Zaliapin, I.

2002-12-01

194

Sampling Within k-Means Algorithm to Cluster Large Datasets  

SciTech Connect

Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these datasets should be used to analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.

Bejarano, Jeremy [Brigham Young University]; Bose, Koushiki [Brown University]; Brannan, Tyler [North Carolina State University]; Thomas, Anita [Illinois Institute of Technology]; Adragni, Kofi [University of Maryland]; Neerchal, Nagaraj [University of Maryland]; Ostrouchov, George [ORNL]

2011-08-01
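
Illustrative sketch (not taken from the cited report): the sampling idea in the record above can be pictured as running ordinary k-means on a random sample and then assigning every point to the nearest resulting center. The function names `kmeans` and `sampled_kmeans` and the `sample_size` parameter are hypothetical; the report's own rule for choosing the sample size from a width and confidence level is not reproduced here.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on a list of tuples; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centers on a random sample only, then label the full dataset."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans(sample, k, seed=seed)
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

The iterative part touches only `sample_size` points per pass; the full dataset is scanned once, for the final labeling.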

195

Application of K- and fuzzy c-means for color segmentation of thermal infrared breast images.  

PubMed

Color segmentation of infrared thermal images is an important factor in detecting the tumor region. The cancerous tissue with angiogenesis and inflammation emits temperature pattern different from the healthy one. In this paper, two color segmentation techniques, K-means and fuzzy c-means for color segmentation of infrared (IR) breast images are modeled and compared. Using the K-means algorithm in Matlab, some empty clusters may appear in the results. Fuzzy c-means is preferred because the fuzzy nature of IR breast images helps the fuzzy c-means segmentation to provide more accurate results with no empty cluster. Since breasts with malignant tumors have higher temperature than healthy breasts and even breasts with benign tumors, in this study, we look for detecting the hottest regions of abnormal breasts which are the suspected regions. The effect of IR camera sensitivity on the number of clusters in segmentation is also investigated. When the camera is ultra sensitive the number of clusters being considered may be increased. PMID:20192053

EtehadTavakol, M; Sadri, S; Ng, E Y K

2010-02-01

196

Cluster algorithms with emphasis on quantum spin systems  

SciTech Connect

The purpose of this lecture is to discuss in detail the generalized approach of Kawashima and Gubernatis for the construction of cluster algorithms. We first present a brief refresher on the Monte Carlo method, describe the Swendsen-Wang algorithm, show how this algorithm follows from the Fortuin-Kasteleyn transformation, and reinterpret this transformation in a form which is the basis of the generalized approach. We then derive the essential equations of the generalized approach. This derivation is remarkably simple if done from the viewpoint of probability theory, and the essential assumptions will be clearly stated. These assumptions are implicit in all useful cluster algorithms of which we are aware. They lead to a quite different perspective on cluster algorithms than found in the seminal works and in Ising model applications. Next, we illustrate how the generalized approach leads to a cluster algorithm for world-line quantum Monte Carlo simulations of Heisenberg models with S = 1/2. More succinctly, we also discuss the generalization of the Fortuin-Kasteleyn transformation to higher spin models and illustrate the essential steps for an S = 1 Heisenberg model. Finally, we summarize how to go beyond S = 1 to a general spin, XYZ model.

Gubernatis, J.E. [Los Alamos National Lab., NM (United States)]; Kawashima, Naoki [Tokyo Univ. (Japan). Dept. of Physics]

1995-10-06

197

Genetic algorithms for determining the topological structure of metallic clusters  

Microsoft Academic Search

Genetic algorithms (GA) are applied for the optimization of the structure of metallic clusters by the calculation of the ground-state energies from a tight-binding (Hückel) Hamiltonian. The optimum topology or graph is searched by the use of the adjacency matrix Aij as a natural coding. The initial populations for N-atom clusters are generated from a representative group of fit

R. Poteau; G. M. Pastor

1999-01-01

198

NCUBE - A clustering algorithm based on a discretized data space  

NASA Technical Reports Server (NTRS)

Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.

Eigen, D. J.; Northouse, R. A.

1974-01-01
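
Illustrative sketch (an assumption, not the NCUBE implementation): the abstract above says only that a gridwork is imposed on the data space. One generic way to cluster on such a grid is to bin each point into a cell and merge occupied cells that touch. The function name `grid_cluster`, the `cell_size` parameter, and the adjacency rule (cells sharing a face or corner) are all hypothetical choices made for this example.

```python
from collections import deque
from itertools import product


def grid_cluster(points, cell_size):
    """Label points by connected groups of occupied grid cells."""
    # Map each occupied cell (a tuple of integer indices) to its point indices.
    cells = {}
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells.setdefault(key, []).append(idx)

    labels = [-1] * len(points)
    dim = len(points[0]) if points else 0
    # All neighbor offsets in {-1, 0, 1}^dim except the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    seen = set()
    cluster_id = 0
    for start in cells:
        if start in seen:
            continue
        # Breadth-first flood fill over adjacent occupied cells.
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels
```

Because cluster boundaries follow cell edges, the resulting decision regions are piecewise linear, which is in the spirit of the piecewise linear discrimination the abstract mentions.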

199

Flow based clustering algorithm for tourism search engine  

Microsoft Academic Search

This paper introduces a flow-based clustering algorithm for tourism search engines. Unlike general tourism search engines such as www.tripadvisor.com, www.qunar.com and www.kayak.com, which return a huge number of web page links in response to users' queries, this algorithm helps the tourism search engine create a list of words which serve as suggestions to expand and update the users' queries. It

Liu Jie; Du Junping; Sun Zengqi; Jia Yingming

2010-01-01

200

A survey: hybrid evolutionary algorithms for cluster analysis  

Microsoft Academic Search

Clustering is a popular data analysis and data mining technique. It is the unsupervised classification of patterns into groups. Many algorithms for large data sets have been proposed in the literature using different techniques. However, conventional algorithms have some shortcomings such as slow convergence, sensitivity to initial values, and preset classes in large-scale data sets, etc., and

Mohamed Jafar Abul Hasan; Sivakumar Ramakrishnan

201

A new hybrid imperialist competitive algorithm on data clustering  

Microsoft Academic Search

Clustering is a process for partitioning datasets. This technique is very useful for finding optimum solutions. k-means is one of the simplest and most famous methods and is based on the square error criterion. This algorithm depends on initial states and converges to local optima. Some recent research shows that the k-means algorithm has been successfully applied to combinatorial optimization problems for

TAHER NIKNAM; ELAHE TAHERIAN FARD; SHERVIN EHRAMPOOSH; ALIREZA ROUSTA

2011-01-01

202

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise  

Microsoft Academic Search

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to

Martin Ester; Hans-peter Kriegel; Jörg Sander; Xiaowei Xu

1996-01-01

203

ORCA: The Overdense Red-sequence Cluster Algorithm  

NASA Astrophysics Data System (ADS)

We present a new cluster-detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (i) the similarity in colour of cluster galaxies (the 'red sequence'); and (ii) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space; (ii) a Voronoi diagram identifies regions of high surface density; and (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper, we present the algorithm and demonstrate it on two data sets. The first is a 7-deg^2 sample of the deep Sloan Digital Sky Survey (SDSS) equatorial stripe (Stripe 82), from which we detect 97 clusters with z ≤ 0.6. Benefitting from deeper data, we are 100 per cent complete in the maxBCG optically selected cluster catalogue (based on shallower single-epoch SDSS data) and find an additional 78 previously unidentified clusters. The second data set is a mock Medium Deep Survey Pan-STARRS catalogue, based on the Λ cold dark matter (ΛCDM) model and a semi-analytic galaxy formation recipe. Knowledge of galaxy-halo memberships in the mock catalogue allows for the quantification of algorithm performance. We detect 305 mock clusters in haloes with mass >10^13 h^-1 M⊙ at z ≤ 0.6 and determine a spurious detection rate of <1 per cent, consistent with tests on the Stripe 82 catalogue. The detector performs well in the recovery of model ΛCDM clusters. At the median redshift of the catalogue, the algorithm achieves >75 per cent completeness down to halo masses of 10^13.4 h^-1 M⊙ and recovers >75 per cent of the total stellar mass of clusters in haloes down to 10^13.8 h^-1 M⊙.
A companion paper presents the complete cluster catalogue over the full 270-deg^2 Stripe 82 catalogue.

Murphy, D. N. A.; Geach, J. E.; Bower, R. G.

2012-03-01
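
Illustrative sketch (not the ORCA code): step (iii) of the record above groups galaxies with a Friends-of-Friends technique. In its generic form, two points are "friends" when they lie within a linking length of each other, and a cluster is any chain of friends. The function name `friends_of_friends`, the `linking_length` parameter, and the use of plain Euclidean distance are assumptions; the abstract does not state ORCA's actual linking criterion.

```python
import math


def friends_of_friends(points, linking_length):
    """Return one cluster label per point using a union-find structure."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair closer than the linking length (O(n^2) pair scan).
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = [a - b for a, b in zip(points[i], points[j])]
            if math.sqrt(sum(d * d for d in dx)) <= linking_length:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

The pair scan is quadratic in the number of points; a spatial index would be the usual replacement for large catalogues, but is left out to keep the example short.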

204

Clustering algorithm evaluation and the development of a replacement for procedure 1. [for crop inventories]  

NASA Technical Reports Server (NTRS)

An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.

Lennington, R. K.; Johnson, J. K.

1979-01-01

205

Parallelizing the buckshot algorithm for efficient document clustering  

Microsoft Academic Search

We present a parallel implementation of the Buckshot document clustering algorithm. We demonstrate that this parallel approach is highly efficient both in terms of load balancing and minimization of communication. In a series of experiments using the 2GB of SGML data from TREC disks 4 and 5, our parallel approach was shown to be scalable in terms of processors efficiently

Eric C. Jensen; Steven M. Beitzel; Angelo J. Pilotto; Nazli Goharian; Ophir Frieder

2002-01-01

206

The Effect of Document Ordering in Rocchio's Clustering Algorithm  

ERIC Educational Resources Information Center

Presented is an empirical confirmation of clustering behavior that has been proposed as a conjecture in the literature; namely, that document ordering is more significant in an algorithm using a full document space search, rather than a search of just unclustered documents, each time a density test is performed. (4 references) (Author)

Cody, Roger

1973-01-01

207

Clustered Self Organising Migrating Algorithm for the Quadratic Assignment Problem  

NASA Astrophysics Data System (ADS)

An approach of population dynamics and clustering for permutative problems is presented in this paper. Diversity indicators are created from solution ordering and its mapping is shown as an advantage for population control in metaheuristics. Self Organising Migrating Algorithm (SOMA) is modified using this approach and vetted with the Quadratic Assignment Problem (QAP). Extensive experimentation is conducted on benchmark problems in this area.

Davendra, Donald; Zelinka, Ivan; Senkerik, Roman

2009-08-01

208

Neuromorphic algorithms on clusters of PlayStation 3s  

Microsoft Academic Search

There is a significant interest in the research community to develop large scale, high performance implementations of neuromorphic models. These have the potential to provide significantly stronger information processing capabilities than current computing algorithms. In this paper we present the implementation of five neuromorphic models on a 50 TeraFLOPS 336 node Playstation 3 cluster at the Air Force Research Laboratory.

Tarek M. Taha; Pavan Yalamanchili; Mohammad Bhuiyan; Rommel Jalasutram; Chong Chen; Richard Linderman

2010-01-01

209

TBC: a clustering algorithm based on prokaryotic taxonomy.  

PubMed

High-throughput DNA sequencing technologies have revolutionized the study of microbial ecology. Massive sequencing of PCR amplicons of the 16S rRNA gene has been widely used to understand the microbial community structure of a variety of environmental samples. The resulting sequencing reads are clustered into operational taxonomic units that are then used to calculate various statistical indices that represent the degree of species diversity in a given sample. Several algorithms have been developed to perform this task, but they tend to produce different outcomes. Herein, we propose a novel sequence clustering algorithm, namely Taxonomy-Based Clustering (TBC). This algorithm incorporates the basic concept of prokaryotic taxonomy in which only comparisons to the type strain are made and used to form species while omitting full-scale multiple sequence alignment. The clustering quality of the proposed method was compared with those of MOTHUR, BLASTClust, ESPRIT-Tree, CD-HIT, and UCLUST. A comprehensive comparison using three different experimental datasets produced by pyrosequencing demonstrated that the clustering obtained using TBC is comparable to those obtained using MOTHUR and ESPRIT-Tree and is computationally efficient. The program was written in JAVA and is available from http://sw.ezbiocloud.net/tbc. PMID:22538644

Lee, Jae-Hak; Yi, Hana; Jeon, Yoon-Seong; Won, Sungho; Chun, Jongsik

2012-04-01

210

Partitioning clustering algorithms for protein sequence data sets  

PubMed Central

Background Genome-sequencing projects are currently producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs to automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences and most of them can be categorized in three major groups: hierarchical, graph-based and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether these methods can be efficient compared to the published clustering methods. Methods We developed four partitioning clustering approaches using the Smith-Waterman local-alignment algorithm to determine pair-wise similarities of sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods. Results We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and especially in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.

Fayech, Sondes; Essoussi, Nadia; Limam, Mohamed

2009-01-01

211

Which clustering algorithm is better for predicting protein complexes?  

PubMed Central

Background Protein-protein interactions (PPI) play a key role in determining the outcome of most cellular processes. The correct identification and characterization of protein interactions and the networks they comprise is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull-down assays and tandem affinity purification are used in order to detect protein interactions in an organism. Today, relatively new high-throughput methods like yeast two hybrid, mass spectrometry, microarrays, and phage display are also used to reveal protein interaction networks. Results In this paper we evaluated four different clustering algorithms using six different interaction datasets. We parameterized the MCL, Spectral, RNSC and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by Yeast 2 Hybrid (Y2H) and Tandem Affinity Purification (TAP) methods. The predicted clusters, so-called protein complexes, were then compared and benchmarked with already known complexes stored in published databases. Conclusions While results may differ upon parameterization, the MCL and RNSC algorithms seem to be more promising and more accurate at predicting PPI complexes. Moreover, they predict more complexes than other reviewed algorithms in absolute numbers. On the other hand, the spectral clustering algorithm achieves the highest valid prediction rate in our experiments. However, it is nearly always outperformed by both RNSC and MCL in terms of the geometrical accuracy, while it generates fewer valid clusters than any other reviewed algorithm. This article demonstrates various metrics to evaluate the accuracy of such predictions as they are presented in the text below. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm

2011-01-01

212

Adaptable fuzzy C-Means for improved classification as a preprocessing procedure of brain parcellation.  

PubMed

Parcellation, one of several brain analysis methods, is a procedure popular for subdividing the regions identified by segmentation into smaller topographically defined units. The fuzzy clustering algorithm is mainly used to preprocess parcellation into several segmentation methods, because it is very appropriate for the characteristics of magnetic resonance imaging (MRI), such as partial volume effect and intensity inhomogeneity. However, some gray matter, such as basal ganglia and thalamus, may be misclassified into the white matter class using the conventional fuzzy C-Means (FCM) algorithm. Parcellation has been nearly achieved through manual drawing, but it is a tedious and time-consuming process. We propose improved classification using successive fuzzy clustering and implementing the parcellation module with the modified graphic user interface (GUI) for the convenience of users. PMID:11442112

Yoon, U C; Kim, J S; Kim, J S; Kim, I Y; Kim, S I

2001-06-01

213

Automated detection of the left ventricular region in magnetic resonance images by Fuzzy c-Means model.  

PubMed

A new method for automated detection of the Left Ventricular (LV) region in Magnetic Resonance Imaging is presented. This method is based on the Fuzzy c-Means (FCM) clustering algorithm. The FCM is applied to each static frame of the cardiac cycle to detect the LV region. Delineation of this region is essential in the quantitative analysis of the cardiac function. The effectiveness of the method is demonstrated by application to sequences of cardiac images. PMID:9306149

Boudraa A el-O

1997-08-01
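
Illustrative sketch (not the cited author's implementation): the fuzzy c-means (FCM) step that the record above applies to each frame can be written with the standard alternating updates, where memberships follow u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) and each center is the u^m-weighted mean of the data. The function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the random initialization, and the stopping tolerance are assumptions made for this example.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of feature tuples."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, c)]
    power = 2.0 / (m - 1.0)
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update; distances are clamped to avoid division by zero.
        u = []
        for p in points:
            d = [max(dist(p, v), 1e-12) for v in centers]
            u.append([
                1.0 / sum((d[i] / d[j]) ** power for j in range(c))
                for i in range(c)
            ])
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * p[a] for wk, p in zip(w, points)) / total
                for a in range(dim)
            ))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # The memberships were computed just before the final center update.
    return centers, u
```

For image data, each pixel intensity would be passed as a one-element tuple, and a hard segmentation can be read off by taking the cluster with the largest membership per pixel.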

214

A comparison of clustering algorithms in article recommendation system  

NASA Astrophysics Data System (ADS)

A recommendation system is a tool that can recommend resources suited to researchers' interests by using content-based filtering. In this paper, clustering, as an unsupervised learning technique, is introduced for grouping objects based on their selected features and similarities. Publication information from the Science Cited Index is used as the dataset for clustering, with feature extraction serving as dimensionality reduction of these articles, and Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-means are compared to determine the best algorithm. In my experiment, the selected database consists of 2625 documents extracted from the SCI corpus from 2001 to 2009. Clustering into ranks of 50, 100, 200, and 250 is considered, and the F-measure is used to evaluate the three algorithms at each rank. The results show that the LDA technique gives an accuracy of up to 95.5%, the highest of the clustering techniques compared.

Tantanasiriwong, Supaporn

2011-12-01

215

clusterMaker: a multi-algorithm clustering plugin for Cytoscape  

PubMed Central

Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. 
Conclusions The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.

2011-01-01

216

A new variable-length genome genetic algorithm for data clustering in semeiotics  

Microsoft Academic Search

This paper focuses on the introduction of a new evolutionary algorithm for data clustering, the Self-sizing Genome Genetic Algorithm. It is akin to a messy Genetic Algorithm and does not use a priori information about the number of clusters. A new recombination operator, gene-pooling, is introduced, while fitness is based on simultaneously maximizing intra-cluster homogeneity and inter-cluster separability. This algorithm

Ivan De Falco; Ernesto Tarantino; Antonio Della Cioppa; F. Gagliardi

2005-01-01

217

Comparison of Cluster Algorithms for the Analysis of Text Data Using Kolmogorov Complexity  

Microsoft Academic Search

In this paper we present a comparison of multiple cluster algorithms and their suitability for clustering text data. The clustering is based on similarities only, employing the Kolmogorov complexity as a similarity measure. This motivates the set of considered clustering algorithms which take into account the similarity between objects exclusively. Compared cluster algorithms are Median kMeans, Median Neural Gas, Relational

Tina Geweniger; Frank-michael Schleif; Alexander Hasenfuss; Barbara Hammer; Thomas Villmann

2008-01-01

218

LUCA: An Energy-efficient Unequal Clustering Algorithm Using Location Information for Wireless Sensor Networks  

Microsoft Academic Search

Over the last several years, various clustering algorithms for wireless sensor networks have been proposed to prolong network lifetime. Most clustering algorithms provide an equal cluster size using the node’s ID, degree, etc. However, many of these algorithms heuristically determine the cluster size, even though the cluster size significantly affects the energy consumption of the entire network. In this paper,

Sungryoul Lee; Han Choe; Yukyoung Song; Chong-kwon Kim

2011-01-01

219

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm  

Microsoft Academic Search

There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms are more suitable for clustering large datasets. The K-means algorithm

Xiaohui Cui; Thomas E. Potok

220

An Efficient Ant Algorithm for Swarm-Based Image Clustering  

Microsoft Academic Search

A collective approach to resolve the segmentation problem was proposed. AntClust is a new ant-based algorithm that uses the self-organizing and autonomous brood sorting behavior observed in real ants. Ants and pixels are scattered on a discrete array of cells representing the ants' environment. Using simple local rules and without any central control, ants form homogeneous clusters by moving pixels

Salima Ouadfel; Mohamed Batouche

2007-01-01

221

On an ensemble algorithm for clustering cancer patient data  

PubMed Central

Background The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Though results on using EACCD have been reported, there has been no study on the analysis of the algorithm. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches. Results Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. The statistical tests, however, employed in the learning step have minimal effect on the dendrogram for large m. In addition, if omitting the step for learning dissimilarity in EACCD, the resulting approaches can have a degraded performance. Furthermore, clustering only based on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters. Conclusions When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.

2013-01-01

222

An improved distance matrix computation algorithm for multicore clusters.  

PubMed

Distance matrix has diverse usage in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) to present a recent approach for vectorizing distance matrix computing. It showed an efficient performance in both sequential and parallel computing. However, the multicore cluster systems, which are available now, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as highly efficient parallel vectorized algorithm with high performance for computing distance matrix, addressed to multicore clusters. It reformulates DistVect1 vectorized algorithm in terms of clusters primitives. It deduces an efficient approach of partitioning and scheduling computations, convenient to this type of architecture. Implementations employ potential of both MPI and OpenMP libraries. Experimental results show that the proposed method performs improvement of around 3-fold speedup upon SSE2. Further it also achieves speedups more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779

Al-Neama, Mohammed W; Reda, Naglaa M; Ghaleb, Fayed F M

2014-01-01

223

An Improved Distance Matrix Computation Algorithm for Multicore Clusters  

PubMed Central

Distance matrix has diverse usage in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) to present a recent approach for vectorizing distance matrix computing. It showed an efficient performance in both sequential and parallel computing. However, the multicore cluster systems, which are available now, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as highly efficient parallel vectorized algorithm with high performance for computing distance matrix, addressed to multicore clusters. It reformulates DistVect1 vectorized algorithm in terms of clusters primitives. It deduces an efficient approach of partitioning and scheduling computations, convenient to this type of architecture. Implementations employ potential of both MPI and OpenMP libraries. Experimental results show that the proposed method performs improvement of around 3-fold speedup upon SSE2. Further it also achieves speedups more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI.

Al-Neama, Mohammed W.; Reda, Naglaa M.; Ghaleb, Fayed F. M.

2014-01-01

224

Robust growing neural gas algorithm with application in cluster analysis.  

PubMed

We propose a novel robust clustering algorithm within the Growing Neural Gas (GNG) framework, called Robust Growing Neural Gas (RGNG) network. The Matlab codes are available from . By incorporating several robust strategies, such as outlier resistant scheme, adaptive modulation of learning rates and cluster repulsion method into the traditional GNG framework, the proposed RGNG network possesses better robustness properties. The RGNG is insensitive to initialization, input sequence ordering and the presence of outliers. Furthermore, the RGNG network can automatically determine the optimal number of clusters by seeking the extreme value of the Minimum Description Length (MDL) measure during network growing process. The resulting center positions of the optimal number of clusters represented by prototype vectors are close to the actual ones irrespective of the existence of outliers. Topology relationships among these prototypes can also be established. Experimental results have shown the superior performance of our proposed method over the original GNG incorporating MDL method, called GNG-M, in static data clustering tasks on both artificial and UCI data sets. PMID:15555857

Qin, A K; Suganthan, P N

2004-01-01

225

Genetic algorithms for determining the topological structure of metallic clusters  

NASA Astrophysics Data System (ADS)

Genetic algorithms (GA) are applied for the optimization of the structure of metallic clusters by the calculation of the ground-state energies from a tight-binding (Hückel) Hamiltonian. The optimum topology or graph is searched by the use of the adjacency matrix A_ij as a natural coding. The initial populations for N-atom clusters are generated from a representative group of fit cluster structures having N-1 atoms by the addition of random connections or hoppings between the Nth atom and the rest of the cluster atoms (A_iN = 0 or 1). The diversity of geometries is enlarged by 20% with fully random structures. Several crossover strategies are proposed for the genetic evolution that combine the "parent" clusters while trying to preserve or transmit the physical characteristics of the parents' topologies. The performance of the different procedures is tested. For N<=13, the present GA yield topological structures that are in agreement with previous geometry optimizations performed using an enumerative search (N<=9) or simulated annealing Monte Carlo (10<=N<=13) methods. Limitations and extensions for N>=14 are discussed.

Poteau, R.; Pastor, G. M.

226

Finding reproducible cluster partitions for the k-means algorithm  

PubMed Central

K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset.

2013-01-01
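
The reproducibility question raised in the k-means entry above can be explored with a small experiment. The following is a minimal sketch, not the authors' stability index: it reruns plain Lloyd's k-means from several random initialisations and reports, for each run, its SSQ and how closely its partition agrees with the lowest-SSQ partition. The names `kmeans`, `rand_index` and `compare_restarts`, and the use of the Rand index as the agreement measure, are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-means; returns (labels, ssq)."""
    centers = rng.sample(points, k)          # random initialisation
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:             # partition unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep old center if cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    ssq = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, ssq

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def compare_restarts(points, k, runs=10, seed=0):
    """Return (ssq, agreement with best partition) per run, sorted by ssq."""
    rng = random.Random(seed)
    results = sorted((kmeans(points, k, rng) for _ in range(runs)),
                     key=lambda r: r[1])
    best_labels = results[0][0]
    return [(ssq, rand_index(best_labels, labels)) for labels, ssq in results]
```

Runs whose SSQ is close to the best value but whose agreement score is well below 1 correspond to the situation the abstract describes, where similar SSQ values hide markedly different partitions.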

227

A Geospatial Implementation of a Novel Delineation Clustering Algorithm Employing the K-means  

Microsoft Academic Search

The overarching objective of this study is to report the implementation and performance of a novel delineation clustering algorithm employing the k-means. This study explores a newly proposed algorithm designed to increase the overall performance of the k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means*). The algorithm reduces the computational load and produces quality clusters. Resulting improvements

Tonny J. Oyana; Kara E. Scott

2008-01-01

228

Mammographic images segmentation based on chaotic map clustering algorithm  

PubMed Central

Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels resulting in a medically meaningful partition of the mammography. Methods The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the associated distance between feature space points, is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and the reproduction of the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective on identification of larger mass lesions. Conclusions We can summarize our analysis by asserting that, due to the particularities of mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. It is rather the joint use of this method along with other segmentation techniques that could be successfully used for increasing the segmentation performance and for providing extra information for the subsequent analysis stages such as the classification of the segmented ROI.

2014-01-01

229

PCA-based population structure inference with generic clustering algorithms  

PubMed Central

Background Handling genotype data typed at hundreds of thousands of loci is very time-consuming, and population structure inference is no exception. Therefore, we propose to apply PCA to the genotype data of a population, select the significant principal components using the Tracy-Widom distribution, and assign the individuals to one or more subpopulations using generic clustering algorithms. Results We investigated K-means, soft K-means and spectral clustering and compared them with STRUCTURE, a model-based algorithm specifically designed for population structure inference. Moreover, we investigated methods for predicting the number of subpopulations in a population. The results on four simulated datasets and two real datasets indicate that our approach performs comparably well to STRUCTURE. For the simulated datasets, STRUCTURE and soft K-means with BIC produced identical predictions on the number of subpopulations. We also showed that, for the real datasets, BIC is a better index than likelihood in predicting the number of subpopulations. Conclusion Our approach has the advantage of being fast and scalable, while STRUCTURE is very time-consuming because of the nature of MCMC in parameter estimation. Therefore, we suggest choosing the proper algorithm based on the application of population structure inference.

Lee, Chih; Abdool, Ali; Huang, Chun-Hsi

2009-01-01

230

New fuzzy shell clustering algorithms for boundary detection and pattern recognition  

NASA Astrophysics Data System (ADS)

In this paper, we introduce new hard and fuzzy clustering algorithms called the c-quadric shells (CQS) algorithms. These algorithms are specifically designed to seek clusters that can be described by segments of second-degree curves, or more generally by segments of shells of hyperquadrics. Previous shell clustering algorithms have considered clusters of specific shapes such as circles (the fuzzy c-shells algorithm) or ellipses (the fuzzy c-ellipsoids algorithm). The advantage of our algorithm lies in the fact that it can be used to cluster mixtures of all types of hyperquadrics such as hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids, and even hyperplanes. Several examples of clustering in the two-dimensional case are shown.

Krishnapuram, Raghu J.; Frigui, Hichem; Nasraoui, Olfa

1992-02-01

231

Classification of posture maintenance data with fuzzy clustering algorithms  

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.

Bezdek, James C.

1992-01-01

232

Quantum clustering algorithm and its application in warning forecast of tourism emergency  

Microsoft Academic Search

In this paper, we combine quantum computation and a clustering algorithm in data mining. In this algorithm, we suppose that a potential field exists around the clustering centers; in Hilbert space, we obtain the potential energy function through the Schrödinger equation. We use this as the rule for assigning elements to clusters. Finally, through a simulation experiment, we validated its validity

Ruijie Wang; Junping Du; Min Zuo; Xuyan Tu

2007-01-01

233

A Memetic Algorithm for Selection of 3D Clustered Features with Applications in Neuroscience  

Microsoft Academic Search

We propose a Memetic algorithm for feature selection in volumetric data containing spatially distributed clusters of informative features, typically encountered in neuroscience applications. The proposed method complements a conventional genetic algorithm with a local search utilizing inherent spatial relationships to efficiently identify informative feature clusters across multiple regions of the search volume. First, we demonstrate the utility of the algorithm

Malin Björnsdotter Åberg; Johan Wessberg

2010-01-01

234

Texture segmentation algorithm based on wavelet transform and kd-tree clustering  

Microsoft Academic Search

A texture image segmentation algorithm based on wavelet transform and kd-tree clustering is studied in this paper. Firstly, texture features of an image are extracted using wavelet transform. Secondly, an improved algorithm based on quarter partition is given to smooth the texture feature image. Thirdly, the clustering algorithm using the kd-tree data structure is applied to the texture segmentation, and

Guosheng Yang; Yanli Hou; Chunyan Huang

2004-01-01

235

Generalized clustering networks and Kohonen's self-organizing scheme  

Microsoft Academic Search

The relationship between the sequential hard c-means (SHCM) and learning vector quantization (LVQ) clustering algorithms is discussed. The impact and interaction of these two families of methods with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but often lends ideas to clustering algorithms, are considered. A generalization of LVQ that updates all nodes for a given input

Nikhil R. Pal; James C. Bezdek; E. C.-K. Tsao

1993-01-01
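
The relationship between sequential hard c-means and LVQ-style competitive learning discussed in the entry above can be seen from their update rules. The sketch below is illustrative only and is not taken from the paper: it assumes the MacQueen-style SHCM update, where the winning prototype becomes the running mean of the inputs it has won, and an unsupervised LVQ update with an externally supplied learning rate. The function names and the `counts` list are hypothetical.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x in squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(prototypes[i], x)))

def shcm_step(prototypes, counts, x):
    """Sequential hard c-means: the winner moves by 1/count toward x."""
    i = nearest(prototypes, x)
    counts[i] += 1
    prototypes[i] = tuple(v + (xi - v) / counts[i]
                          for v, xi in zip(prototypes[i], x))
    return i

def lvq_step(prototypes, x, rate):
    """Unsupervised LVQ: the winner moves by a chosen learning rate toward x."""
    i = nearest(prototypes, x)
    prototypes[i] = tuple(v + rate * (xi - v)
                          for v, xi in zip(prototypes[i], x))
    return i
```

Under these assumptions the two rules differ only in the step size: SHCM uses 1/count for the winning node, while LVQ uses a learning-rate schedule. In both, only the winner is updated.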

236

jClustering, an open framework for the development of 4D clustering algorithms.  

PubMed

We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913

Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J

2013-01-01

237

PFClust: an optimised implementation of a parameter-free clustering algorithm  

PubMed Central

Background A well-known problem in cluster analysis is finding an optimal number of clusters reflecting the inherent structure of the data. PFClust is a partitioning-based clustering algorithm capable, unlike many widely-used clustering algorithms, of automatically proposing an optimal number of clusters for the data. Results The results of tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been successfully used to cluster large macromolecular structures and small druglike compounds. We have greatly improved the algorithm by a more efficient implementation, which enables PFClust to process large data sets acceptably fast. Conclusions In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.

2014-01-01

238

LD-BSCA: A local-density based spatial clustering algorithm  

Microsoft Academic Search

Density-based clustering algorithms are very powerful to discover arbitrary-shaped clusters in large spatial databases. However, in many cases, varied local-density clusters exist in different regions of data space. In this paper, a new algorithm LD-BSCA is proposed with introducing the concept of local MinPts (a minimum number of points) and the new cluster expanding condition: ExpandConClId (Expanding Condition of ClId-th

Guiyi Wei; Haiping Liu

2009-01-01

239

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values  

Microsoft Academic Search

The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses

Zhexue Huang

1998-01-01
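
The abstract above is truncated before it describes k-modes, so the following is a sketch under stated assumptions rather than the paper's exact procedure. It assumes the commonly described form of k-modes: a simple matching dissimilarity (the number of attributes on which two records differ) in place of Euclidean distance, and per-attribute modes in place of means. Seeding with the first k distinct records is a simplification chosen here for determinism, and all function names are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Per-attribute most frequent category of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, k, iters=20):
    """Cluster tuples of categorical values; returns (labels, modes)."""
    modes = []
    for r in records:                        # seed with first k distinct records
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    labels = []
    for _ in range(iters):
        new_labels = [min(range(len(modes)),
                          key=lambda c: matching_dissimilarity(r, modes[c]))
                      for r in records]
        if new_labels == labels:             # assignments stable: stop
            break
        labels = new_labels
        for c in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:
                modes[c] = mode_of(members)
    return labels, modes
```

For example, `k_modes([("a", "x"), ("b", "y"), ("a", "x"), ("b", "y"), ("a", "z")], 2)` groups the three records starting with `"a"` together, because `("a", "z")` differs from the mode `("a", "x")` in only one attribute.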

240

SRUMMA: A Matrix Multiplication Algorithm Suitable for Clusters and Scalable Shared Memory Systems  

Microsoft Academic Search

This paper describes a novel parallel algorithm that implements a dense matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon's algorithm. It is suitable for clusters and scalable shared memory systems. The current approach differs from the other parallel matrix multiplication algorithms by the explicit use of shared memory and remote memory access (RMA) communication rather than message

Manojkumar Krishnan; Jarek Nieplocha

2004-01-01

241

Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation  

Microsoft Academic Search

Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of magnetic resonance (MR) images. In this paper, the rough-fuzzy c-means (RFCM) algorithm is presented for segmentation of brain MR images. The RFCM algorithm comprises a judicious integration of the rough sets, fuzzy sets, and c-means algorithm. While the concept of lower and upper

Pradipta Maji; Sankar K. Pal

2008-01-01

242

Subspace Clustering of Text Documents with Feature Weighting K-Means Algorithm  

Microsoft Academic Search

This paper presents a new method to solve the problem of clustering large and complex text data. The method is based on a new subspace clustering algorithm that automatically calculates the feature weights in the k-means clustering process. In clustering sparse text data the feature weights are used to discover clusters from subspaces of the document vector space and identify

Liping Jing; Michael K. Ng; Jun Xu; Joshua Zhexue Huang

2005-01-01

243

Introducing Gene Clusters into a P2P Based TSP Solving Algorithm  

Microsoft Academic Search

The TSP (traveling salesman problem) genetic algorithm is very likely to destroy previously found pieces of a path. To prevent the found pieces from being destroyed, a P2P-based TSP genetic algorithm, P2PTSPGA, which makes use of gene clusters, is presented. The gene cluster, which stands for a series of cities, is passed down as a whole to the offspring from

Guangzhi Ma; Yansheng Lu; Enmin Song; Wei Zhang

2009-01-01

244

Automatic Clustering Using a Synergy of Genetic Algorithm and Multi-objective Differential Evolution  

Microsoft Academic Search

This paper applies the Differential Evolution (DE) and Genetic Algorithm (GA) to the task of automatic fuzzy clustering in a Multi-objective Optimization (MO) framework. It compares the performance of a hybrid of the GA and DE (GADE) algorithms on the fuzzy clustering problem, where two conflicting fuzzy validity indices are simultaneously optimized. The resultant Pareto optimal set of solutions from each

Debarati Kundu; Kaushik Suresh; Sayan Ghosh; Swagatam Das; Ajith Abraham; Youakim Badr

2009-01-01

245

Security clustering algorithm based on reputation in hierarchical peer-to-peer network  

NASA Astrophysics Data System (ADS)

To address the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we use a reputation mechanism to ensure the security of transactions and use clusters to manage the reputation mechanism. In order to improve security, reduce the network cost incurred by reputation management, and enhance cluster stability, we select reputation, historical average online time, and network bandwidth as the basic factors of a node's comprehensive performance. Simulation results showed that the proposed algorithm improved security, reduced network overhead, and enhanced cluster stability.

Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji

2013-03-01

246

Learning Assignment Order of Instances for the Constrained K-Means Clustering Algorithm  

Microsoft Academic Search

The sensitivity of the constrained K-means clustering algorithm (Cop-Kmeans) to the assignment order of instances is studied, and a novel assignment order learning method for Cop-Kmeans, termed as clustering Uncertainty-based Assignment order Learning Algorithm (UALA), is proposed in this paper. The main idea of UALA is to rank all instances in the data set according to their clustering uncertainties calculated

Yi Hong; Sam Kwong

2009-01-01

247

GX-Means: A model-based divide and merge algorithm for geospatial image clustering  

Microsoft Academic Search

One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number

Ranga Raju Vatsavai; Christopher T. Symons; Varun Chandola; Goo Jun

2011-01-01

248

Necessary conditions for determining a robust time threshold in standard INFOSEC alert clustering algorithms  

Microsoft Academic Search

The standard INFOSEC alert clustering algorithms use a predetermined fixed time threshold for specifying the maximum duration over which new INFOSEC alerts can be added to existing alert clusters. Since these alert clusters are the basis of further alert correlation processing, an important question is can this time threshold be set robustly, in the sense that an optimal value can

Stephen W. Neville

2005-01-01

249

A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking  

PubMed Central

The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity.

Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng

2014-01-01

250

A formal algorithm for verifying the validity of clustering results based on model checking.  

PubMed

The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823

Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng

2014-01-01

251

A mobility-based cluster formation algorithm for wireless mobile ad-hoc networks  

Microsoft Academic Search

In the last decade, numerous efforts have been devoted to designing efficient algorithms for clustering wireless mobile ad-hoc networks (MANET) considering the network mobility characteristics. However, existing algorithms assume that the mobility parameters of the networks are fixed, whereas they are in fact stochastic and vary with time. Therefore, the proposed clustering algorithms do not scale well

Mohammad Reza Meybodi

252

A fast readout algorithm for Cluster Counting/Timing drift chambers on a FPGA board  

NASA Astrophysics Data System (ADS)

A fast readout algorithm for Cluster Counting and Timing purposes has been implemented and tested on a Virtex 6 core FPGA board. The algorithm analyses and stores data coming from a Helium based drift tube instrumented by 1 GSPS fADC and represents the outcome of balancing between cluster identification efficiency and high speed performance. The algorithm can be implemented in electronics boards serving multiple fADC channels as an online preprocessing stage for drift chamber signals.

Cappelli, L.; Creti, P.; Grancagnolo, F.; Pepino, A.; Tassielli, G.

2013-08-01

253

Research on image segmentation algorithm based on fuzzy clustering  

NASA Astrophysics Data System (ADS)

Through an in-depth study of the existing classical FCM algorithms, this paper puts forward a scheme to improve the FCM algorithm. The improved algorithm introduces the statistical properties of images, thus greatly reducing the amount of data the algorithm processes and speeding up the convergence rate, while its segmentation results are exactly the same as those of the FCM algorithm.

Qu, Bo

2013-07-01

254

A fast training algorithm for RBF networks based on subtractive clustering  

Microsoft Academic Search

A new algorithm for training radial basis function neural networks is presented in this paper. The algorithm, which is based on the subtractive clustering technique, has a number of advantages compared to the traditional learning algorithms, including faster training times and more accurate predictions. Due to these advantages the method proves suitable for developing models for complex nonlinear systems.

Haralambos Sarimveis; Alex Alexandridis; George Bafas

2003-01-01

255

Document Clustering Using Multi-Objective Genetic Algorithms on MATLAB Distributed Computing  

Microsoft Academic Search

Genetic Algorithm (GA), one of the artificial intelligence algorithms, performs much better than other algorithms for document clustering. However, it suffers from a problem known as premature convergence. Fuzzy Logic based GA (FLGA) was therefore proposed to solve it. Nevertheless, it still has weaknesses such as the parameter dependence problem. In order to overcome this problem, the Multi-Objective

Jung Song Lee; Soon Cheol Park

2012-01-01

256

Multiresolution mean shift clustering algorithm for shape interpolation.  

PubMed

In this paper, we solve the problem of 3D shape interpolation with significant pose variation. For an ideal 3D shape interpolation, especially the articulated model, the shape should follow the movement of the underlying articulated structure and be transformed in a way that is as rigid as possible. Given input shapes with compatible connectivity, we propose a novel multiresolution mean shift (MMS) clustering algorithm to automatically extract their near-rigid components. Then, by building the hierarchical relationship among extracted components, we compute a common articulated structure for these input shapes. With the aid of this articulated structure, we solve the shape interpolation by combining 1) a global pose interpolation of near-rigid components from the source shape to the target shape with 2) a local gradient field interpolation for each pair of components, followed by solving a Poisson equation in order to reconstruct an interpolated shape. As a result, an aesthetically pleasing shape interpolation can be generated, with even the poses of shapes varying significantly. In contrast to a recent state-of-the-art work, the proposed approach can achieve comparable or even better results and have better computational efficiency as well. PMID:19590110

Chu, Hung-Kuo; Lee, Tong-Yee

2009-01-01

257

A Novel k-Means Algorithm for Clustering and Outlier Detection  

Microsoft Academic Search

A three-stage k-means algorithm of O(nkt) polynomial time is proposed to cluster the numerical data and detect the outliers. The clusters are preliminarily determined at the first stage. The local outliers of each cluster are found out and their influences on the centroid are removed at the second stage. Global outliers are consequently identified. Finally, the clusters, the densities of

Yinghua Zhou; Hong Yu; Xuemei Cai

2009-01-01

258

Geometric Algorithms for the Constrained 1-D K Means Clustering Problems and IMRT Applications  

Microsoft Academic Search

In this paper, we present efficient geometric algorithms for the discrete constrained 1-D K-means clustering problem and extend our solutions to the continuous version of the problem. One key clustering constraint we consider is that the maximum difference in each cluster cannot be larger than a given threshold. These constrained 1-D K-means clustering problems appear in various applications, especially in

Danny Z. Chen; Mark A. Healy; Chao Wang; Bin Xu

2007-01-01
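
To make the constraint in the entry above concrete, here is a minimal dynamic-programming sketch for the discrete case. It is not the geometric algorithm of the paper: it is a straightforward O(K n^2) baseline that assumes each cluster is a contiguous run of the sorted values and that exactly K non-empty clusters are required. The name `constrained_1d_kmeans` and the parameter `max_spread` (the threshold on the maximum difference within a cluster) are hypothetical.

```python
def constrained_1d_kmeans(values, k, max_spread):
    """Split sorted values into k contiguous clusters, each with
    max - min <= max_spread, minimising the within-cluster sum of squares.
    Returns (cost, clusters), or None if no feasible split exists."""
    xs = sorted(values)
    n = len(xs)
    s1, s2 = [0.0], [0.0]                    # prefix sums of x and x^2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations from the mean for xs[i:j], j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(c - 1, j):        # last cluster is xs[i:j]
                if xs[j - 1] - xs[i] > max_spread:
                    continue                 # violates the spread constraint
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i
    if best[k][n] == inf:
        return None
    clusters, j = [], n
    for c in range(k, 0, -1):                # walk the cut points backwards
        i = cut[c][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters
```

With `values=[1, 2, 3, 10, 11, 12]`, `k=2` and `max_spread=5`, the only feasible split separates the values at the gap between 3 and 10; with `values=[0, 10]`, `k=1` and the same threshold, the function returns `None` because a single cluster would span a difference of 10.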

259

Differential Evolution Based Fuzzy Clustering  

NASA Astrophysics Data System (ADS)

In this work, two new fuzzy clustering (FC) algorithms based on Differential Evolution (DE) are proposed. Five well-known data sets viz. Iris, Wine, Glass, E. Coli and Olive Oil are used to demonstrate the effectiveness of DEFC-1 and DEFC-2. They are compared with Fuzzy C-Means (FCM) algorithm and Threshold Accepting Based Fuzzy Clustering algorithms proposed by Ravi et al., [1]. Xie-Beni index is used to arrive at the 'optimal' number of clusters. Based on the numerical experiments, we infer that, in terms of least objective function value, these variants can be used as viable alternatives to FCM algorithm.

Ravi, V.; Aggarwal, Nupur; Chauhan, Nikunj
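
For orientation, the baseline that the entry above compares against can be sketched as follows. This is a minimal sketch of the standard fuzzy c-means alternating updates and of the Xie-Beni index in its original form with squared memberships. It is not the DEFC-1 or DEFC-2 algorithm; the function names and the default fuzzifier `m=2.0` are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(points, centers, m=2.0):
    """u[k][i]: membership of point k in cluster i; each row sums to 1."""
    u = []
    for p in points:
        d = [sq_dist(p, c) for c in centers]
        if 0.0 in d:                         # point sits exactly on a center
            zeros = d.count(0.0)
            u.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
        else:
            # d holds squared distances, so the exponent is -1/(m-1)
            inv = [x ** (-1.0 / (m - 1.0)) for x in d]
            total = sum(inv)
            u.append([v / total for v in inv])
    return u

def update_centers(points, u, m=2.0):
    """Centers as means of the points weighted by u^m.
    Assumes every cluster has some non-zero membership."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wk * p[d] for wk, p in zip(w, points)) / total
                             for d in range(dim)))
    return centers

def fcm(points, centers, m=2.0, iters=50):
    """Run a fixed number of alternating FCM updates from given centers."""
    for _ in range(iters):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(points, centers, m)

def xie_beni(points, centers, u):
    """Xie-Beni index: compactness over separation (lower is better)."""
    c = len(centers)
    compact = sum(u[k][i] ** 2 * sq_dist(points[k], centers[i])
                  for k in range(len(points)) for i in range(c))
    sep = min(sq_dist(centers[i], centers[j])
              for i in range(c) for j in range(i + 1, c))
    return compact / (len(points) * sep)
```

One way to use these pieces when choosing the number of clusters is to run `fcm` for several candidate values of c and keep the value with the smallest `xie_beni` score, which mirrors the role the abstract gives to the index.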

260

C-element: A New Clustering Algorithm to Find High Quality Functional Modules in PPI Networks  

PubMed Central

Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms that focus on finding functional modules try either to find clique-like subnetworks or to grow clusters starting from high-degree vertices as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue-specific networks are used.

Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

2013-01-01

261

C-element: a new clustering algorithm to find high quality functional modules in PPI networks.  

PubMed

Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms that focus on finding functional modules try either to find clique-like subnetworks or to grow clusters starting from high-degree vertices as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue-specific networks are used. PMID:24039752

Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

2013-01-01

262

Block clustering based on difference of convex functions (DC) programming and DC algorithms.  

PubMed

We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM. PMID:23777526

Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai

2013-10-01

263

SRUMMA: A Matrix Multiplication Algorithm Suitable for Clusters and Scalable Shared Memory Systems  

SciTech Connect

This paper describes a novel parallel algorithm that implements a dense matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon's algorithm. It is suitable for clusters and shared memory systems. The current approach differs from other parallel matrix multiplication algorithms by the explicit use of shared memory and remote memory access (RMA) communication rather than message passing. The experimental results on clusters (IBM SP, Linux-Myrinet) and shared memory systems (SGI Altix, Cray X1) demonstrate consistent performance advantages over ScaLAPACK pdgemm, the leading implementation of the parallel matrix multiplication algorithms used today. In the best case on the SGI Altix, the new algorithm performs 20 times better than ScaLAPACK for a matrix size of 1000 on 128 processors. The impact of zero-copy nonblocking RMA communications and shared memory communication on matrix multiplication performance on clusters is investigated.

Krishnan, Manoj Kumar; Nieplocha, Jarek

2004-04-30

264

A method for initialising the K-means clustering algorithm using kd-trees  

Microsoft Academic Search

We present a method for initialising the K-means clustering algorithm. Our method hinges on the use of a kd-tree to perform a density estimation of the data at various locations. We then use a modification of Katsavounidis’ algorithm, which incorporates this density information, to choose K seeds for the K-means algorithm. We test our algorithm on 36 synthetic datasets, and

Stephen J. Redmond; Conor Heneghan

2007-01-01

265

Scheme for Implementing Quantum Search Algorithm in a Cluster State Quantum Computer  

NASA Astrophysics Data System (ADS)

Using a cluster state and single-qubit measurements, one can perform one-way quantum computation. Here we give a detailed scheme for realizing a modified Grover search algorithm using measurements on a cluster state. We give the measurement pattern for the cluster-state realization of the algorithm and estimate the number of measurements needed for its implementation. It is found that O(2^{3n/2} n^2) single-qubit measurements are required for its realization in a cluster-state quantum computer.

Wang, Yan-Hui; Zhang, Yong

2008-06-01

266

Rapid integration of large airborne geophysical data suites using a fuzzy partitioning cluster algorithm: a tool for geological mapping and mineral exploration targeting  

NASA Astrophysics Data System (ADS)

Unsupervised classification techniques, such as cluster algorithms, are routinely used for structural exploration and integration of multiple frequency bands of remotely sensed spectral datasets. However, up to now, very few attempts have been made towards using unsupervised classification techniques for rapid, automated, and objective information extraction from large airborne geophysical data suites. We employ fuzzy c-means (FCM) cluster analysis for the rapid and largely automated integration of complementary geophysical datasets comprising airborne radiometric and magnetic as well as ground-based gravity data, covering a survey area of approximately 5000 km² located 100 km east-south-east of Johannesburg, South Africa, along the south-eastern limb of the Bushveld layered mafic intrusion complex. After preparatory data processing and normalisation, the three datasets are subjected to FCM cluster analysis, resulting in the generation of a zoned integrated geophysical map delineating distinct subsurface units based on the information the three input datasets carry. The fuzzy concept of the cluster algorithm employed also provides information about the significance of the identified zonation. According to the nature of the input datasets, the integrated zoned map carries information from near-surface depositions as well as rocks underneath the sediment cover. To establish a sound geological association of these zones we refer the zoned geophysical map to all available geological information, demonstrating that the zoned geophysical map as obtained from FCM cluster analysis outlines geological units that are related to Bushveld-type, other Proterozoic- and Karoo-aged rocks.

Paasche, Hendrik; Eberle, Detlef G.

2009-09-01
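
The FCM cluster analysis named in the record above can be illustrated with the textbook fuzzy c-means iteration. The following is a minimal pure-Python sketch, not the authors' implementation: the function name `fcm`, the fuzzifier default `m=2.0`, the fixed iteration count, and the toy data in the usage note are all assumptions made for illustration.

```python
def fcm(points, centers, m=2.0, iters=50, eps=1e-12):
    """Textbook fuzzy c-means sketch (hypothetical helper, not from the record).

    points, centers: lists of equal-length lists of floats.
    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, taken from the final membership step.
    """
    power = 2.0 / (m - 1.0)
    n_clusters = len(centers)
    u = []
    for _ in range(iters):
        # Membership step: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        u = []
        for x in points:
            # eps guards against a zero distance when a point sits on a center.
            d = [max(eps, sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5)
                 for c in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** power for j in range(n_clusters))
                      for i in range(n_clusters)])
        # Center step: each center is the mean of the points weighted by u^m.
        new_centers = []
        for i in range(n_clusters):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wk * x[k] for wk, x in zip(w, points)) / total
                                for k in range(len(points[0]))])
        centers = new_centers
    return centers, u
```

For example, `fcm([[0.0], [0.1], [10.0], [10.1]], [[1.0], [9.0]])` moves the two centers towards the two point pairs, and every membership row sums to one. The graded memberships are what allow a fuzzy method to report how strongly each sample belongs to its zone, which a hard partition cannot do.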

267

An adaptive spatial clustering algorithm based on the minimum spanning tree-like  

NASA Astrophysics Data System (ADS)

Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover the potential rules and outliers among the spatial data. Most existing spatial clustering methods cannot deal with the uneven density of the data and usually require predefined parameters which are hard to justify. In order to overcome such limitations, we first propose the concept of an edge variation factor based upon the definition of distance variation among the entities in the spatial neighborhood. Then, an approach is presented to construct the minimum spanning tree-like (MST-L). Further, an adaptive MST-L based spatial clustering algorithm (AMSTLSC) is developed in this paper. The algorithm requires only one input parameter, the threshold of the edge variation factor, which is easily set with little a priori information. Through this parameter, a series of MST-Ls can be automatically generated from the high-density region to the low-density one, where each MST-L represents a cluster. As a result, the algorithm proposed in this paper can adapt to changes of local density among spatial points, a property called adaptiveness. Finally, two tests are implemented to demonstrate that the AMSTLSC algorithm is very robust and suitable for finding clusters with different shapes, and in particular that it has good adaptiveness. A comparative test further indicates that the AMSTLSC algorithm performs better than the classic DBSCAN algorithm.

Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao

2009-10-01

268

A Local Graph Clustering Algorithm for Discovering Subgoals in Reinforcement Learning  

NASA Astrophysics Data System (ADS)

Reinforcement Learning studies the problem of learning through interaction with an unknown environment. Learning efficiently in large-scale problems and complex tasks demands a decomposition of the original complex task into simpler, smaller subtasks. In this paper a local graph clustering algorithm is presented for discovering subgoals. The main advantage of the proposed algorithm is that only the local information of the graph is considered to cluster the agent state space. Subgoals discovered by the algorithm are then used to generate skills. Experimental results show that the proposed subgoal discovery algorithm has a dramatic effect on the learning performance.

Entezari, Negin; Shiri, Mohammad Ebrahim; Moradi, Parham

269

Performance Assessment of the Optical Transient Detector and Lightning Imaging Sensor. Part 2; Clustering Algorithm  

NASA Technical Reports Server (NTRS)

We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) numbers of flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors creates flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.

Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William

2006-01-01
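
The group-to-flash rule quoted in the record above (groups within 330 ms and 5.5 km of each other for LIS) can be sketched as single-linkage clustering. This is only an illustration under stated assumptions: the function name, the flat (time_ms, x_km, y_km) coordinates, and the all-pairs union-find linkage are hypothetical simplifications, and the operational LIS/OTD code may apply further rules that the abstract does not describe.

```python
import math

def cluster_groups_into_flashes(groups, max_dt_ms=330.0, max_dist_km=5.5):
    """Single-linkage sketch of group-to-flash clustering (hypothetical helper).

    groups: list of (time_ms, x_km, y_km) tuples on an assumed flat grid.
    Returns one integer flash label per group, numbered in order of appearance.
    """
    parent = list(range(len(groups)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(groups)):
        ti, xi, yi = groups[i]
        for j in range(i + 1, len(groups)):
            tj, xj, yj = groups[j]
            close_in_time = abs(ti - tj) <= max_dt_ms
            close_in_space = math.hypot(xi - xj, yi - yj) <= max_dist_km
            if close_in_time and close_in_space:
                parent[find(j)] = find(i)  # merge the two flashes

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(groups))]
```

Passing `max_dist_km=16.5` gives the OTD variant of the same rule. Halving or doubling the two thresholds on a sample of groups is one way to reproduce the kind of sensitivity test the abstract describes.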

270

A Novel Automatic Detection System for ECG Arrhythmias Using Maximum Margin Clustering with Immune Evolutionary Algorithm  

PubMed Central

This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected; then, features are extracted from the ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three performance evaluation indicators (sensitivity, specificity, and accuracy) are used to assess the effect of the IEMMC method for ECG arrhythmias. Compared with K-means and iterSVR algorithms, the IEMMC algorithm shows better performance not only in clustering result but also in terms of global search ability and convergence ability, which supports its effectiveness for the detection of ECG arrhythmias.

Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

2013-01-01

271

The clustering algorithm of marine environment vector field data based on "Velocity-Direction" histogram  

NASA Astrophysics Data System (ADS)

Clustering is an effective means of marine environment data analysis. This paper proposes a clustering algorithm based on the "Velocity-Direction" histogram. First, the "Velocity-Direction" histogram is constructed based on the characteristics of marine environment vector field data. Second, the exact surface of the histogram is reconstructed by the Gaussian kernel function to eliminate the contaminated data points in the "Velocity-Direction" histogram. Finally, the FCM algorithm is introduced and modified for the "Velocity-Direction" histogram clustering. The initial number of clusters and the clustering centers for the FCM algorithm are set as the local extrema in the constructed histogram surfaces. The experimental results based on the simulation and the NOAA marine environment vector field data verify the effectiveness of the proposed algorithm.

Zhang, Junda; Tang, Xiao-an; Jiang, Libing

2013-10-01

272

Detecting Clusters of Galaxies in the Sloan Digital Sky Survey. I. Monte Carlo Comparison of Cluster Detection Algorithms  

Microsoft Academic Search

We present a comparison of three cluster-finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg² region of Sloan Digital Sky Survey (SDSS) imaging data: the matched filter (MF; Postman et al., published in 1996), the adaptive matched filter (AMF; Kepner et al., published in 1999), and a color-magnitude filtered Voronoi tessellation technique (VTT).

Jeremy V. Kepner; Marc Postman; Michael A. Strauss; Neta A. Bahcall; James E. Gunn; Robert H. Lupton; James Annis; Robert C. Nichol; J. Brinkmann; Robert J. Brunner; Andrew Connolly; Istvan Csabai; Robert B. Hindsley; Zeljko Ivezic; Michael S. Vogeley; Donald G. York

2002-01-01

273

Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based approach  

Microsoft Academic Search

We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms (single-link, complete-link, group-average, and Ward's method) are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical behavior of these algorithms, as well as

Sepandar D. Kamvar; Dan Klein; Christopher D. Manning

2002-01-01

274

Empirical comparison of fast partitioning-based clustering algorithms for large data sets  

Microsoft Academic Search

Several fast algorithms for clustering very large data sets have been proposed in the literature, including CLARA, CLARANS, GAC-R3, and GAC-RARw. CLARA is a combination of a sampling procedure and the classical PAM algorithm, while CLARANS adopts a serial randomized search strategy to find the optimal set of medoids. GAC-R3 and GAC-RARw exploit genetic search heuristics for solving clustering problems.

Chih-ping Wei; Yen-hsien Lee; Che-ming Hsu

2003-01-01

275

An Novel Dynamic Clustering Algorithm Based on Geographical Location for Wireless Sensor Networks  

Microsoft Academic Search

One of the key problems for wireless sensor networks (WSNs) is how to make the best of limited energy. The conventional clustering method has the unique potential to be the framework for energy-conserving wireless sensor networks. In this paper, a novel dynamic clustering algorithm based on geographical location information (GL-DC) is proposed for WSNs. Compared with other algorithms, GL-DC has two obvious

Ming Zhang; ChengLong Gong; Yanhong Lu

2008-01-01

276

Chinese Text Clustering Algorithm Based k-means  

NASA Astrophysics Data System (ADS)

Text clustering is an important method in text mining. This work focuses on the process of Chinese text clustering based on k-means. After some experiments, we found that the new center of a cluster was easily affected by isolated texts. The average similarity of a cluster was used as a parameter and multiplied by a modulus between 0.75 and 1.25 to get the similarity threshold value. The texts whose similarity with the original cluster center was greater than or equal to the threshold value were collected as a candidate collection, and the cluster center was then updated with the center of the candidate collection. The experiments show that the improved method increased purity and F value by about 10 percent on average over the original method.

Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
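
The thresholded center update described in the record above can be sketched as follows. The abstract does not say which similarity measure or text representation is used, so cosine similarity over dense term vectors is an assumption here, as are the function names `cosine` and `update_center` and the fallback of keeping the old center when no text passes the threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def update_center(members, center, modulus=1.0):
    """Recompute a cluster center from the texts that pass the threshold.

    members: term vectors currently assigned to the cluster.
    modulus: factor in [0.75, 1.25] applied to the cluster's average similarity.
    """
    sims = [cosine(m, center) for m in members]
    threshold = modulus * sum(sims) / len(sims)
    # Candidate collection: texts at least as similar as the threshold.
    candidates = [m for m, s in zip(members, sims) if s >= threshold]
    if not candidates:
        # Assumed fallback: keep the old center if nothing qualifies.
        return list(center)
    return [sum(m[k] for m in candidates) / len(candidates)
            for k in range(len(center))]
```

With members `[[1, 0], [1, 0], [0, 1]]` and center `[1, 0]`, the isolated third text falls below the threshold and the updated center stays at `[1.0, 0.0]` instead of being pulled towards it.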

277

Incremental Clustering Algorithm for Earth Science Data Mining  

Microsoft Academic Search

Remote sensing data plays a key role in understanding the complex geographic phenomena. Clustering is a useful tool in discovering interesting patterns and structures within the multivariate geospatial data. One of the key issues in clustering is the specification of appropriate number of clusters, which is not obvious in many practical situations. In this paper we provide an extension of

Ranga Raju Vatsavai; Ranga Raju

2009-01-01

278

Clustering of symbolic data using the assignment-prototype algorithm  

Microsoft Academic Search

This paper shows a fuzzy relational clustering method in order to perform the clustering of symbolic data. The presented method yields a fuzzy partition and prototype for each cluster by optimizing an adequacy criterion based on suitable dissimilarity measures. This work considers two volume-based measures that may be applied to data described by set-valued, list-valued or interval-valued symbolic variables. Experiments

Kelly P. Silva; Francisco de A. T. de Carvalho; Marc Csernel

2009-01-01

279

Characterization of secondary organic aerosol particles using aerosol laser time-of-flight mass spectrometer coupled with FCM clustering algorithm  

NASA Astrophysics Data System (ADS)

Experiments on the formation of secondary organic aerosol (SOA) from photooxidation of 1,3,5-trimethylbenzene in the CH3ONO/NO/air mixture were carried out in a laboratory chamber. The size and chemical composition of the resultant individual particles were measured in real time by an aerosol laser time-of-flight mass spectrometer (ALTOFMS) recently designed in our group. We also developed a Fuzzy C-Means (FCM) algorithm to classify the mass spectra of large numbers of SOA particles. The study first started with mixed particles generated from the standards benzaldehyde, phenol, benzoic acid, and nitrobenzene solutions to test the feasibility of application of the FCM. The FCM was then used to extract potential aerosol classes in the chamber experiments. The results demonstrate that FCM allowed a clear identification of ten distinct chemical particle classes in this study, namely, 3,5-dimethylbenzoic acid, 3,5-dimethylbenzaldehyde, 2,4,6-trimethyl-5-nitrophenol, 2-methyl-4-oxo-2-pentenal, 2,4,6-trimethylphenol, 3,5-dimethyl-2-furanone, glyoxal, and high-molecular-weight (HMW) components. Compared with offline methods such as gas chromatography-mass spectrometry (GC-MS) measurement, the real-time ALTOFMS detection approach coupled with the FCM data processing algorithm can perform cluster analysis of SOA successfully and provide more information about the products. Thus ALTOFMS is a useful tool to reveal the formation and transformation processes of SOA particles in smog chambers.

Huang, Mingqiang; Hao, Liqing; Guo, Xiaoyong; Hu, Changjin; Gu, Xuejun; Zhao, Weixiong; Wang, Zhenya; Fang, Li; Zhang, Weijun

2013-01-01

280

A new clustering algorithm applicable to multispectral and polarimetric SAR images  

NASA Technical Reports Server (NTRS)

We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.

Wong, Yiu-Fai; Posner, Edward C.

1993-01-01

281

Application of Cluster-Based Local Outlier Factor Algorithm in Anti-Money Laundering  

Microsoft Academic Search

Financial institutions' capability in recognizing suspicious money laundering transactional behavioral patterns (SMLTBPs) is critical to anti-money laundering. Combining distance-based unsupervised clustering and local outlier detection, this paper designs a new cluster-based local outlier factor (CBLOF) algorithm to identify SMLTBPs and uses authentic and synthetic data experimentally to test its applicability and effectiveness.

Gao Zengan

2009-01-01

282

The Effect of Clustering Algorithms on Aftershock Productivity and Foreshock Rates  

Microsoft Academic Search

The properties of earthquake clusters are important for the modeling of short-term hazard. In particular the forecasting of larger events is of societal importance. We apply common declustering algorithms, including Reasenberg, Gardner-Knopoff, and the model-independent method by Marsan, to the Southern California earthquake data to define earthquake clusters. We model the aftershock productivity as a function of mainshock magnitude

A. Christophersen; S. Wiemer; E. G. Smith

2007-01-01

283

A new clustering algorithm applicable to multispectral and polarimetric SAR images  

SciTech Connect

The authors describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, the authors extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.

Wong, Y.F. (Lawrence Livermore National Lab., CA (United States). Inst. for Scientific Computing Research); Posner, E.C. (California Inst. of Tech., Pasadena, CA (United States). Dept. of Electrical Engineering)

1993-05-01

284

A divide-link algorithm based on fuzzy similarity for clustering networks  

Microsoft Academic Search

In this paper we present an efficient hierarchical clustering algorithm for relational data, where the relations are modeled by a graph. The hierarchical clustering approach proposed in this paper is based on divisive and link criteria, to break the graph and join the nodes at different stages. We then apply this approach to community detection problems based on the well-known

Daniel Gomez; Javier Montero; Javier Yanez

2011-01-01

285

A clustering algorithm for extracting rules from supervised neural network models in data mining tasks  

Microsoft Academic Search

The main challenge to the use of supervised neural networks in data mining applications is to get explicit knowledge from these models. For this purpose, a clustering genetic algorithm for rule extraction from artificial neural networks is developed. The methodology is based on the clustering of the hidden unit activation values. A simple encoding scheme that yields to constant-length

Eduardo R. Hruschka; Nelson F. F. Ebecken

2000-01-01

286

The Application of Artificial Immune Clustering Algorithm in Division of Electric Load Forecasting  

Microsoft Academic Search

Electric load forecasting plays a vital role in the safety of power systems. To overcome the unreasonableness of dividing electric load forecasting manually, this paper proposes a new division method based on an artificial immune clustering algorithm. To reduce the scale of data and the redundancy information of the clustering

Lianfu Yao; Dawei Jiang; Nan Cheng

2009-01-01

287

A Study of Clustering Applied to Multiple Target Tracking Algorithm  

Microsoft Academic Search

In this paper the effectiveness of two Data Association algorithms for Multiple Target Tracking (MTT) based on the Global Nearest Neighbor approach is compared. As the time for solving the assignment problem increases nonlinearly with the problem size, it is useful to divide the whole scenario into small groups of targets called clusters. For each cluster the assignment problem is solved

Pavlina Konstantinova; Milen Nikolov; Tzvetan Semerdjiev

2004-01-01

288

MMR: An algorithm for clustering categorical data using Rough Set Theory  

Microsoft Academic Search

A variety of cluster analysis techniques exist to group objects having similar characteristics. However, the implementation of many of these techniques is challenging due to the fact that much of the data contained in today’s databases is categorical in nature. While there have been recent advances in algorithms for clustering categorical data, some are unable to handle uncertainty in the

Darshit Parmar; Teresa Wu; Jennifer Blackhurst

2007-01-01

289

A smart clustering algorithm for photo set obtained from multiple digital cameras  

Microsoft Academic Search

The use of digital cameras is prevalent. Although the cost of digital photographs is low, managing numerous digital photos is burdensome to most users. An intelligent management tool for digital photos is needed. We propose a novel clustering algorithm for concurrent digital photos obtained from multiple cameras. Since previous photo clustering methods can be applied to a single camera, a

Chul-jin Jang; Taijin Yoon; Hwan-gue Cho

2009-01-01

290

A hidden Markov model-based K-means time series clustering algorithm  

Microsoft Academic Search

Aimed at some shortcomings in the existing time series clustering methods based on the hidden Markov model (HMM), such as longer sequence and equal length, a hidden Markov model-based k-means time series clustering algorithm is proposed, whose objective function is the joint likelihood function. At first, an initial partition is obtained by unsupervised clustering of the time series using dynamic time warping

Li-Li Wei; Jing-Qiang Jiang

2010-01-01

291

Dynamic Maximum Entropy algorithms for clustering and coverage control  

Microsoft Academic Search

The dynamic coverage problem is increasingly found in a wide variety of areas, for example, from the development of mobile sensor networks, to the analysis of clustering in spatio-temporal dynamics of brain signals. In this paper, we apply control-theoretic methods to locate and track cluster center dynamics and show that dynamic control design is necessary to achieve dynamic coverage of

Yunwen Xu; Srinivasa M. Salapaka; Carolyn L. Beck

2010-01-01

292

A Conceptual Clustering Algorithm for Database Schema Design  

Microsoft Academic Search

Conceptual clustering techniques based on current theories of categorization provide a way to design database schemas that more accurately represent classes. An approach is presented in which classes are treated as complex clusters of concepts rather than as simple predicates. An important service provided by the database is determining whether a particular instance is a member of a class. A

Howard W. Beck; Tarek M. Anwar; Shamkant B. Navathe

1994-01-01

293

Analysis of a Simple k-Means Clustering Algorithm.  

National Technical Information Service (NTIS)

K-means clustering is a very popular clustering technique which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean...

C. Piatko; D. M. Mount; N. S. Netanyahu; R. Silverman; T. Kanungo

2000-01-01
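
The problem statement in the record above (choose k centers in R^d for n data points) is commonly attacked with Lloyd's iteration. The sketch below shows that standard heuristic and assumes the usual objective of squared distance to the nearest center, because the abstract is truncated before it states the objective. The function name and the choice to leave an empty cluster's center unchanged are illustrative assumptions, not details taken from the report.

```python
def lloyd_kmeans(points, centers, iters=20):
    """Lloyd's iteration sketch: assign to nearest center, then recompute means.

    points, centers: lists of equal-length lists of numbers.
    Returns the list of centers after `iters` rounds.
    """
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the center with the smallest squared Euclidean distance.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Each center moves to the mean of its cluster; an empty cluster
        # keeps its previous center (an assumed convention).
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else list(c)
            for cl, c in zip(clusters, centers)
        ]
    return centers
```

For the four points `[[0, 0], [0, 2], [10, 0], [10, 2]]` and starting centers `[[1, 1], [9, 1]]`, the iteration settles on `[[0.0, 1.0], [10.0, 1.0]]`. The result in general depends on the starting centers, which is why initialisation methods are studied separately.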

294

A fast general-purpose clustering algorithm based on FPGAs for high-throughput data processing  

NASA Astrophysics Data System (ADS)

We present a fast general-purpose algorithm for high-throughput clustering of data "with a two-dimensional organization". The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes this algorithm especially well suited for problems where the data have high density, e.g. in the case of tracking devices working under high-luminosity conditions such as those of the LHC or super-LHC. The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has a much broader field of applications. In fact, its core does not specifically rely on the kind of data or detector it is working for, while the second step can and should be tailored for a given application. For example, in the case of spatial measurement with silicon pixel detectors, the second step performs a center-of-charge calculation. Applications can thus be foreseen to other detectors and other scientific fields ranging from HEP calorimeters to medical imaging. An additional advantage of this two-step approach is that the typical clustering-related calculations (second step) are separated from the combinatorial complications of clustering. This separation simplifies the design of the second step and enables it to perform sophisticated calculations, achieving offline quality in online applications. The algorithm is general purpose in the sense that only minimal assumptions on the kind of clustering to be performed are made.

Annovi, A.; Beretta, M.

2010-05-01
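
The two-step organisation described in the record above (first cluster the hits, then analyse each cluster) can be illustrated in software. This sketch is not the pipelined FPGA design: the flood fill, the 8-neighbour adjacency rule, the dictionary of hits, and both function names are assumptions chosen only to show how a center-of-charge calculation follows the clustering step.

```python
def cluster_hits(hits):
    """Step 1 sketch: group adjacent hit pixels (8-neighbour rule assumed).

    hits: dict mapping (row, col) -> charge.
    Returns a list of clusters, each a list of (row, col) pixels.
    """
    unvisited = set(hits)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, cluster = [seed], [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in unvisited:
                        # Each hit is removed once, so work grows with hit count.
                        unvisited.remove(neighbour)
                        stack.append(neighbour)
                        cluster.append(neighbour)
        clusters.append(cluster)
    return clusters

def center_of_charge(cluster, hits):
    """Step 2 sketch: charge-weighted mean position of one cluster."""
    total = sum(hits[p] for p in cluster)
    row = sum(p[0] * hits[p] for p in cluster) / total
    col = sum(p[1] * hits[p] for p in cluster) / total
    return row, col
```

Keeping `center_of_charge` separate from `cluster_hits` mirrors the separation the abstract describes: the second function could be swapped for a different per-cluster calculation without touching the clustering step.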

295

An Efficient Document Clustering Algorithm and Its Application to a Document Browser.  

ERIC Educational Resources Information Center

Presents a document-clustering algorithm that uses a term frequency vector for each document in a Japanese collection to produce a hierarchy in the form of a document classification tree. Introduces an application of this algorithm to a Japanese-to-English translation-aid system. (Author/LRW)

Tanaka, Hideki; Kumano, Tadashi; Uratani, Noriyoshi; Ehara, Terumasa

1999-01-01

296

A hierarchical monothetic document clustering algorithm for summarization and browsing search results  

Microsoft Academic Search

Organizing Web search results into a hierarchy of topics and sub-topics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierarchical monothetic clustering algorithm to build a topic hierarchy for a collection of search results retrieved in response to a query. At every level of the hierarchy, the new algorithm progressively identifies topics

Krishna Kummamuru; Rohit Lotlikar; Shourya Roy; Karan Singal; Raghu Krishnapuram

2004-01-01

297

Exploratory data mining lead by text mining using a novel high dimensional clustering algorithm  

Microsoft Academic Search

Text mining has emerged as a different stream in data mining because of the unstructured nature associated with free text. Many algorithms have been developed to assist in text mining. This paper presents the use of text mining based on a novel high dimensional clustering algorithm that leads to the exploratory data mining on data associated with the text. Experimental

Rasika Amarasiri; Jason Ceddia; Damminda Alahakoon

2005-01-01

298

Energy Efficient Prediction-Based Clustering Algorithm for Target Tracking in Wireless Sensor Networks  

Microsoft Academic Search

Energy efficiency is a main challenge in wireless sensor networks (WSNs) and their applications. Target tracking is one of the most important of these applications. In this paper we propose an energy-efficient prediction-based clustering algorithm for target tracking in WSNs. Our algorithm attempts to decrease the transmission distance between transmitter and receiver nodes and to decrease the number of

Fatemeh Deldar; Mohammad Hossien Yaghmaee

2010-01-01

299

Application of shuffled frog-leaping algorithm on clustering  

Microsoft Academic Search

Evolutionary algorithms, such as shuffled frog leaping, are stochastic search methods that mimic natural biological evolution and/or the social behavior of species. Such algorithms have been developed to arrive at near-optimum solutions to complex and large-scale optimization problems which cannot be solved by gradient-based mathematical programming techniques. The shuffled frog-leaping algorithm draws its formulation from two other search techniques: the

Babak Amiri; Mohammad Fathian; Ali Maroosi

2009-01-01

300

Empirical relations between static and dynamic exponents for Ising model cluster algorithms  

Microsoft Academic Search

We have measured the autocorrelations for the Swendsen-Wang and the Wolff cluster update algorithms for the Ising model in two, three, and four dimensions. The data for the Wolff algorithm suggest that the autocorrelations are linearly related to the specific heat, in which case the dynamic critical exponent is z^W_{int,E} = alpha/nu. For the Swendsen-Wang algorithm, scaling the autocorrelations by the average

Paul D. Coddington; Clive F. Baillie

1992-01-01

301

An improved clustering algorithm of tunnel monitoring data for cloud computing.  

PubMed

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm that uses MapReduce within cloud computing to process the data. It not only can deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01
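The abstract above describes k-means expressed with MapReduce. The following is only a minimal single-machine sketch of that general idea, not the authors' implementation: the map step assigns each point to its nearest center, and the reduce step averages the points per center. The function names (`map_phase`, `reduce_phase`, `kmeans_iteration`) and the choice to leave an empty cluster's center unchanged are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) for every point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        yield nearest, p


def reduce_phase(pairs, centers):
    """Reduce step: average the points that were assigned to each center."""
    groups = defaultdict(list)
    for j, p in pairs:
        groups[j].append(p)
    # Assumption: a center that received no points keeps its old position.
    new_centers = list(centers)
    for j, members in groups.items():
        new_centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centers


def kmeans_iteration(points, centers):
    """One k-means iteration written as a map step followed by a reduce step."""
    return reduce_phase(map_phase(points, centers), centers)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = kmeans_iteration(points, [(1.0, 1.0), (9.0, 9.0)])
```

In a real MapReduce deployment the map step would run on separate data partitions and the reduce step would be keyed by center index; here both run in one process purely to show the data flow.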

302

An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing  

PubMed Central

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm that uses MapReduce within cloud computing to process the data. It not only can deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data.

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

303

Learning assignment order of instances for the constrained K-means clustering algorithm.  

PubMed

The sensitivity of the constrained K-means clustering algorithm (Cop-Kmeans) to the assignment order of instances is studied, and a novel assignment order learning method for Cop-Kmeans, termed the clustering Uncertainty-based Assignment order Learning Algorithm (UALA), is proposed in this paper. The main idea of UALA is to rank all instances in the data set according to their clustering uncertainties, calculated by using ensembles of multiple clustering algorithms. Experimental results on several real data sets with artificial instance-level constraints demonstrate that UALA can identify a good assignment order of instances for Cop-Kmeans. In addition, the effects of ensemble sizes on the performance of UALA are analyzed, and the generalization property of Cop-Kmeans is also studied. PMID:19109091

Hong, Yi; Kwong, Sam

2009-04-01

304

A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique  

PubMed Central

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.

Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza

2014-01-01

305

A network-assisted co-clustering algorithm to discover cancer subtypes based on gene expression  

PubMed Central

Background: Cancer subtype information is critically important for understanding tumor heterogeneity. Existing methods to identify cancer subtypes have primarily focused on utilizing generic clustering algorithms (such as hierarchical clustering) to identify subtypes based on gene expression data. The network-level interaction among genes, which is key to understanding the molecular perturbations in cancer, has been rarely considered during the clustering process. The motivation of our work is to develop a method that effectively incorporates molecular interaction networks into the clustering process to improve cancer subtype identification. Results: We have developed a new clustering algorithm for cancer subtype identification, called “network-assisted co-clustering for the identification of cancer subtypes” (NCIS). NCIS combines gene network information to simultaneously group samples and genes into biologically meaningful clusters. Prior to clustering, we assign weights to genes based on their impact in the network. Then a new weighted co-clustering algorithm based on a semi-nonnegative matrix tri-factorization is applied. We evaluated the effectiveness of NCIS on simulated datasets as well as large-scale Breast Cancer and Glioblastoma Multiforme patient samples from The Cancer Genome Atlas (TCGA) project. NCIS was shown to better separate the patient samples into clinically distinct subtypes and achieve higher accuracy on the simulated datasets to tolerate noise, as compared to consensus hierarchical clustering. Conclusions: The weighted co-clustering approach in NCIS provides a unique solution to incorporate gene network information into the clustering process. Our tool will be useful to comprehensively identify cancer subtypes that would otherwise be obscured by cancer heterogeneity, using high-throughput and high-dimensional gene expression data.

2014-01-01

306

Clustering with Repulsive Prototypes  

NASA Astrophysics Data System (ADS)

Although there is no exact definition for the term cluster, in the 2D case, it is fairly easy for human beings to decide which objects belong together. For machines, on the other hand, it is hard to determine which objects form a cluster. Depending on the problem, the success of a clustering algorithm depends on the idea of its creators about what a cluster should be. Likewise, each clustering algorithm comprises a characteristic idea of the term cluster. For example, the fuzzy c-means algorithm (Kruse et al., Advances in Fuzzy Clustering and Its Applications, Wiley, New York, 2007, pp. 3-30; Höppner et al., Fuzzy Clustering, Wiley, Chichester, 1999) tends to find spherical clusters with equal numbers of objects. Noise clustering (Rehm et al., Soft Computing - A Fusion of Foundations, Methodologies and Applications 11(5):489-494) focuses on finding spherical clusters of user-defined diameter. In this paper, we present an extension to noise clustering that tries to maximize the distances between prototypes. For that purpose, the prototypes behave like repulsive magnets that have an inertia depending on their sum of membership values. Using this repulsive extension, it is possible to prevent groups of objects from being divided into more than one cluster. Due to the repulsion and inertia, we show that it is possible to determine the number and approximate position of clusters in a data set.

Winkler, Roland; Rehm, Frank; Kruse, Rudolf
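For orientation, the fuzzy c-means algorithm that this abstract takes as its reference point alternates between a membership update and a prototype update. The sketch below shows the standard FCM updates with Euclidean distance; it does not implement noise clustering or the repulsive extension proposed in the paper. The function names, the default fuzzifier m = 2, and the handling of a point that coincides with a prototype are assumptions made for illustration.

```python
from math import dist


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point j in cluster i.

    Standard FCM rule: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
    """
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        d = [dist(x, v) for v in centers]
        hits = [i for i, di in enumerate(d) if di == 0.0]
        if hits:
            # Assumption: a point lying exactly on one or more prototypes
            # shares its full membership equally among them.
            for i in hits:
                u[i][j] = 1.0 / len(hits)
            continue
        for i, di in enumerate(d):
            u[i][j] = 1.0 / sum((di / dk) ** power for dk in d)
    return u


def fcm_centers(points, u, m=2.0):
    """Prototype update: mean of the points weighted by u_ij ** m."""
    dims = range(len(points[0]))
    centers = []
    for row in u:
        w = [uij ** m for uij in row]
        total = sum(w)
        centers.append(tuple(
            sum(wj * x[k] for wj, x in zip(w, points)) / total for k in dims
        ))
    return centers


points = [(0.0,), (4.0,)]
u = fcm_memberships(points, [(1.0,), (3.0,)])
new_centers = fcm_centers(points, u)
```

Each column of `u` sums to one, which is the constraint that distinguishes FCM from noise clustering, where part of the membership may go to a noise cluster.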

307

Two generalizations of Kohonen clustering  

NASA Technical Reports Server (NTRS)

The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but which often lends ideas to clustering algorithms, is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.

Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.

1993-01-01
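The initialization problem mentioned in this abstract can be illustrated with a small winner-take-all update, in which only the prototype nearest to each input is moved. This is a generic sketch of that update rule, not the GLVQ or FLVQ learning rules from the paper; the function names, the fixed learning rate, and the toy data are assumptions.

```python
from math import dist


def winner_take_all_step(prototypes, x, rate):
    """Move only the prototype nearest to x a fraction `rate` toward x."""
    winner = min(range(len(prototypes)), key=lambda i: dist(x, prototypes[i]))
    updated = [list(p) for p in prototypes]
    updated[winner] = [p + rate * (xk - p) for p, xk in zip(prototypes[winner], x)]
    return winner, updated


def train(prototypes, data, rate=0.5, passes=3):
    """Present every input several times, updating the winner each time."""
    for _ in range(passes):
        for x in data:
            _, prototypes = winner_take_all_step(prototypes, x, rate)
    return prototypes


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The second prototype starts far outside the convex hull of the data.
result = train([(0.5, 0.5), (100.0, 100.0)], data)
```

Because the distant prototype never wins, it is never updated and stays at its initial position, which is the failure mode the abstract attributes to updating only the winning prototype.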

308

AntClass: discovery of clusters in numeric data by an hybridization of an ant colony with the Kmeans algorithm  

Microsoft Academic Search

We present in this paper a new hybrid algorithm for data clustering. This algorithm automatically discovers clusters in numerical data without prior knowledge of a possible number of classes, without any initial partition, and without complex parameter settings. It uses the stochastic and exploratory principles of an ant colony with the deterministic and heuristic principles of the Kmeans algorithm. Ants

Nicolas Monmarché; Mohamed Slimane; Gilles Venturini

1999-01-01

309

Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data.  

PubMed

One of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with sample categories. In this regard, a new clustering algorithm, termed as fuzzy-rough supervised attribute clustering (FRSAC), is proposed to find such groups of genes. The proposed algorithm is based on the theory of fuzzy-rough sets, which directly incorporates the information of sample categories into the gene clustering process. A new quantitative measure is introduced based on fuzzy-rough sets that incorporates the information of sample categories to measure the similarity among genes. The proposed algorithm is based on measuring the similarity between genes using the new quantitative measure, whereby redundancy among the genes is removed. The clusters are refined incrementally based on sample categories. The effectiveness of the proposed FRSAC algorithm, along with a comparison with existing supervised and unsupervised gene selection and clustering algorithms, is demonstrated on six cancer and two arthritis data sets based on the class separability index and predictive accuracy of the naive Bayes' classifier, the K-nearest neighbor rule, and the support vector machine. PMID:20542768

Maji, Pradipta

2011-02-01

310

An Effective Intrusion Detection Algorithm Based on Improved Semi-supervised Fuzzy Clustering  

NASA Astrophysics Data System (ADS)

An algorithm for intrusion detection based on improved evolutionary semi-supervised fuzzy clustering is proposed, which is suited to situations in which labeled data are more difficult to obtain than unlabeled data in intrusion detection systems. The algorithm requires only a small number of labeled data together with a large number of unlabeled data, and the class label information provided by the labeled data is used to guide the evolution process of each fuzzy partition on the unlabeled data, which plays the role of a chromosome. This algorithm can deal with fuzzy labels, is not easily trapped in local optima, and is suited to implementation on parallel architectures. Experiments show that the algorithm can improve classification accuracy and has high detection efficiency.

Li, Xueyong; Zhang, Baojian; Sun, Jiaxia; Yan, Shitao

311

Irreversible Growth Algorithm for Branched Polymers (Lattice Animals), and Their Relation to Colloidal Cluster-Cluster Aggregates  

NASA Astrophysics Data System (ADS)

We prove that a new, irreversible growth algorithm, Non-Deletion Reaction-Limited Cluster-cluster Aggregation (NDRLCA), produces equilibrium Branched Polymers, expected to exhibit Lattice Animal statistics [1]. We implement NDRLCA, off-lattice, as a computer simulation for embedding dimension d=2 and 3, obtaining values for critical exponents, fractal dimension D and cluster mass distribution exponent tau: d=2, D ≈ 1.53 ± 0.05, tau = 1.09 ± 0.06; d=3, D = 1.96 ± 0.04, tau = 1.50 ± 0.04, in good agreement with theoretical LA values. The simulation results do not support recent suggestions [2] that BPs may be in the same universality class as percolation. We also obtain values for a model-dependent critical “fugacity”, z_c, and investigate the finite-size effects of our simulation, quantifying notions of “inbreeding” that occur in this algorithm. Finally we use an extension of the NDRLCA proof to show that standard Reaction-Limited Cluster-cluster Aggregation is very unlikely to be in the same universality class as Branched Polymers/Lattice Animals unless the backbone dimension for the latter is considerably less than the published value.

Ball, R. C.; Lee, J. R.

1996-03-01

312

An Outlier Detection Algorithm Based on Spectral Clustering  

Microsoft Academic Search

Outlier detection is widely used in many areas such as credit card fraud detection, discovery of criminal activities in electronic commerce, weather prediction and marketing. In this paper, we demonstrate the effectiveness of spectral clustering on datasets with outliers. Through the spectral method we can use the information of the feature space with eigenvectors rather than that of the whole dataset to

Peng Yang; Biao Huang

2008-01-01

313

A biologically inspired algorithm for microcalcification cluster detection  

Microsoft Academic Search

The early detection of breast cancer greatly improves prognosis. One of the earliest signs of cancer is the formation of clusters of microcalcifications. We introduce a novel method for microcalcification detection based on a biologically inspired adaptive model of contrast detection. This model is used in conjunction with image filtering based on anisotropic diffusion and curvilinear structure removal using local

Marius George Linguraru; Kostas Marias; Ruth E. English; Michael Brady

2006-01-01

314

Algorithms for clustering expressed sequence tags: the wcd tool  

Microsoft Academic Search

Understanding which genes are active, and when and why, is an important question for molecular biology. Expressed Sequence Tags (ESTs) are a technology used to explore the transcriptome (a record of this gene activity). ESTs are short fragments of DNA created in the laboratory from mRNA extracted from a cell. The key computational step in their processing is clustering: putting

Scott Hazelhurst

2008-01-01

315

An efficient clustering algorithm for partitioning Y-short tandem repeats data  

PubMed Central

Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear.

2012-01-01

316

Clustering WHO-ART terms using semantic distance and machine learning algorithms.  

PubMed

WHO-ART was developed by the WHO collaborating centre for international drug monitoring in order to code adverse drug reactions. We assume that computation of semantic distance between WHO-ART terms may be an efficient way to group related medical conditions in the WHO database in order to improve signal detection. Our objective was to develop a method for clustering WHO-ART terms according to some proximity of their meanings. Our material comprises 758 WHO-ART terms. A formal definition was acquired for each term as a list of elementary concepts belonging to SNOMED international axes and characterized by modifier terms in some cases. Clustering was implemented as a terminology service on a J2EE server. Two different unsupervised machine learning algorithms (KMeans, Pvclust) clustered WHO-ART terms according to a semantic distance operator previously described. Pvclust grouped 51% of WHO-ART terms. K-Means grouped 100% of WHO-ART terms, but 25% of clusters were heterogeneous with k = 180 clusters and 6% of clusters were heterogeneous with k = 32 clusters. Clustering algorithms associated with semantic distance could suggest potential groupings of WHO-ART terms that need validation according to the user's requirements. PMID:17238365

Iavindrasana, Jimison; Bousquet, Cedric; Degoulet, Patrice; Jaulent, Marie-Christine

2006-01-01

317

Computational Identification of Transcription Factor Binding Sites via a Transcription-factor-centric Clustering (TFCC) Algorithm  

Microsoft Academic Search

While microarray-based expression profiling has facilitated the use of computational methods to find potential cis-regulatory promoter elements, few current in silico approaches explicitly link regulatory motifs with the transcription factors that bind them. We have thus developed a TF-centric clustering (TFCC) algorithm that may provide such missing information through incorporation of biological knowledge about TFs. TFCC is a semi-supervised clustering

Zhou Zhu; Yitzhak Pilpel; George M. Church

2002-01-01

318

Optimization models and algorithms for the hyperplane clustering problem  

Microsoft Academic Search

This is a summary of the author’s PhD thesis supervised by Edoardo Amaldi and defended on 3 April 2009 at the Politecnico di Milano. The thesis is written in English and is available from the author upon request. In this work, we extensively study two challenging variants of the general problem of clustering a given set of data points with respect

Kanika Dhyani

2010-01-01

319

Track clustering and vertexing algorithm for L1 trigger  

SciTech Connect

One of the keystones of the canceled BTeV experiment (proposed at Fermilab's Tevatron) was its sophisticated three-level trigger. The trigger was designed to reject 99.9% of light-quark background events and retain a large number of B decays. The BTeV Pixel Detector provided a 3-dimensional, high resolution tracking system to detect B signatures. The Level 1 pixel detector trigger was proposed as a two stage process, a track-segment finder and a vertex finder which analyzed every accelerator crossing. In simulations the track-segment finder stage outputs an average of 200 track-segments per accelerator crossing (2.5MHz). The vertexing stage finds vertices and associates track-segments with the vertices found. This paper proposes a novel adaptive pattern recognition model to find the number and the estimated location of vertices, and to cluster track-segments around those vertices. The track clustering and vertex finding is done in parallel. The pattern recognition model also generates the estimate of other important parameters such as the covariance matrix of the cluster vertices and the minimum distances from the tracks to the vertices needed to compute detached tracks.

Cancelo, Gustavo I.; /Fermilab

2005-10-01

320

Semi-supervised clustering algorithm for community structure detection in complex networks  

NASA Astrophysics Data System (ADS)

Discovering a community structure is fundamental for uncovering the links between structure and function in complex networks. In this paper, we discuss an equivalence of the objective functions of the symmetric nonnegative matrix factorization (SNMF) and the maximum optimization of modularity density. Based on this equivalence, we develop a new algorithm, named SNMF-SS, by combining SNMF and a semi-supervised clustering approach. Previous NMF-based algorithms often suffer from the restriction of measuring network topology from only one perspective, but our algorithm uses a semi-supervised mechanism to get rid of the restriction. The algorithm is illustrated and compared with spectral clustering and NMF by using artificial examples and other classic real world networks. Experimental results show the significance of the proposed approach, particularly in cases where the community structure is obscure.

Ma, Xiaoke; Gao, Lin; Yong, Xuerong; Fu, Lidong

2010-01-01

321

An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data.  

PubMed

High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may get selected multiple times in a LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra) for clustering mass spectrometry data which increases both the sensitivity and confidence of spectral assignment. CAMS utilizes a novel metric, called F-set, that allows accurate identification of the spectra that are similar. A graph theoretic framework is defined that allows the use of F-set metric efficiently for accurate cluster identifications. The accuracy of the algorithm is tested on real HCD and CID data sets with varying amounts of peptides. Our experiments show that the proposed algorithm is able to cluster spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm is able to decrease the computational time by compressing the data sets while increasing the throughput of the data by interpreting low S/N spectra. PMID:23471471

Saeed, Fahad; Pisitkun, Trairak; Knepper, Mark A; Hoffert, Jason D

2012-10-01

322

An efficient method of key-frame extraction based on a cluster algorithm.  

PubMed

This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data. PMID:24511336

Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng

2013-12-18
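The final selection step described in this abstract, taking the frame nearest to the center of each class as a key-frame, can be sketched as follows. This is only an illustration of that one step: the cluster labels are assumed to be given (the paper obtains them with ISODATA, which is not implemented here), frames are represented as plain feature tuples, and the function name `key_frames` is hypothetical.

```python
from collections import defaultdict
from math import dist


def key_frames(frames, labels):
    """Return, per cluster, the index of the frame nearest to the cluster mean."""
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    keys = []
    for members in groups.values():
        # Cluster center: coordinate-wise mean of the member frames.
        center = [sum(c) / len(members) for c in zip(*(frames[i] for i in members))]
        keys.append(min(members, key=lambda i: dist(frames[i], center)))
    return sorted(keys)


frames = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (15.0,)]
labels = [0, 0, 0, 1, 1, 1]
selected = key_frames(frames, labels)
```

Returning frame indices rather than the cluster means keeps every key-frame an actual frame of the motion sequence.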

323

An Efficient Method of Key-Frame Extraction Based on a Cluster Algorithm  

PubMed Central

This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data.

Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng

2013-01-01

324

On the convergence of a clustering algorithm for protein-coding regions in microbial genomes  

Microsoft Academic Search

Motivation: As the number of fully sequenced prokaryotic genomes continues to grow rapidly, computational methods for reliably detecting protein-coding regions become even more important. Audic and Claverie (1998) Proc. Natl Acad. Sci. USA, 95, 10026-10031, have proposed a clustering algorithm for protein-coding regions in microbial genomes. The algorithm is based on three Markov models of order k associated with

Pierre Baldi

2000-01-01

325

A multidimensional flocking algorithm for clustering spatial data  

Microsoft Academic Search

In this paper, we describe the efficient implementation of M-Sparrow, an adaptive flocking algorithm based on the biology-inspired paradigm of a flock of birds. We extended the classical flock model of Reynolds with two new characteristics: the movement in a multi-dimensional space and different kinds of birds. The birds, in this context, are used to discover points having

Antonio Augimeri; Gianluigi Folino; Agostino Forestiero; Giandomenico Spezzano

2006-01-01

326

Straight-Line Drawing Algorithms for Hierarchical Graphs and Clustered Graphs  

Microsoft Academic Search

Hierarchical graphs and clustered graphs are useful non-classical graph models for structured relational information. Hierarchical graphs are graphs with layering structures; clustered graphs are graphs with recursive clustering structures. Both have applications in CASE tools, software visualization and VLSI design. Drawing algorithms for hierarchical graphs have been well investigated. However, the problem of planar straight-line representation has not been solved

Peter Eades; Qing-wen Feng; Xuemin Lin; Hiroshi Nagamochi

2006-01-01

327

Distributed Clustering Algorithm to Explore Selection Diversity in Wireless Sensor Networks  

NASA Astrophysics Data System (ADS)

This paper presents a novel cross-layer approach to explore selection diversity for distributed clustering based wireless sensor networks (WSNs) by selecting a proper cluster-head. We develop and analyze an instantaneous channel state information (CSI) based cluster-head selection algorithm for a distributed, dynamic and randomized clustering based WSN. The proposed cluster-head selection scheme is also random and capable of distributing the energy use among the nodes in the network. We present an analytical approach to evaluate the energy efficiency and system lifetime of our proposal. Analysis shows that the proposed scheme outperforms the performance of additive white Gaussian noise (AWGN) channel under Rayleigh fading environment. This proposal also outperforms the existing cooperative diversity protocols in terms of system lifetime and implementation complexity.

Kong, Hyung-Yun; Asaduzzaman, Hyung-Yun

328

Solvent-Shift Monte Carlo: A cluster algorithm for solvated atomistic and coarse-grained systems  

NASA Astrophysics Data System (ADS)

We present a cluster algorithm for the efficient simulation of solvated molecules that we term solvent-shift Monte Carlo (SSMC). The algorithm involves a conformational change in a solvated solute molecule of interest, followed by a geometrical rotation of solvent particles. The method satisfies detailed balance and can be applied to existing schemes to sample conformational space, where an axis or plane of rotation can be defined. We demonstrate that the algorithm significantly enhances the sampling of phase space in solvated systems, and may be easily combined with other advanced sampling techniques such as parallel tempering.

Earl, David; Hixson, Christopher; Benigni, James

2009-03-01

329

Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state  

SciTech Connect

We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at approximately 1 kHz success rate, demonstrates the correct implementation of the algorithm.

Vallone, Giuseppe [Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi, Via Panisperna 89/A, Compendio del Viminale, IT-00184 Roma (Italy); Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Donati, Gaia; Bruno, Natalia; Chiuri, Andrea [Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Mataloni, Paolo [Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Istituto Nazionale di Ottica (INO-CNR), L.go E. Fermi 6, IT-50125 Florence (Italy)

2010-05-15

330

An investigation of linguistic features and clustering algorithms for topical document clustering  

Microsoft Academic Search

We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, and single-pass) and two linguistically motivated text features (noun phrase heads and proper names) in the context of document clustering. A statistical model for combining similarity information from multiple sources is described and applied to DARPA's Topic Detection and Tracking phase 2 (TDT2) data. This model, based on log-linear regression, alleviates

Vasileios Hatzivassiloglou; Luis Gravano; Ankineedu Maganti

2000-01-01
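Three of the hierarchical methods named in this abstract differ only in how the distance between two clusters is computed. The sketch below shows those three linkage criteria on toy numeric points. It uses Euclidean distance as a stand-in for the document similarity used in the paper, and it takes group-average to mean the average over all cross-cluster pairs, which is one common definition and an assumption here.

```python
from itertools import product
from math import dist


def single_link(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    """Mean distance over all cross-cluster pairs (assumed definition)."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


cluster_a = [(0.0,), (1.0,)]
cluster_b = [(3.0,), (5.0,)]
linkages = (
    single_link(cluster_a, cluster_b),
    complete_link(cluster_a, cluster_b),
    group_average(cluster_a, cluster_b),
)
```

An agglomerative procedure would repeatedly merge the two clusters with the smallest value of the chosen criterion; single-pass clustering, the fourth method in the abstract, works differently and is not shown.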

331

Comparing Clustering on Symbolic Data  

Microsoft Academic Search

Although various dissimilarity functions for symbolic data clustering are available in the literature, little attention has thus far been paid to making a comparison between such different distance measures. This paper presents a comparative study of some well known dissimilarity functions treating symbolic data. A version of the fuzzy c-means clustering algorithm is used to create groups of individuals characterized

Alzennyr Da Silva; Yves Lechevallier; Francisco De A. T. De Carvalho

2009-01-01

332

Analyzing Distance Measures for Symbolic Data Based on Fuzzy Clustering  

Microsoft Academic Search

Various propositions to solve the problem of symbolic data clustering are available in the literature. This paper introduces a comparative study among some well known dissimilarity functions treating symbolic data. An extension of the fuzzy c-means clustering algorithm is used to create groups of individuals characterized by symbolic variables of mixed types. The proposed method furnishes a fuzzy partition and

Alzennyr da Silva; Yves Lechevallier; Francisco de Carvalho

2007-01-01

333

Relational graph clustering based on spectral coefficient angle  

Microsoft Academic Search

This paper introduces a relational graph representation method using the angle between spectral coefficient vectors. A relational graph clustering system is built on this representation method. The system adopts fuzzy c-means (FCM) as its clustering algorithm. FCM operates on the pattern space, which is embedded by locality preserving projections (LPP). The pattern space is obtained from the Laplacian matrix constructed by the corner points oriented

Min Kong; Jin Tang; Bin Luo

2008-01-01

334

Based on the TF fast clustering algorithm steel surface defect feature extraction and classification  

NASA Astrophysics Data System (ADS)

To detect steel plate surface defects and collect defect features, this paper puts forward a steel plate surface defect detection method based on the TF fast clustering algorithm, which runs quickly enough for timely use in industrial settings such as shipyards. According to their gray-level and geometrical characteristics, several common defects are divided into simple classes.

Yu, Zhiwei; Xiong, Mudi; Niu, Zhuqing

2013-10-01

335

Comparison of clustering algorithms on generalized propensity score in observational studies: a simulation study  

Microsoft Academic Search

In observational studies, unbalanced observed covariates between treatment groups often cause biased inferences on the estimation of treatment effects. Recently, generalized propensity score (GPS) has been proposed to overcome this problem; however, a practical technique to apply the GPS is lacking. This study demonstrates how clustering algorithms can be used to group similar subjects based on transformed GPS. We compare

Chunhao Tu; Shuo Jiao; Woon Yuen Koh

2012-01-01

336

A parallel point cloud clustering algorithm for subset segmentation and outlier detection  

Microsoft Academic Search

We present a fast point cloud clustering technique which is suitable for outlier detection, object segmentation and region labeling for large multi-dimensional data sets. The basis is a minimal data structure similar to a kd-tree which enables us to detect connected subsets very fast. The proposed algorithms utilizing this tree structure are parallelizable which further increases the computation speed for

Christian Teutsch; Erik Trostmann; Dirk Berndt

2011-01-01

337

Identification of functional clusters of transcription factor binding motifs in genome sequences: the MSCAN algorithm  

Microsoft Academic Search

Motivation: The identification of regulatory control regions within genomes is a major challenge. Studies have demonstrated that regulating regions can be described as locally dense clusters or modules of cis-acting transcription factor binding sites (TFBS). For well-described biological contexts, it is possible to train predictive algorithms to discern novel modules in genome sequences. However, utility of module

Öjvind Johansson; Wynand Alkema; Wyeth W. Wasserman; Jens Lagergren

2003-01-01

338

Combinatorial time series forecasting based on clustering algorithms and neural networks  

Microsoft Academic Search

Time series analysis utilising more than a single forecasting approach is a procedure originated many years ago as an attempt to improve the performance of the individual model forecasts. In the literature there is a wide range of different approaches but their success depends on the forecasting performance of the individual schemes. A clustering algorithm is often employed to distinguish

A. Sfetsos; C. Siriopoulos

2004-01-01

339

CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines  

Microsoft Academic Search

In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on Coarse Grained Multicomputer (CGM) algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC

Albert Chan; Frank K. H. A. Dehne; Ryan Taylor

2005-01-01

340

A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks  

PubMed Central

Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs.

Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip

2013-01-01

341

A multilevel gamma-clustering layout algorithm for visualization of biological networks.  

PubMed

Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855

Hruz, Tomas; Wyss, Markus; Lucas, Christoph; Laule, Oliver; von Rohr, Peter; Zimmermann, Philip; Bleuler, Stefan

2013-01-01

342

NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION  

PubMed Central

In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement.
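The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the similarity matrix `sim` and the cluster labels are hypothetical inputs (the abstract obtains clusters via multidimensional scaling and k-means, which is not reproduced here), and the function names are not taken from the paper.

```python
def consensus_scores(sim):
    """Average pairwise similarity of each model to all other models.

    `sim` is a hypothetical symmetric matrix (list of lists) of
    structural similarities between predicted models.
    """
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def select_candidates(sim, labels):
    """Pick, in each cluster, the model with the highest consensus score.

    `labels[i]` is the (assumed, precomputed) cluster label of model i.
    Returns a dict mapping cluster label -> index of the selected model.
    """
    scores = consensus_scores(sim)
    best = {}
    for i, lab in enumerate(labels):
        if lab not in best or scores[i] > scores[best[lab]]:
            best[lab] = i
    return best


# Toy example with four models in two clusters (made-up numbers).
sim = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
]
print(select_candidates(sim, labels=[0, 0, 1, 1]))
```

In this toy input, model 1 has the highest average similarity within cluster 0 and model 2 within cluster 1, so those two would be returned as candidates.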

WANG, QINGGUO; SHANG, CHARLES; XU, DONG

2014-01-01

343

A novel gray clustering filtering algorithms for identifying the false alert in aircraft long-distance fault diagnosis  

Microsoft Academic Search

The fault report is downloaded from the aircraft via ACARS for line maintenance, a practice that is currently receiving wide attention. However, false alerts often occur in the fault report and reduce maintenance efficiency. To address this problem, a gray clustering filtering algorithm is set up based on gray clustering and filter theory. The algorithm can identify the false alert

Hong Geng

2007-01-01

344

A novel fuzzy and multiobjective evolutionary algorithm based gene assignment for clustering short time series expression data  

Microsoft Academic Search

Conventional clustering algorithms based on Euclidean distance or the Pearson correlation coefficient cannot include order information in the distance metric and cannot distinguish between random and real biological patterns. We present a template-based clustering algorithm for time series gene expression data. Template profiles are defined based on up-down regulation of genes between consecutive time points. Assignment

Ashish Anand; Ponnuthurai N. Suganthan; Kalyanmoy Deb

2007-01-01

345

MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms  

SciTech Connect

The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
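To make the notion of pairwise overlap concrete, here is a minimal sketch for the simplest special case only: two univariate Gaussian components with equal mixing weights and a common standard deviation. Under those assumptions (which are ours, not the package's general setting) the decision boundary is the midpoint between the means, each misclassification probability is Φ(-|μ1 - μ2| / (2σ)), and the overlap is their sum. The function names are hypothetical and this is not the MixSim API.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pairwise_overlap(mu1, mu2, sigma):
    """Overlap of two 1-D Gaussian components with equal weights and
    equal standard deviation `sigma` (simplifying assumptions).

    Each misclassification probability is the mass of one component
    lying on the wrong side of the midpoint between the means; the
    overlap is the sum of the two (equal) probabilities.
    """
    miss = normal_cdf(-abs(mu1 - mu2) / (2.0 * sigma))
    return 2.0 * miss


for gap in (0.0, 2.0, 4.0):
    print(gap, round(pairwise_overlap(0.0, gap, 1.0), 4))
```

Identical components give an overlap of 1, and the overlap shrinks toward 0 as the means move apart, which is the knob the abstract describes for controlling clustering difficulty.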

Melnykov, Volodymyr [University of Alabama, Tuscaloosa; Chen, Wei-Chen [ORNL; Maitra, Ranjan [Iowa State University

2012-01-01

346

Unsupervised Binary Change Detection in VHR Images using a Kernelized Clustering Algorithm  

NASA Astrophysics Data System (ADS)

When dealing with Change Detection (CD) it is often important to obtain a binary mask of changes that occurred in two or more coregistered images. Methods involving unsupervised CD are known for their fast application and minimal dependency on the user. Many such methods apply linear functions, which yields fast solutions, but the final model is suboptimal in the sense that nonlinearities are not taken into account. Depending on the composition of the scene (classes of similar objects and their spectral response), nonlinear relationships can be crucial to consider in change detection studies, especially when different semantic and radiometric classes are grouped into a single cluster in the binary mask. Classical clustering algorithms such as k-means, hierarchical schemes, similarity-based metrics or graphs, and mixtures of Gaussians can give suboptimal predictions because the correct clusters cannot be recognized in the input space. A partial solution to this problem can be found by applying bagging to the clustering scheme and obtaining a solution by voting, but the result remains suboptimal. An improved solution can be obtained by applying an explicitly nonlinear clustering scheme. In the present research, we propose a clustering scheme based on a kernelized version of the well-known k-means algorithm. The so-called Kernel k-Means (KKM) looks for clusters in an induced reproducing kernel Hilbert space, where data are mapped by a kernel function. The final clustering is thus a classical k-means applied in a higher-dimensional space where clusters are assumed to be more recognizable. To make the problem feasible, a bagging scheme is adopted, which avoids explicitly computing the kernel matrix of the entire image. Iteratively, the clusters are computed using different random subsets of the multitemporal image. By applying this approach to many random subsets, the whole variance of the pixels is considered. The final map is obtained by a voting scheme on the cluster assignments. The efficiency and superior accuracy of the proposed method are studied using real data (a QuickBird image of Zurich) and compared with classical and bagged versions of the k-means algorithm. This work is supported by the SNFS Project No. 200021-126505 "KernelCD".

Volpi, Michele; Kanevski, Mikhail

2010-05-01

347

Scalable fault tolerant algorithms for linear-scaling coupled-cluster electronic structure methods.  

SciTech Connect

By means of coupled-cluster theory, molecular properties can be computed with an accuracy often exceeding that of experiment. The high-degree polynomial scaling of the coupled-cluster method, however, remains a major obstacle in the accurate theoretical treatment of mainstream chemical problems, despite tremendous progress in computer architectures. Although it has long been recognized that this super-linear scaling is non-physical, the development of efficient reduced-scaling algorithms for massively parallel computers has not been realized. We here present a locally correlated, reduced-scaling, massively parallel coupled-cluster algorithm. A sparse data representation for handling distributed, sparse multidimensional arrays has been implemented along with a set of generalized contraction routines capable of handling such arrays. The parallel implementation entails a coarse-grained parallelization, reducing interprocessor communication and distributing the largest data arrays but replicating as many arrays as possible without introducing memory bottlenecks. The performance of the algorithm is illustrated by several series of runs for glycine chains using a Linux cluster with an InfiniBand interconnect.

Leininger, Matthew L.; Nielsen, Ida Marie B.; Janssen, Curtis L.

2004-10-01

348

The improved fuzzy clustering algorithm based on AFS theory and its applications to Wisconsin breast cancer data  

Microsoft Academic Search

In this paper, the AFS fuzzy logic clustering algorithm proposed by X.D. Liu has been studied further by the improvement of the algorithm. Instead of examples of less than 10 samples in Liu's paper, we apply the improved algorithm to Wisconsin breast cancer data which has 699 samples and just the order relationships of the samples on each feature are

Xianchang Wang; Xiaodong Liu; Lishi Zhang

2010-01-01

349

On applying spatial constraints in fuzzy image clustering using a fuzzy rule-based system  

Microsoft Academic Search

A novel approach for enhancing the results of fuzzy clustering by imposing spatial constraints for solving image segmentation problems is presented. We have developed a Sugeno (185) type rule-based system with three inputs and 11 rules that interacts with the clustering results obtained by the well-known fuzzy c-means (FCM) and/or possibilistic c-means (PCM) algorithms. It provides good image segmentations in

Yannis A. Tolias; Stavros M. Panas

1998-01-01

350

Improvement for detection of microcalcifications through clustering algorithms and artificial neural networks  

NASA Astrophysics Data System (ADS)

A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the microcalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector in a classifier. The classifier is based on an artificial neural network to identify patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, because this stage is an important part of early breast cancer detection.

Quintanilla-Domínguez, Joel; Ojeda-Magaña, Benjamín; Marcano-Cedeño, Alexis; Cortina-Januchs, María G.; Vega-Corona, Antonio; Andina, Diego

2011-12-01

351

A fast hierarchical clustering algorithm for large-scale protein sequence data sets.  

PubMed

TRIBE-MCL is a Markov clustering algorithm that operates on a graph built from pairwise similarity information of the input data. Edge weights stored in the stochastic similarity matrix are alternately fed to the two main operations, inflation and expansion, and are normalized in each main loop to maintain the probabilistic constraint. In this paper we propose an efficient implementation of the TRIBE-MCL clustering algorithm, suitable for fast and accurate grouping of protein sequences. A modified sparse matrix structure is introduced that can efficiently handle most operations of the main loop. Taking advantage of the symmetry of the similarity matrix, a fast matrix squaring formula is also introduced to facilitate the time consuming expansion. The proposed algorithm was tested on protein sequence databases like SCOP95. In terms of efficiency, the proposed solution improves execution speed by two orders of magnitude, compared to recently published efficient solutions, reducing the total runtime well below 1 min in the case of the 11,944 proteins of SCOP95. This improvement in computation time is reached without losing anything from the partition quality. Convergence is generally reached in approximately 50 iterations. The efficient execution enabled us to perform a thorough evaluation of classification results and to formulate recommendations regarding the choice of the algorithm's parameter values. PMID:24657908
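The expansion/inflation loop mentioned above can be illustrated with a small dense sketch. This is a plain textbook-style Markov clustering loop written for readability, not the sparse, symmetry-exploiting implementation the abstract proposes; the self-loop weight, the inflation exponent of 2, the pruning threshold, and all function names are assumptions made for the example.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (the probabilistic constraint)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix (dense O(n^3) product)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[v ** r for v in row] for row in m])


def markov_cluster(similarity, inflation=2.0, iters=50):
    """Return clusters as sorted lists of node indices."""
    n = len(similarity)
    # Add self-loops (weight 1.0 is an arbitrary choice) and normalize.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(iters):
        m = inflate(expand(m), inflation)
    # Each row with surviving mass groups the columns (nodes) it attracts.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
graph = [[0.0] * 6 for _ in range(6)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    graph[a][b] = graph[b][a] = 1.0
print(markov_cluster(graph))  # → [[0, 1, 2], [3, 4, 5]]
```

The dense product in `expand` is exactly the step that dominates the runtime, which is why the abstract focuses on a sparse structure and a faster squaring formula for symmetric matrices.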

Szilágyi, Sándor M; Szilágyi, László

2014-05-01

352

A New Waveform Signal Processing Method Based on Adaptive Clustering-Genetic Algorithms  

SciTech Connect

We present a fast digital signal processing method for numerical analysis of individual pulses from CdZnTe compound semiconductor detectors, using a discrimination technique based on the Maxi-Mini Distance Algorithm and Genetic Algorithms. A parametric approach has been used for classifying the discriminated waveforms into a set of clusters, each of which has a similar signal shape with a corresponding pulse height spectrum. A corrected total pulse height spectrum was obtained by applying a normalization factor for the full energy peak of each cluster, with substantial improvements in the energy spectrum characteristics. This method was applied successfully to both simulated and real measured data, and it can be applied to any detector that suffers from signal shape variation. (authors)

Noha Shaaban; Fukuzo Masuda; Hidetsugu Morota [Computer Software Development Company, Ltd. (Japan)

2006-07-01

353

Application of a clustering-based peak alignment algorithm to analyze various DNA fingerprinting data.  

PubMed

DNA fingerprinting analysis such as amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic PCR (rep-PCR), ribosomal intergenic spacer analysis (RISA), and denaturing gradient gel electrophoresis (DGGE) are frequently used in various fields of microbiology. The major difficulty in DNA fingerprinting data analysis is the alignment of multiple peak sets. We report here an R program for a clustering-based peak alignment algorithm, and its application to analyze various DNA fingerprinting data, such as ARDRA, rep-PCR, RISA, and DGGE data. The results obtained by our clustering algorithm and by BioNumerics software showed high similarity. Since several R packages have been established to statistically analyze various biological data, the distance matrix obtained by our R program can be used for subsequent statistical analyses, some of which were not previously performed but are useful in DNA fingerprinting studies. PMID:19616587

Ishii, Satoshi; Kadota, Koji; Senoo, Keishi

2009-09-01

354

Dynamic connectivity algorithms for Monte Carlo simulations of the random-cluster model  

NASA Astrophysics Data System (ADS)

We review Sweeny's algorithm for Monte Carlo simulations of the random cluster model. Straightforward implementations suffer from the problem of computational critical slowing down, where the computational effort per edge operation scales with a power of the system size. By using a tailored dynamic connectivity algorithm we are able to perform all operations with a poly-logarithmic computational effort. This approach is shown to be efficient in keeping online connectivity information and is of use for a number of applications also beyond cluster-update simulations, for instance in monitoring droplet shape transitions. As the handling of the relevant data structures is non-trivial, we provide a Python module with a full implementation for future reference.

Metin Elçi, Eren; Weigel, Martin

2014-05-01

355

A novel data-sending algorithm based on cluster for wireless sensor networks  

NASA Astrophysics Data System (ADS)

Wireless sensor networks (WSNs) have been attracting growing interest for developing a new generation of large-scale embedded computing systems. However, the communication paradigms in wireless sensor networks differ from those of traditional wireless networks, triggering the need for new communication protocols and energy-consumption models. NDSA (A Novel Data-sending Algorithm), a cluster-based algorithm for wireless sensor networks, aims at the design of a scalable data-sending model for supporting large-scale embedded computing applications with critical requirements. In this paper, we assume that intra-cluster data follow a Gaussian distribution. According to the desired accuracy specified by the system, NDSA can automatically adjust the number of data-sending nodes. Experimental results show that NDSA extends network lifetime significantly more than other similar algorithms.

Xu, Xiao-feng; Feng, Ren-jian; Wan, Jiang-wen

2008-11-01

356

A Clustering Algorithm Based on the Ants Self-Assembly Behavior  

Microsoft Academic Search

We have presented in this paper an ants based clustering algorithm which is inspired from the self-assembling behavior observed in real ants. These ants progressively become connected to an initial point called the support and then successively to other connected ants. The artificial ants that we have defined similarly build a tree where each ant represents a node/data. Ants use

Hanene Azzag; Nicolas Monmarché; Mohamed Slimane; Christiane Guinot; Gilles Venturini

2003-01-01

357

A Human Action Recognition Algorithm Based on Semi-supervised Kmeans Clustering  

Microsoft Academic Search

This paper proposes a new method of semi-supervised human action recognition. In our approach, the motion energy image (MEI) and motion history image (MHI) are firstly used as the feature representation of the human action. Then, the constrained semi-supervised kmeans clustering algorithm is utilized to predict the class label of unlabeled training example. Meanwhile the average motion energy and history images are

Hejin Yuan; Cuiru Wang

358

New aspects of the elastic net algorithm for cluster analysis  

Microsoft Academic Search

The elastic net algorithm formulated by Durbin–Willshaw as a heuristic method and initially applied to solve the traveling salesman problem can be used as a tool for data clustering in n-dimensional space. With the help of statistical mechanics, it is formulated as a deterministic annealing method, where a chain with a fixed number of nodes interacts at different temperatures with the

Marcos Lévano; Hans Nowak

359

An Efficient Algorithm for Cluster Updates in Z(2) Lattice Gauge Theories  

Microsoft Academic Search

An efficient algorithm for identifying the independent cluster flips in a Z(2) lattice gauge theory with stochastic percolation is presented. It applies Gaussian elimination to the incidence matrix, with special attention paid to the pivoting strategy and appropriate linked list structures. At the critical point of the 3-dimensional pure gauge model, storage and cpu-time scale like L^3 and L^3 log

B. Bunk

1992-01-01

360

Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods  

PubMed Central

Background: A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use. Results: First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS) sampler, which is a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in a preceding program, HWLER, its effectiveness has not been investigated. We showed that this sampler was effective for population structure analysis. Implementation of this sampler was useful with regard to the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and could be resolved by considering the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, inferences with these algorithms for unbalanced sample sizes tended to be less accurate than those with the DP prior method. Conclusions: The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals into populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here we presented a novel program, DPART, that implements the SAMS sampler and can consider the hyperparameter for the prior distribution of allele frequencies to be a variable.

2011-01-01

361

A comparative study of radial basis function neural networks in dynamic clustering algorithm  

NASA Astrophysics Data System (ADS)

This paper develops two learning procedures, based respectively on the orthogonal least squares (OLS) method and the newly proposed "Innovation-Contribution" criterion (ICc). The orthogonal use of the stepwise-regression algorithm in the ICc makes the model structure independent of the selected term sequence and reduces the cluster region further compared with OLS. Because the Bayesian information criterion (BIC) method is incorporated into the clustering process of the ICc, it has no parameters that need tuning other than the widths of the Gaussian functions. For the OLS algorithm, by contrast, the user is required to specify a tolerance, which is related to the noise and is difficult to set in a real system. The two algorithms are applied to Radial Basis Function Neural Networks (RBFNN) to compare their performance on nonlinear dynamic systems with different noise. Experimental results show that both provide an efficient approximation to the required results for fitting models, but the clustering procedure of the ICc yields substantially better solutions than OLS.

Zhou, Peng; Li, Dehua; Wu, Hong; Zeng, Jun; Chen, Feng

2009-10-01

362

An unsupervised, ensemble clustering algorithm: A new approach for classification of X-ray sources  

NASA Astrophysics Data System (ADS)

A large volume of CCD X-ray spectra is being generated by the Chandra X-ray Observatory (Chandra) and XMM-Newton. Automated spectral analysis and classification methods can aid in sorting, characterizing, and classifying this large volume of CCD X-ray spectra in a non-parametric fashion, complementary to current parametric model fits. We have developed an algorithm that uses multivariate statistical techniques, including an ensemble clustering method, applied for the first time for X-ray spectral classification. The algorithm uses spectral data to group similar discrete sources of X-ray emission by placing the X-ray sources in a three-dimensional spectral sequence and then grouping the ordered sources into clusters based on their spectra. This new method can handle large quantities of data and operate independently of the requirement of spectral source models and a priori knowledge concerning the nature of the sources (i.e., young stars, interacting binaries, active galactic nuclei). We apply the method to Chandra imaging spectroscopy of the young stellar clusters in the Orion Nebula Cluster and the NGC 1333 star formation region.

Hojnacki, S. M.; Micela, G.; Lalonde, S. M.; Feigelson, E. D.; Kastner, J. H.

2008-07-01

363

Application of multiresolution multidimensional clustering of hyperspectral data using the watershed algorithm  

NASA Astrophysics Data System (ADS)

In many applications of remotely-sensed imagery, one of the first steps is partitioning the image into a tractable number of regions. In spectral remote sensing, the goal is often to find regions that are spectrally similar within the region but spectrally distinct from other regions. There is often no requirement that these regions be spatially connected. Two goals of this study are to partition a hyperspectral image into groups of spectrally distinct materials, and to partition without human intervention. To this end, this study investigates the use of multi-resolution, multi-dimensional variants of the watershed-clustering algorithm on Hyperspectral Digital Imagery Collection Experiment (HYDICE) data. The watershed algorithm looks for clusters in a histogram: a B-dimensional surface where B is the number of bands used (up to 210 for HYDICE). The algorithm is applied to HYDICE data of the Purdue Agronomy Farm, for which ground truth is available. Watershed results are compared to those obtained by using the commonly-available Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm.

Hemmer, Terrence H.; Jellison, Gerard P.; Wilson, Darryl G.

2002-08-01

364

Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms  

NASA Astrophysics Data System (ADS)

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. The techniques cover four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in the selection of parallel hyperspectral algorithms for specific applications.

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-06-01

365

Application of fuzzy c-means segmentation technique for tissue differentiation in MR images of a hemorrhagic glioblastoma multiforme.  

PubMed

The application of a raw data-based, operator-independent MR segmentation technique to differentiate boundaries of tumor from edema or hemorrhage is demonstrated. A case of a glioblastoma multiforme with gross and histopathologic correlation is presented. The MR image data set was segmented into tissue classes based on three different MR weighted image parameters (T1-, proton density-, and T2-weighted) using the unsupervised fuzzy c-means (FCM) clustering technique for pattern recognition. A radiological examination of the MR images and correlation with fuzzy clustering segmentations was performed. Results were confirmed by gross pathology and histopathology; to the best of our knowledge, this is the first reported application of this demanding approach. Based on the results of neuropathologic correlation, the application of FCM MR image segmentation to several MR images of a glioblastoma multiforme represents a viable technique for displaying diagnostically relevant tissue contrast information used in 3D volume reconstruction. With this technique, it is possible to generate segmentation images that display clinically important neuroanatomic and neuropathologic tissue contrast information from raw MR image data. PMID:7739370

Phillips, W E; Velthuizen, R P; Phuphanich, S; Hall, L O; Clarke, L P; Silbiger, M L

1995-01-01
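
The record above segments multi-parameter MR data with unsupervised FCM but does not spell out the iteration. The following is a minimal sketch of the textbook FCM update (weighted centre update followed by membership update), written in plain Python. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialisation, and the stopping tolerance are assumptions made for illustration and are not taken from the paper.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for points given as lists of floats."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    power = 2.0 / (m - 1.0)

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of all points weighted by u[i][k] ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from relative distances to every centre.
        change = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # clamp so a point sitting on a centre is safe
                for ctr in centers
            ]
            row = [
                1.0 / sum((dist[k] / dist[j]) ** power for j in range(c))
                for k in range(c)
            ]
            change = max(change, max(abs(a - b) for a, b in zip(row, u[i])))
            u[i] = row
        if change < tol:
            break
    return centers, u
```

In a segmentation setting each point would be one pixel's feature vector (for example its T1-, proton density-, and T2-weighted intensities), and a hard label can be read off as the cluster with the largest membership.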

366

Application of multiscale amplitude modulation features and fuzzy C-means to brain-computer interface.  

PubMed

This study proposes a recognition system for electroencephalogram (EEG) data classification. In addition to the wavelet-based amplitude modulation (AM) features, fuzzy c-means (FCM) clustering is used to discriminate between left finger lifting and resting. The features are extracted from discrete wavelet transform (DWT) data with the AM method. FCM is then applied to recognize the extracted features. Compared with band power features, k-means clustering, and a linear discriminant analysis (LDA) classifier, the results indicate that the proposed method is satisfactory in applications of brain-computer interface (BCI). PMID:22423549

Hsu, Wei-Yen; Li, Yu-Chuan; Hsu, Chien-Yeh; Liu, Chien-Tsai; Chiu, Hung-Wen

2012-01-01

367

A new concept of wildland-urban interface based on city clustering algorithm  

NASA Astrophysics Data System (ADS)

Wildland-Urban-Interface (WUI) is a widely used term in the context of wild and forest fires to indicate areas where human infrastructures interact with wildland/forest areas. Many complex problems are associated with the WUI, but the most relevant ones are those related to forest fire hazard and management in densely populated areas where the fire regime is dominated by anthropogenic ignitions. This coexistence enhances both anthropogenic ignition sources and flammable fuels. Furthermore, the growing trend of the WUI and global change effects may even worsen the situation in the near future. Therefore, many studies are dedicated to the WUI problem, focusing on refinement of its definition, development of mapping methods, implementation of measures into specific fire management plans and the validation of the proposed approaches. The present study introduces a new concept of WUI based on the city clustering algorithm (CCA) introduced in Rosenfeld et al., 2008. CCA was proposed as an automatic tool for studying the definition of cities and their distribution. The algorithm uses demographic data, either on a regular or non-regular grid in space, where a city (urban zone) is detected as a cluster of connected populated cells with maximal size. In the present study the CCA is proposed as a tool to develop a new concept of population dynamic analysis crucial to defining and localising the WUI. The real case study is based on demographic/census data, organised in a regular grid with a resolution of 100 m, and the forest fire ignition points database from canton Ticino, Switzerland. By changing the spatial scales of demographic cells, the relationships between urban zones (demographic clusters) and forest fire events were statistically analyzed. Corresponding scaling laws were used to understand the interaction between urban zones and forest fires. The first results are promising and indicate that the method can be applied to define WUI in an innovative way.
Keywords: forest fires, wildland-urban interface, city clustering algorithms.

Kanevski, M.; Champendal, A.; Vega Orozco, C.; Tonini, M.; Conedera, M.

2012-04-01
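
The abstract above describes a city as a maximal cluster of connected populated cells on a grid. A minimal sketch of that idea, assuming a regular grid, 4-neighbour connectivity, and a hypothetical population threshold `min_pop` (none of which are specified in the record), is a breadth-first flood fill:

```python
from collections import deque


def city_clusters(grid, min_pop=1):
    """Group populated cells of a regular grid into connected clusters.

    grid is a list of rows of population counts. A cell is 'populated'
    when its count is at least min_pop (an illustrative threshold).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] < min_pop:
                continue
            # Grow one cluster outward from this unvisited populated cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] >= min_pop):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cells)
    return clusters
```

Changing the spatial scale, as the study does, would correspond to aggregating the grid into coarser cells before calling the function and comparing how the clusters change.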

368

Application of K- and Fuzzy c-Means for Color Segmentation of Thermal Infrared Breast Images  

Microsoft Academic Search

Color segmentation of infrared thermal images is an important factor in detecting the tumor region. The cancerous tissue with angiogenesis and inflammation emits a temperature pattern different from the healthy one. In this paper, two color segmentation techniques, K-means and fuzzy c-means for color segmentation of infrared (IR) breast images are modeled and compared. Using the K-means algorithm in Matlab, some

M. EtehadTavakol; S. Sadri; E. Y. K. Ng

2010-01-01

369

Automated search for arthritic patterns in infrared spectra of synovial fluid using adaptive wavelets and fuzzy C-means analysis.  

PubMed

Analysis of synovial fluid by infrared (IR) clinical chemistry requires expert interpretation and is susceptible to subjective error. The application of automated pattern recognition (APR) may enhance the utility of IR analysis. Here, we describe an APR method based on the fuzzy C-means cluster adaptive wavelet (FCMC-AW) algorithm, which consists of two parts: one is a FCMC using the features from an M-band feature extractor adopting the adaptive wavelet algorithm and the second is a Bayesian classifier using the membership matrix generated by the FCMC. A FCMC-cross-validated quadratic probability measure (FCMC-CVQPM) criterion is used under the assumption that the class probability density is equal to the value of the membership matrix. Therefore, both values of posterior probabilities and selection criterion MFQ can be obtained through the membership matrix. The distinctive advantage of this method is that it provides not only the 'hard' classification of a new pattern, but also the confidence of this classification, which is reflected by the membership matrix. PMID:16686402

Cui, Jie; Loewy, John; Kendall, Edward J

2006-05-01

370

An Improvement on the Weighted Least-Connection Scheduling Algorithm for Load Balancing in Web Cluster Systems  

NASA Astrophysics Data System (ADS)

Web cluster systems consist of a load balancer for distributing web requests and loads to several servers, and real servers for processing web requests. Previous load distribution scheduling algorithms that web cluster systems use to distribute web requests to real servers are the Round-Robin, Weighted Round-Robin, Least-Connection and Weighted Least-Connection (WLC) algorithms. The WLC scheduling algorithm, in which a throughput weight is assigned to real servers and the least connected real server is selected for processing web requests, is generally used for web cluster systems. When a new real server is added to a web cluster system with many simultaneous users, the previous WLC scheduling algorithm assigns web requests only to the new real server, which causes load imbalance among real servers. In this paper, we propose an improved WLC scheduling algorithm which maintains load balance among real servers by avoiding web requests being assigned only to a new real server. When web requests are continuously assigned only to a new real server more than the maximum continuous allocation number (L) of times, the proposed algorithm excludes the new real server from the activated real server scheduling list and deactivates it. After L-1 allocation rounds, the new real server is included in the real server scheduling list again by activating it. When a new real server is added to web cluster systems, the proposed algorithm maintains load balance among real servers by avoiding overloads of the new real server.

Choi, Dongjun; Chung, Kwang Sik; Shon, Jingon
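
The scheduling rule described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class name `CappedWLC` and its attributes are hypothetical, an "allocation round" is assumed to mean one request assignment, and a server is deactivated as soon as it has received L consecutive requests.

```python
class CappedWLC:
    """Weighted least-connection scheduling with a cap on consecutive picks."""

    def __init__(self, weights, max_run):
        self.weights = dict(weights)            # server -> throughput weight
        self.conns = {s: 0 for s in weights}    # server -> active connections
        self.max_run = max_run                  # L in the abstract
        self.benched = {}                       # server -> allocations to sit out
        self.last = None                        # most recently chosen server
        self.run = 0                            # length of its current streak

    def add_server(self, name, weight):
        self.weights[name] = weight
        self.conns[name] = 0

    def assign(self):
        # Plain WLC over the activated servers: fewest connections per weight.
        active = [s for s in self.weights if s not in self.benched]
        choice = min(active, key=lambda s: self.conns[s] / self.weights[s])
        self.conns[choice] += 1

        # Deactivated servers move one allocation closer to reactivation.
        for s in list(self.benched):
            self.benched[s] -= 1
            if self.benched[s] == 0:
                del self.benched[s]

        self.run = self.run + 1 if choice == self.last else 1
        self.last = choice

        # After L consecutive picks, deactivate the server for L-1 allocations.
        if self.run >= self.max_run > 1 and len(self.weights) > 1:
            self.benched[choice] = self.max_run - 1
            self.last, self.run = None, 0
        return choice
```

With two busy servers and one freshly added idle server, plain WLC would keep choosing the new server until its load caught up; the cap interleaves the existing servers instead. Connection release is omitted here for brevity.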

371

Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm  

NASA Technical Reports Server (NTRS)

Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familiarity with the behavior of the Tethered Satellite System.

Mitra, Sunanda; Pemmaraju, Surya

1992-01-01

372

An improved scheduling algorithm for 3D cluster rendering with platform LSF  

NASA Astrophysics Data System (ADS)

High-quality photorealistic rendering of 3D models needs powerful computing systems. To meet this demand, highly efficient management of cluster resources is developing fast. This paper aims to improve the efficiency of 3D rendering tasks in a cluster. It focuses on a dynamic feedback load balance (DFLB) algorithm, the working principle of the load sharing facility (LSF), and optimization of the external scheduler plug-in. The algorithm can be applied in the match and allocation phases of a scheduling cycle. Candidate hosts are prepared in sequence in the match phase, and the scheduler makes allocation decisions for each job in the allocation phase. With the dynamic mechanism, a new weight is assigned to each candidate host for rearrangement. The most suitable one will be dispatched for rendering. A new plugin module for this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate that the improved plugin module is superior to the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput and improve system utilization.

Xu, Wenli; Zhu, Yi; Zhang, Liping

2013-10-01

373

BCRgt: a Bayesian cluster regression-based genotyping algorithm for the samples with copy number alterations  

PubMed Central

Background: Accurate genotype calling is a pre-requisite of a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms can achieve an accuracy rate greater than 99% for genotyping DNA samples without copy number alterations (CNAs), almost all of these algorithms are not designed for genotyping tumor samples that are known to have large regions of CNAs. Results: This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed a Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that high concordance rates with HapMap calls can be achieved without using reference/training samples, when CNAs do not exist. By adding a training step, we have obtained higher genotyping concordance rates, without requiring large sample sizes. When CNAs exist in the samples, accuracy can be dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, compared with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). Conclusions: We have demonstrated that BCRgt can provide accurate genotyping calls for tumor samples with CNAs.

2014-01-01

374

An Empirical Study of Unsupervised Rule Set Extraction of Clustered Categorical Data Using a Simulated Bee Colony Algorithm  

NASA Astrophysics Data System (ADS)

This study investigates the use of a biologically inspired meta-heuristic algorithm to extract rule sets from clustered categorical data. A computer program which implemented the algorithm was executed against six benchmark data sets and successfully discovered the underlying generation rules in all cases. Compared to existing approaches, the simulated bee colony (SBC) algorithm used in this study has the advantage of allowing full customization of the characteristics of the extracted rule set, and allowing arbitrarily large data sets to be analyzed. The primary disadvantages of the SBC algorithm for rule set extraction are that the approach requires a relatively large number of input parameters, and that the approach does not guarantee convergence to an optimal solution. The results demonstrate that an SBC algorithm for rule set extraction of clustered categorical data is feasible, and suggest that the approach may have the ability to outperform existing algorithms in certain scenarios.

McCaffrey, James D.; Dierking, Howard

375

Rotational fluctuation of molecules in quantum clusters. I. Path integral hybrid Monte Carlo algorithm  

SciTech Connect

In this paper, we present a path integral hybrid Monte Carlo (PIHMC) method for rotating molecules in quantum fluids. This is an extension of our PIHMC for correlated Bose fluids [S. Miura and J. Tanaka, J. Chem. Phys. 120, 2160 (2004)] to handle the molecular rotation quantum mechanically. A novel technique referred to as an effective potential of quantum rotation is introduced to incorporate the rotational degree of freedom in the path integral molecular dynamics or hybrid Monte Carlo algorithm. For a permutation move to satisfy Bose statistics, we devise a multilevel Metropolis method combined with a configurational-bias technique for efficiently sampling the permutation and the associated atomic coordinates. We then apply the PIHMC to a helium-4 cluster doped with a carbonyl sulfide molecule. The effects of the quantum rotation on the solvation structure and energetics were examined. Translational and rotational fluctuations of the dopant in the superfluid cluster were also analyzed.

Miura, Shinichi [Institute for Molecular Science, 38 Myodaiji, Okazaki 444-8585 (Japan)

2007-03-21

376

Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix  

NASA Technical Reports Server (NTRS)

Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.

Rogers, James L.; Korte, John J.; Bilardo, Vincent J.

2006-01-01

377

Communities recognition in the Chesapeake Bay ecosystem by dynamical clustering algorithms based on different oscillators systems  

NASA Astrophysics Data System (ADS)

We have recently introduced [Phys. Rev. E 75, 045102(R) (2007); AIP Conference Proceedings 965, 2007, p. 323] an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.

Pluchino, A.; Rapisarda, A.; Latora, V.

2008-10-01

378

CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection  

NASA Astrophysics Data System (ADS)

More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.

Ao, Sio-Iong

379

[A cloud detection algorithm for MODIS images combining Kmeans clustering and multi-spectral threshold method].  

PubMed

An improved method for detecting cloud combining Kmeans clustering and the multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data is initially categorized into two major types by the Kmeans method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. A multi-spectral threshold detection is then applied to the first class to eliminate interference such as smoke and snow. The method is tested with MODIS data acquired at different times under different underlying surface conditions. Visual inspection of the results showed that the algorithm can effectively detect small areas of cloud pixels and exclude interference from the underlying surface, which provides a good foundation for subsequent fire detection. PMID:21714260

Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei

2011-04-01

380

Hartree-Fock via variational coupled cluster theory: An alternative way to diagonalization free algorithm  

NASA Astrophysics Data System (ADS)

It is shown that the non-terminating expansions of the wave function within the variational coupled cluster singles (VCCS) can be exactly treated by summing up the one-particle density matrix elements in the occupied block using a simple recurrence relation. At the same time, this leads to an extremely simple 'a priori' diagonalization free algorithm for the solution of the Hartree-Fock equations. This treatment corresponds to a non-unitary transformation of orbitals that nevertheless preserves the norm and idempotency of the density matrix. The resulting algorithm enables a Hartree-Fock solution with 'a priori' localized orbitals. A similar approach can be applied within the Kohn-Sham theory. Analysis of the VCCS expansion in terms of the generalized perturbation theory is also presented. Numerical results are presented for the model systems N2, F2, H2O, and NH3, but also for a larger uracil molecule and an interaction of four guanine molecules.

Šimunek, Ján; Noga, Jozef

2012-12-01

381

Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm  

NASA Astrophysics Data System (ADS)

Project OASE is one of five work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and following them through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as satellite imagery, weather radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of Scale-Space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. In spite of the special application to be demonstrated here (convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-Band Radar (2D) and Satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-Band Radar composite.

Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel

2014-05-01
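
The record above builds on the mean-shift algorithm, in which every point climbs to a mode of a kernel density estimate and points sharing a mode form one cluster. The sketch below shows only that core idea for one-dimensional data with a Gaussian kernel; it is not the Meanie3D code. The function name `mean_shift_1d`, the kernel choice, and the rule for merging modes closer than one bandwidth are assumptions made for illustration.

```python
import math


def mean_shift_1d(points, bandwidth, iters=50, tol=1e-6):
    """Return (modes, labels) for a list of floats using Gaussian mean-shift."""
    converged = []
    for start in points:
        x = start
        for _ in range(iters):
            # Kernel weights of all samples as seen from the current position.
            w = [math.exp(-((q - x) ** 2) / (2.0 * bandwidth ** 2))
                 for q in points]
            new_x = sum(wi * q for wi, q in zip(w, points)) / sum(w)
            moved = abs(new_x - x)
            x = new_x
            if moved < tol:
                break
        converged.append(x)

    # Points whose end positions lie within one bandwidth share a cluster.
    modes, labels = [], []
    for x in converged:
        for k, mode in enumerate(modes):
            if abs(x - mode) < bandwidth:
                labels.append(k)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return modes, labels
```

The bandwidth plays the role of the scale: a small value separates nearby groups, while a large value merges them into a single object, which is the behaviour a multi-scale analysis exploits.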

382

Using Hierarchical Time Series Clustering Algorithm and Wavelet Classifier for Biometric Voice Classification  

PubMed Central

Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. Another application, voice classification, which plays an important role in grouping unlabelled voice samples, has however not been widely studied. Voice classification has lately been found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time warp transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm.

Fong, Simon

2012-01-01

383

A computational algorithm for functional clustering of proteome dynamics during development.  

PubMed

Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031

Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling

2014-06-01

384

A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation  

SciTech Connect

The recent and continuing construction of multispectral and hyperspectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earthbound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We suggest a simple approach: a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.

Theiler, J.; Gisler, G.

1997-07-01

385

KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.  

PubMed

KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition. PMID:23912505

Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C

2014-06-01

386

Fuzzy Order Statistics and Their Application to Fuzzy Clustering.  

National Technical Information Service (NTIS)

The median and the median absolute deviation (MAD) are robust statistics based on order statistics. Order statistics are extended to fuzzy sets to define a fuzzy median and a fuzzy MAD. The fuzzy c-means (FCM) clustering algorithm is defined for any p-no...

P. R. Kersten

1999-01-01
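
As background for the record above, the classical (crisp) median and MAD that the report extends to fuzzy sets can be computed as follows. This sketch covers only the ordinary statistics, not the fuzzy versions defined in the report, and the helper name `mad` is chosen here for illustration.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)
```

For the sample 1, 2, 3, 4, 100 the median is 3 and the MAD is 1, whereas the mean is 22; the single outlier barely affects the order-statistic estimates, which is the robustness property the abstract refers to.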

387

Transition from exo- to endo- Cu absorption in CuSin clusters: A Genetic Algorithms Density Functional Theory (DFT) Study  

PubMed Central

The characterization and prediction of the structures of metal silicon clusters is important for nanotechnology research because these clusters can be used as building blocks for nano devices, integrated circuits and solar cells. Several authors have postulated that there is a transition between exo and endo absorption of Cu in Sin clusters and showed that for n larger than 9 it is possible to find endohedral clusters. Unfortunately, no global searches have confirmed this observation, which is based on local optimizations of plausible structures. Here we use parallel Genetic Algorithms (GA), as implemented in our MGAC software, directly coupled with DFT energy calculations to show that the global search of CuSin cluster structures does not find endohedral clusters for n < 8 but finds them for n ≥ 10.

Ona, Ofelia B.; Ferraro, Marta B.; Facelli, Julio C.

2010-01-01

388

Large-scale validation of a computer-aided polyp detection algorithm for CT colonography using cluster computing  

NASA Astrophysics Data System (ADS)

The presented method significantly reduces the time necessary to validate a computed tomographic colonography (CTC) computer aided detection (CAD) algorithm of colonic polyps applied to a large patient database. As the algorithm is being developed on Windows PCs and our target, a Beowulf cluster, is running on Linux PCs, we made the application dual platform compatible using a single source code tree. To maintain, share, and deploy source code, we used CVS (concurrent versions system) software. We built the libraries from their sources for each operating system. Next, we made the CTC CAD algorithm dual-platform compatible and validated that both Windows and Linux produced the same results. Eliminating system dependencies was mostly achieved using the Qt programming library, which encapsulates most of the system dependent functionality in order to present the same interface on either platform. Finally, we wrote scripts to execute the CTC CAD algorithm in parallel. Running hundreds of simultaneous copies of the CTC CAD algorithm on a Beowulf cluster computing network enables execution in less than four hours on our entire collection of over 2400 CT scans, as compared to a month on a single PC. As a consequence, our complete patient database can be processed daily, boosting research productivity. Large scale validation of a computer aided polyp detection algorithm for CT colonography using cluster computing significantly improves the round trip time of algorithm improvement and revalidation.

Bitter, Ingmar; Brown, John E.; Brickman, Daniel; Summers, Ronald M.

2004-04-01

389

'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny

NASA Astrophysics Data System (ADS)

Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of science for data mining using computational approaches. Clustering and molecular phylogeny are key areas in Bioinformatics, which help in the study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance based and character based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. 'Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of IATDs is proposed, and the distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD based method in molecular phylogenetic analysis in particular and data mining in general.

Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila

2010-10-01
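
Illustrative sketch (not from the paper): the abstract above describes computing inter-arrival time distributions of nucleotides and a distance built on their statistical parameters, but it does not name those parameters. The Python sketch below assumes the mean and population standard deviation of the gaps for each of A, C, G and T, and a Euclidean distance between the resulting feature vectors. All function names are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def inter_arrival_times(seq, base):
    """Gaps between successive positions at which `base` occurs in `seq`."""
    positions = [i for i, ch in enumerate(seq) if ch == base]
    return [b - a for a, b in zip(positions, positions[1:])]


def iatd_features(seq, alphabet="ACGT"):
    """Assumed summary of the IATDs: (mean gap, std of gaps) per nucleotide."""
    feats = []
    for base in alphabet:
        gaps = inter_arrival_times(seq, base)
        # A base occurring fewer than twice has no gaps; use zeros as a placeholder.
        feats += [mean(gaps), pstdev(gaps)] if gaps else [0.0, 0.0]
    return feats


def iatd_distance(seq_a, seq_b):
    """Euclidean distance between the assumed IATD feature vectors."""
    return dist(iatd_features(seq_a), iatd_features(seq_b))
```

Because the features are computed per sequence, no pre-alignment is needed; the pairwise distances could then be passed to any distance-based clustering or tree-building method.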

390

Falcon: neural fuzzy control and decision systems using FKP and PFKP clustering algorithms.  

PubMed

Neural fuzzy networks proposed in the literature can be broadly classified into two groups. The first group is essentially fuzzy systems with self-tuning capabilities and requires an initial rule base to be specified prior to training. The second group of neural fuzzy networks, on the other hand, is able to automatically formulate the fuzzy rules from the numerical training data. Examples are the Falcon-ART and the POPFNN family of networks. A cluster analysis is first performed on the training data and the fuzzy rules are subsequently derived through the proper connections of these computed clusters. This correspondence proposes two new networks: Falcon-FKP and Falcon-PFKP. They are extensions of the Falcon-ART network, and aim to overcome the shortcomings faced by the Falcon-ART network itself, i.e., poor classification ability when the classes of input data are very similar to each other, termination of the training cycle depends heavily on a preset error parameter, the fuzzy rule base of the Falcon-ART network may not be consistent (Nauck), there is no control over the number of fuzzy rules generated, and learning efficiency may deteriorate by using complementarily coded training data. These deficiencies are essentially inherent to the fuzzy ART clustering technique employed by the Falcon-ART network. Hence, two clustering techniques, Fuzzy Kohonen Partitioning (FKP) and its pseudo variant PFKP, are synthesized with the basic Falcon structure to compute the fuzzy sets and to automatically derive the fuzzy rules from the training data. The resultant neural fuzzy networks are Falcon-FKP and Falcon-PFKP, respectively. These two proposed networks have a lean and efficient training algorithm and consistent fuzzy rule bases. Extensive simulations are conducted using the two networks and their performances are encouraging when benchmarked against other neural and neural fuzzy systems. PMID:15369109

Tung, W L; Quek, C

2004-02-01

391

Possibilistic clustering for shape recognition  

NASA Technical Reports Server (NTRS)

Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.

Keller, James M.; Krishnapuram, Raghu

1993-01-01
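
Illustrative sketch (not from the report): the abstract above contrasts the FCM constraint, under which the memberships of a point sum to one, with possibilistic memberships read as degrees of possibility. The Python sketch below uses the standard FCM membership formula and a commonly cited possibilistic form, t_i = 1 / (1 + (d_i^2 / eta_i)^(1/(m-1))). The scale parameters `etas` and both function names are assumptions chosen for illustration.

```python
def fcm_memberships(sq_dists, m=2.0):
    """FCM memberships of one point; sq_dists[i] is its squared distance to cluster i."""
    p = 1.0 / (m - 1.0)
    inv = [(1.0 / max(d, 1e-12)) ** p for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]  # forced to sum to 1


def pcm_typicalities(sq_dists, etas, m=2.0):
    """Possibilistic memberships; each value depends only on its own cluster."""
    p = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d / eta) ** p) for d, eta in zip(sq_dists, etas)]


# A noise point at x = 100 with cluster centers at x = 0 and x = 10.
outlier_sq = [100.0 ** 2, 90.0 ** 2]
u = fcm_memberships(outlier_sq)               # both near 0.5 despite the distance
t = pcm_typicalities(outlier_sq, [1.0, 1.0])  # both near 0
```

Under the sum-to-one constraint the noise point is shared almost equally between two clusters it does not resemble, whereas the possibilistic values are small for both, which is the behavior the abstract attributes to the possibilistic partition.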

392

An integrated artificial neural network-genetic algorithm clustering ensemble for performance assessment of decision making units  

Microsoft Academic Search

This study proposes a non-parametric efficiency frontier analysis method based on artificial neural network (ANN) and genetic algorithm clustering ensemble (GACE) for measuring efficiency as a complementary tool for the common techniques of the efficiency studies in the previous studies. The proposed ANN GA algorithm is able to find a stochastic frontier based on a set of input–output observational data

A. Azadeh; M. Saberi; M. Anvari; M. Mohamadi

2011-01-01

393

A parallel point cloud clustering algorithm for subset segmentation and outlier detection  

NASA Astrophysics Data System (ADS)

We present a fast point cloud clustering technique which is suitable for outlier detection, object segmentation and region labeling for large multi-dimensional data sets. The basis is a minimal data structure similar to a kd-tree which enables us to detect connected subsets very fast. The proposed algorithms utilizing this tree structure are parallelizable which further increases the computation speed for very large data sets. The procedures given are a vital part of the data preprocessing. They improve the input data properties for a more reliable computation of surface measures, polygonal meshes and other visualization techniques. In order to show the effectiveness of our techniques we evaluate sets of point clouds from different 3D scanning devices.

Teutsch, Christian; Trostmann, Erik; Berndt, Dirk

2011-06-01

394

Clustering gene expression data using a diffraction-inspired framework  

PubMed Central

Background: Recent developments in microarray technology have allowed for the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction-based clustering algorithm, however, is designed to overcome this necessity for user-defined parameters, as it is able to automatically search the data for any underlying structure. Methods: The diffraction-based clustering algorithm presented in this paper is tested using five well-known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those obtained from conventional algorithms such as k-means, fuzzy c-means, self-organising map, hierarchical clustering, Gaussian mixture model and density-based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results: The diffraction-based clustering algorithm is shown to be independent of the number of clusters as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction-based clustering algorithm performs significantly better on the real biological datasets compared to the other existing algorithms. Conclusion: The results of the diffraction-based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data.

2012-01-01

395

Non-traditional spectral clustering algorithms for the detection of community structure in complex networks: a comparative analysis  

NASA Astrophysics Data System (ADS)

The detection of community structure in complex networks is crucial since it provides insight into the substructures of the whole network. Spectral clustering algorithms that employ the eigenvalues and eigenvectors of an appropriate input matrix have been successfully applied in this field. Despite its empirical success in community detection, spectral clustering has been criticized for its inefficiency when dealing with large scale data sets. This is confirmed by the fact that the time complexity for spectral clustering is cubic with respect to the number of instances; even the memory efficient iterative eigensolvers, such as the power method, may converge slowly to the desired solutions. In efforts to improve the complexity and performance, many non-traditional spectral clustering algorithms have been proposed. Rather than using the real eigenvalues and eigenvectors as in the traditional methods, the non-traditional clusterings employ additional topological structure information characterized by the spectrum of a matrix associated with the network involved, such as the complex eigenvalues and their corresponding complex eigenvectors, eigenspaces and semi-supervised labels. However, to the best of our knowledge, no work has been devoted to comparison among these newly developed approaches. This is the main goal of this paper, through evaluating the effectiveness of these spectral algorithms against some benchmark networks. The experimental results demonstrate that the spectral algorithm based on the eigenspaces achieves the best performance but is the slowest algorithm; the semi-supervised spectral algorithm is the fastest but its performance largely depends on the prior knowledge; and the spectral method based on the complement network shows similar performance to the conventional ones.

Ma, Xiaoke; Gao, Lin

2011-05-01
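
Illustrative sketch (not from the paper): the abstract above refers to spectral methods that use eigenvectors of a matrix associated with the network, and to iterative eigensolvers such as the power method. The Python sketch below shows only the conventional baseline, spectral bisection with the graph Laplacian, and none of the non-traditional variants the paper compares. It assumes a connected, undirected graph given as a 0/1 adjacency matrix; the function name and the shift value are illustrative choices.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Runs power iteration on (shift*I - L) with L = D - A, removing the
    constant component at every step so the iteration converges to the
    Fiedler vector instead of the trivial all-ones eigenvector.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2.0 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        avg = sum(x) / n
        x = [v - avg for v in x]  # project out the all-ones direction
        y = [(shift - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
communities = [v > 0 for v in fiedler_vector(adj)]  # sign gives the two groups
```

Splitting the nodes by the sign of the vector separates the two triangles. The slow convergence mentioned in the abstract appears here as the need for many iterations when the relevant eigenvalues lie close together.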

396

A Fuzzy C-Means approach for regionalization using a bivariate homogeneity and discordancy approach  

NASA Astrophysics Data System (ADS)

In stochastic analysis for droughts, such as frequency or trend analysis, the absence of lengthy records typically limits the reliability of statistical estimates. To address this issue, a "regional" or "pooled" analysis approach is often used. The main contribution of this study is to create regions based on bivariate criteria rather than univariate ones; the two variables are severity and duration. The methodology is applied to hydrological records of 36 unregulated flow monitoring sites in the Canadian "prairie" provinces of Alberta, Saskatchewan and Manitoba. Our criteria for a hydrological "region" to be suitable are that it should be homogeneous, that it should not be discordant, and that it should not be too small. Tests for homogeneity and non-discordancy are traditionally based on univariate L-moment statistics; for example, there have been several applications of univariate L-moments to bivariate drought analysis by simply ignoring one of the variables. Instead, we use multivariate L-moments, also known as L-comoments. The approach uses site characteristics and a fuzzy clustering approach, called Fuzzy C-Means (FCM), to form the initial regions (clusters) and adjusts the initial clusters based on the partial or fuzzy membership of each site in other clusters to form final clusters that meet the criteria of homogeneity, lack of discordancy, and sufficient size. We also estimate return periods using a bivariate copula method.

Sadri, S.; Burn, D. H.

2011-05-01

397

A heuristic method for finding the optimal number of clusters with application in medical data.  

PubMed

In this paper, a heuristic method for determining the optimal number of clusters is proposed. Four clustering algorithms, namely K-means, Growing Neural Gas, a Simulated Annealing based technique, and Fuzzy C-means, are used in conjunction with three well known cluster validity indices, namely the Davies-Bouldin index, Calinski-Harabasz index, and Maulik-Bandyopadhyay index, in addition to the proposed index. Our simulations evaluate the capability of the mentioned indices on some artificial and medical datasets. PMID:19163761

Bayati, Hamidreza; Davoudi, Heydar; Fatemizadeh, Emad

2008-01-01

398

A heuristic method for finding the optimal number of clusters with application in medical data  

Microsoft Academic Search

In this paper, a heuristic method for determining the optimal number of clusters is proposed. Four clustering algorithms, namely K-means, Growing Neural Gas, Simulated Annealing based technique, and Fuzzy C-means in conjunction with three well known cluster validity indices, namely Davies-Bouldin index, Calinski-Harabasz index, Maulik-Bandyopadhyay index, in addition to the proposed index are used. Our simulations evaluate capability of mentioned

Hamidreza Bayati; Heydar Davoudi; Emad Fatemizadeh

2008-01-01

399

Rough-fuzzy clustering for grouping functionally similar genes from microarray data.  

PubMed

Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed, judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environments. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets. PMID:22848138

Maji, Pradipta; Paul, Sushmita

2013-01-01

400

A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches  

PubMed Central

Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined. Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g^3) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g^6). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g^3 log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs. Availability and implementation: C++ source code freely available for download at ftp.ncbi.nih.gov/pub/wolf/COGs/COGsoft/ Contact: dmk@stowers.org Supplementary information: Supplementary materials are available at Bioinformatics online.

Kristensen, David M.; Kannan, Lavanya; Coleman, Michael K.; Wolf, Yuri I.; Sorokin, Alexander; Koonin, Eugene V.; Mushegian, Arcady

2010-01-01

401

AHIMSA - Ad hoc histogram information measure sensing algorithm for feature selection in the context of histogram inspired clustering techniques  

NASA Technical Reports Server (NTRS)

An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.

Dasarathy, B. V.

1976-01-01

402

An event-by-event comparison of clustering algorithms for photon detection in the STAR Endcap Calorimeter  

NASA Astrophysics Data System (ADS)

The STAR detector at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory uses polarized proton collisions to determine the origin of the proton spin, using measurements such as neutral pion asymmetries. The Endcap Electromagnetic Calorimeter (EEMC) in the STAR detector is especially useful for detecting photons from π0 decays at forward angles. This latter measurement is obtained from the Shower Maximum Detector (SMD) in the EEMC where narrow crossed scintillator strips measure the energy deposited in them and can be used to identify the location of the photon shower. The electromagnetic shower most often deposits energy in a small number of adjacent strips that collectively form a "cluster." This work has focused on a qualitative and quantitative comparison of two different clustering algorithms that were developed to reliably identify π0 events and to effectively discriminate against background cluster selection that produces false π0 signals. This comparative analysis will be presented and the strengths and weaknesses of the algorithms will be discussed.

Pochron, William

2012-10-01

403

A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality  

PubMed Central

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.

Wang, Xueyi

2011-01-01
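
Illustrative sketch (not the authors' implementation): the abstract above describes a buildup stage that clusters the training data and a search stage that visits clusters from nearest to farthest and prunes distance calculations with the triangle inequality. The simplified Python sketch below assumes the cluster centers are already given (a k-means run would normally supply them) and prunes a training point p whenever |d(q, c) - d(p, c)| is no smaller than the current k-th best distance. The function names are hypothetical.

```python
import heapq
from math import dist


def build_index(points, centers):
    """Assign each point to its nearest center and store its distance to that center."""
    clusters = [[] for _ in centers]
    for p in points:
        d = [dist(p, c) for c in centers]
        i = min(range(len(centers)), key=d.__getitem__)
        clusters[i].append((d[i], p))
    return clusters


def knn_query(q, centers, clusters, k):
    """Exact k-NN search; returns (sorted [(distance, point)], distances evaluated)."""
    best = []  # max-heap of the k best so far, stored as (-distance, point)
    evaluated = 0
    order = sorted(range(len(centers)), key=lambda i: dist(q, centers[i]))
    for i in order:
        d_qc = dist(q, centers[i])
        for d_pc, p in clusters[i]:
            # Triangle inequality: d(q, p) >= |d(q, c) - d(p, c)|.
            if len(best) == k and abs(d_qc - d_pc) >= -best[0][0]:
                continue  # p cannot improve on the current k-th neighbor
            d = dist(q, p)
            evaluated += 1
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), evaluated
```

The result is exact because a point is skipped only when a lower bound on its distance already rules it out; the saving comes from replacing a full distance computation with a comparison of two stored scalars.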

404

A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.  

PubMed

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces. PMID:22247818

Wang, Xueyi

2012-02-01

405

Cloud classification from satellite data using a fuzzy sets algorithm: A polar example  

NASA Technical Reports Server (NTRS)

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

Key, J. R.; Maslanik, J. A.; Barry, R. G.

1988-01-01
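
Illustrative sketch (not from the report): the abstract above notes that FCM assigns each observation to all clusters, with membership values that depend on distance to the cluster centers. The Python sketch below alternates the two standard FCM updates, memberships from distances and centers from membership-weighted means. The fuzzifier m = 2, the fixed iteration count and the function names are assumptions chosen for illustration.

```python
def memberships(points, centers, m=2.0):
    """FCM membership of every point in every cluster (each row sums to 1)."""
    p = 1.0 / (m - 1.0)
    rows = []
    for x in points:
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = [max(sum((a - b) ** 2 for a, b in zip(x, c)), 1e-12) for c in centers]
        inv = [(1.0 / d) ** p for d in d2]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def update_centers(points, U, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for i in range(len(U[0])):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append(tuple(sum(wk * x[d] for wk, x in zip(w, points)) / total
                             for d in range(dims)))
    return centers


def fcm(points, centers, m=2.0, iters=50):
    """Run a fixed number of alternating updates from the given initial centers."""
    for _ in range(iters):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(points, centers, m)
```

A point midway between two centers receives a membership near 0.5 in each, which is how diffuse boundaries and likely areas of misclassification show up in the fuzzy sets.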

406

Cloud classification from satellite data using a fuzzy sets algorithm - A polar example  

NASA Technical Reports Server (NTRS)

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

Key, J. R.; Maslanik, J. A.; Barry, R. G.

1989-01-01

407

Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS  

NASA Astrophysics Data System (ADS)

To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper proposes and designs a specific parallel IDW interpolation algorithm, incorporating both single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are output. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.

Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

2011-04-01
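
Illustrative sketch (not the paper's implementation): the abstract above describes distributing rows of the output grid to worker nodes, running the serial IDW algorithm on each row, and gathering the results. The Python sketch below uses a local thread pool as a stand-in for the cluster's master and slave nodes, so it shows the row-wise decomposition only and not message passing or any speedup. The function names, the integer grid coordinates and the power parameter of 2 are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def idw_value(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, sz) samples."""
    num = den = 0.0
    for sx, sy, sz in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return sz  # the location coincides with a sample
        w = d2 ** (-power / 2.0)  # weight = 1 / distance ** power
        num += w * sz
        den += w
    return num / den


def idw_row(row, ncols, samples, power=2.0):
    """Serial IDW along one grid row (the unit of work given to a worker)."""
    return [idw_value(float(col), float(row), samples, power) for col in range(ncols)]


def idw_grid_parallel(nrows, ncols, samples, workers=4):
    """Hand rows to workers and gather the results in row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        work = partial(idw_row, ncols=ncols, samples=samples)
        return list(pool.map(work, range(nrows)))
```

Rows are independent because every cell depends only on the shared sample set, which is why broadcasting the samples once and gathering finished rows is sufficient coordination.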

408

Unsupervised approach data analysis based on fuzzy possibilistic clustering: application to medical image MRI.  

PubMed

The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome the problems of uncertain data in complex systems. We used the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters that are generated by PCM and also overcome the sensitivity of FCM to noise. To validate our approach, we used several validity indexes and we compared it with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were carried out on different synthetic data sets and real brain MR images. PMID:24489535

El Harchaoui, Nour-Eddine; Ait Kerroum, Mounir; Hammouch, Ahmed; Ouadou, Mohamed; Aboutajdine, Driss

2013-01-01

409

Unsupervised Approach Data Analysis Based on Fuzzy Possibilistic Clustering: Application to Medical Image MRI  

PubMed Central

The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome the problems of uncertain data in complex systems. We used the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters that are generated by PCM and also overcome the sensitivity of FCM to noise. To validate our approach, we used several validity indexes and we compared it with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were carried out on different synthetic data sets and real brain MR images.

El Harchaoui, Nour-Eddine; Ait Kerroum, Mounir; Hammouch, Ahmed; Ouadou, Mohamed; Aboutajdine, Driss

2013-01-01

410

Heart sound localization in chest sound using temporal fuzzy C-means classification.  

PubMed

Most heart sound cancellation algorithms for improving the quality of lung sound use information about heart sound locations. Therefore, a reliable estimation of heart sound locations within chest sound is a key issue in enhancing the performance of heart sound cancellation algorithms. In this paper, we present a new technique to estimate the locations of heart sound segments in chest sound using the temporal fuzzy c-means (TFCM) algorithm. In applying the method, chest sound is first divided into frames and then, for each frame, the entropy feature is calculated. Next, by means of these features, the TFCM algorithm is applied to classify a chest sound into two classes: heart sound (heart sound containing lung sound) and non-heart sound (only lung sound). The proposed method was tested on the database used in the literature and the experimental results are compared with the baseline, which is a well-known method in the literature. The experimental results show that the proposed method outperforms the baseline method in terms of false negative rate (FNR), false positive rate (FPR) and accuracy (ACC). PMID:23367122

Shamsi, Hamed; Ozbek, I Yucel

2012-01-01

411

Automatic segmentation of corpus callosum using Gaussian mixture modeling and Fuzzy C means methods.  

PubMed

This paper presents a comparative study of the success and performance of the Gaussian mixture modeling and Fuzzy C means methods to determine the volume and cross-sectional areas of the corpus callosum (CC) using simulated and real MR brain images. The Gaussian mixture model (GMM) utilizes a weighted sum of Gaussian distributions by applying statistical decision procedures to define image classes. In the Fuzzy C means (FCM), the image classes are represented by a membership function according to fuzziness information expressing the distance from the cluster centers. In this study, automatic segmentation of the midsagittal section of the CC was achieved from simulated and real brain images. The volume of the CC was obtained using sagittal section areas. To compare the success of the methods, segmentation accuracy, Jaccard similarity and segmentation time were calculated. The results show that the GMM method resulted by a small margin in more accurate segmentation (midsagittal section segmentation accuracy 98.3% and 97.01% for GMM and FCM); however, the FCM method resulted in faster segmentation than GMM. With this study, an accurate and automatic segmentation system was developed that gives doctors an opportunity for quantitative comparison in the planning of treatment and the diagnosis of diseases affecting the size of the CC. This study can be adapted to perform segmentation on other regions of the brain; thus, it can be put to practical use in the clinic. PMID:23871683

İçer, Semra

2013-10-01

412

Abdominal adipose tissue quantification on water-suppressed and non-water-suppressed MRI at 3T using semi-automated FCM clustering algorithm  

NASA Astrophysics Data System (ADS)

Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which were analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.

Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.

2014-03-01

413

A Modified and Efficient Shuffled Frog Leaping Algorithm (MSFLA) for Unsupervised Data Clustering  

Microsoft Academic Search

The shuffled frog leaping algorithm (SFLA), a new memetic, population-based, meta-heuristic algorithm, has emerged as a fast, robust method with efficient global search capability. In order to enhance the algorithm’s stability and the ability to search the global optimum, the conventional SFL Algorithm has been modified in our work by using the local best value of each memeplex

Suresh Chittineni; Dinesh Godavarthi; A. N. S. Pradeep; Suresh Chandra Satapathy; P. V. G. D. Prasad Reddy

414

A non-parametric heuristic algorithm for convex and non-convex data clustering based on equipotential surfaces  

Microsoft Academic Search

In this paper, using the concepts of field theory and potential functions a sub-optimal non-parametric algorithm for clustering of convex and non-convex data is proposed. For this purpose, equipotential surfaces, created by interaction of the potential functions, are applied. Equipotential surfaces are the geometric location of the points in the space on which the potential is constant. It means all

Farhad Bayat; Ehsan Adeli Mosabbeb; Ali Akbar Jalali; Farshad Bayat

2010-01-01

415

An approximation-based load-balancing algorithm with admission control for cluster web servers with dynamic workloads  

Microsoft Academic Search

The growth of web-based applications in business and e-commerce is building up demands for high performance web servers for better throughputs and lower user-perceived latency. These demands are leading to a widespread substitution of powerful single servers by robust newcomers, cluster web servers, in many enterprise companies. In this respect the load-balancing algorithms play an important role in boosting the

Saeed Sharifian; Seyed A. Motamedi; Mohammad K. Akbari

2010-01-01

416

Evaluation of Cluster Analysis Algorithms Enhanced by Using R*Trees  

Microsoft Academic Search

R* tree is a useful data structure for handling spatial data. However, although objects stored in the same R* tree leaf node enjoy spatial proximity, it is well-known that R* trees cannot be used directly for cluster analysis. Nevertheless, R* tree’s indexing feature can be used to assist existing cluster analysis methods, thus enhancing their performance or cluster quality. In

Jiaxiong Pi; Yong Shi; Zhengxin Chen

2006-01-01

417

Details of the Adjusted Rand index and Clustering algorithms Supplement to the paper \\  

Microsoft Academic Search

. Suppose that is our external criterion and is a clustering result. Let be the number of pairs of objects that are placed in the same class in and in the same cluster in , be the number of pairs of objects in the same class in but not in the same cluster in , be the number of pairs

Ka Yee Yeung; Walter L. Ruzzo
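
The pair counting described in the (truncated) abstract above can be illustrated with a short sketch. The symbols in the abstract were lost in extraction, so the names `a`, `b`, `c`, `d` below are hypothetical labels; the third and fourth counts are the usual completion of the definition and are an assumption here, since the abstract breaks off before stating them. The adjusted index is computed with the standard pair-count form of the Hubert-Arabie adjustment.

```python
from itertools import combinations

def pair_counts(classes, clusters):
    """Count object pairs by agreement between external classes and clusters.

    a: same class, same cluster      b: same class, different cluster
    c: different class, same cluster d: different class, different cluster
    (c and d are the assumed completion of the truncated definition.)
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(classes)), 2):
        same_class = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            a += 1
        elif same_class:
            b += 1
        elif same_cluster:
            c += 1
        else:
            d += 1
    return a, b, c, d

def rand_index(a, b, c, d):
    """Fraction of pairs on which the two partitions agree."""
    return (a + d) / (a + b + c + d)

def adjusted_rand_index(a, b, c, d):
    """Rand index corrected for chance, written in terms of pair counts."""
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    if denom == 0:
        return 1.0  # degenerate case: both partitions are trivially identical
    return 2.0 * (a * d - b * c) / denom

counts = pair_counts([0, 0, 1, 1], [0, 1, 0, 1])
```

For the two crossed partitions in the last line, every within-class pair is split across clusters, so the adjusted index falls below zero even though the plain Rand index does not.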

418

An examination of the effect of six types of error perturbation on fifteen clustering algorithms  

Microsoft Academic Search

An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error-perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random

Glenn W. Milligan

1980-01-01

419

EAD and PEBD: Two Energy-Aware Duplication Scheduling Algorithms for Parallel Tasks on Homogeneous Clusters  

Microsoft Academic Search

High-performance clusters have been widely deployed to solve challenging and rigorous scientific and engineering tasks. On one hand, high performance is certainly an important consideration in designing clusters to run parallel applications. On the other hand, the ever increasing energy cost requires us to effectively conserve energy in clusters. To achieve the goal of optimizing both performance and energy efficiency

Ziliang Zong; Adam Manzanares; Xiaojun Ruan; Xiao Qin

2011-01-01

420

BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data  

PubMed Central

Background The explosion of biological data has dramatically reformed today's biology research. The biggest challenge to biologists and bioinformaticians is the integration and analysis of large quantity of data to provide meaningful insights. One major problem is the combined analysis of data from different types. Bi-cluster editing, as a special case of clustering, which partitions two different types of data simultaneously, might be used for several biomedical scenarios. However, the underlying algorithmic problem is NP-hard. Results Here we contribute with BiCluE, a software package designed to solve the weighted bi-cluster editing problem. It implements (1) an exact algorithm based on fixed-parameter tractability and (2) a polynomial-time greedy heuristics based on solving the hardest part, edge deletions, first. We evaluated its performance on artificial graphs. Afterwards we exemplarily applied our implementation on real world biomedical data, GWAS data in this case. BiCluE generally works on any kind of data types that can be modeled as (weighted or unweighted) bipartite graphs. Conclusions To our knowledge, this is the first software package solving the weighted bi-cluster editing problem. BiCluE as well as the supplementary results are available online at http://biclue.mpi-inf.mpg.de.

2013-01-01

421

The CLASSY clustering algorithm: Description, evaluation, and comparison with the iterative self-organizing clustering system (ISOCLS). [used for LACIE data  

NASA Technical Reports Server (NTRS)

A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basis for CLASSY and the actual operation of the algorithm are described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.

Lennington, R. K.; Malek, H.

1978-01-01

422

Location Fingerprint Positioning Based on Interval-valued Data FCM Algorithm  

NASA Astrophysics Data System (ADS)

In order to reduce the positioning calculation power consumption of the ZigBee module, a fingerprint positioning method based on an interval-valued data fuzzy c-means algorithm was proposed in the paper. Fingerprints were regarded as interval-valued data, which could reflect their uncertainty caused by measurement error and interference. In the high-dimensional feature space spanned by interval midpoint and length, fingerprints were clustered by the FCM algorithm to lower computation complexity. Compared with traditional clustering technologies, such as c-means, the method achieved better clustering results for location fingerprints in the positioning experiment designed in the paper. Results from the clustering and positioning experiments show that the method provides a feasible solution that remarkably decreases the positioning calculation power consumption of the ZigBee module while ensuring positioning precision.

Li, Fang; Tong, Weiming; Wang, Tiecheng
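
The feature space "spanned by interval midpoint and length" mentioned above can be illustrated with a minimal sketch. The function name `interval_features` and the sample signal-strength intervals are hypothetical; the abstract does not give the paper's actual data layout.

```python
def interval_features(intervals):
    """Map interval-valued readings [(lo, hi), ...] to a flat feature vector.

    Each interval contributes two coordinates, its midpoint and its length,
    so a fingerprint with n readings becomes a point in 2n dimensions.
    """
    feats = []
    for lo, hi in intervals:
        feats.extend([(lo + hi) / 2.0, hi - lo])
    return feats

# Hypothetical fingerprint: signal-strength ranges (dBm) seen from two anchors.
fingerprint = interval_features([(-70, -60), (-85, -80)])
```

Vectors of this kind could then be passed to any point-based clustering routine such as FCM.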

423

Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)  

ScienceCinema

San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

424

Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)  

ScienceCinema

San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

Li, Weizhong [San Diego Supercomputer Center]

2013-01-22

425

Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)  

SciTech Connect

San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

Li, Weizhong [San Diego Supercomputer Center]

2011-10-12

426

Cluster randomized controlled trial of a simple warfarin maintenance dosing algorithm versus usual care among primary care practices.  

PubMed

Many patients using warfarin are being managed in primary care and typically achieve a lower time in therapeutic range (TTR) for the international normalized ratio (INR) than patients in specialized care. A simple warfarin maintenance dosing tool could assist primary care physicians with improving TTR. We tested whether a simple warfarin maintenance dosing algorithm can improve TTR compared with usual care among Canadian primary care physicians. Primary care practices managing warfarin therapy without an anticoagulation clinic, computer decision support system or patient self-management tools enrolled 10-30 patients with target INR range 2-3. Practices were randomized to manage warfarin maintenance with the algorithm, or as usual in 2009-2010. Primary outcome was the mean individual patient TTR, and was compared between groups with adjustment for clustering within practices. There were 13 practices randomized to the Algorithm and 15 practices to Control, enrolling 240 and 297 patients respectively, with a mean follow-up of 280 days. Mean (standard deviation; SD) TTR before the study was comparable between groups [68 % (SD 26) for usual care vs. 70 % (SD 27) for the algorithm; p = 0.49]. Dosing decisions during the study in the algorithm group were more often in agreement with the algorithm's recommendations than with usual care (81 vs. 91 %; p < 0.0001). Mean study TTR of the algorithm group was not superior to usual care: [72.1 (SE 1.7) vs. 71.4 % (SE 1.5) respectively; p = 0.73]. The simple warfarin maintenance dosing algorithm did not improve TTR compared with usual care among Canadian primary care practices. PMID:23877621

Nieuwlaat, Robby; Eikelboom, John W; Schulman, Sam; van Spall, Harriette G C; Schulze, Karleen M; Connolly, Benjamin J; Cuddy, Spencer M; Hubers, Lowiek M; Stehouwer, Alexander C; Connolly, Stuart J

2014-05-01

427

A possibilistic approach to clustering  

NASA Technical Reports Server (NTRS)

Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.

Krishnapuram, Raghu; Keller, James M.

1993-01-01
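
The probabilistic constraint mentioned in the abstract above (memberships of a point across classes sum to one) can be seen in a small sketch of the standard FCM membership update. The second function shows a possibilistic-style membership of the form commonly associated with this line of work; its scale parameter `etas` and both function names are assumptions introduced here for illustration, not taken from the abstract.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM memberships of one point, given its distances to each center.

    The values always sum to one across clusters (the probabilistic constraint).
    """
    zero = [d == 0.0 for d in distances]
    if any(zero):
        # A point sitting exactly on a center belongs fully to that center.
        share = 1.0 / sum(zero)
        return [share if z else 0.0 for z in zero]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in distances)
            for d_i in distances]

def possibilistic_memberships(distances, etas, m=2.0):
    """Possibilistic-style memberships: each cluster is scored independently.

    `etas` is an assumed per-cluster scale parameter; the values need not
    sum to one, so a far-away point can have low membership everywhere.
    """
    power = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / eta) ** power)
            for d, eta in zip(distances, etas)]

outlier = [10.0, 10.0]  # a noise point equally far from both centers
u = fcm_memberships(outlier)
t = possibilistic_memberships(outlier, etas=[1.0, 1.0])
```

Under the sum-to-one constraint the outlier still receives half membership in each cluster, whereas the possibilistic values are both close to zero, which is one way to read the robustness to noise that the abstract reports.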

428

SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters  

Microsoft Academic Search

Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, which include partitional, hierarchical, mixture model based, density-based, spectral, subspace, and so on. The focus of this paper is on full-dimensional, arbitrary shaped clusters. Existing methods for this problem suffer either in terms of the memory or time complexity (quadratic or

Vineet Chaoji; Mohammad Al Hasan; Saeed Salem; Mohammed J. Zaki

2009-01-01

429

Fuzzy clustering and enumeration of target type based on sonar returns  

Microsoft Academic Search

The fuzzy c-means (FCM) clustering algorithm is used in conjunction with a cluster validity criterion, to determine the number of different types of targets in a given environment, based on their sonar signatures. The class of each target and its location are also determined. The method is experimentally verified using real sonar returns from targets in indoor environments. A correct

Billur Barshan; Birsel Ayrulu

2004-01-01

430

A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain  

Microsoft Academic Search

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques

L. O. Hall; A. M. Bensaid; L. P. Clarke; R. P. Velthuizen; M. S. Silbiger; J. C. Bezdek

1992-01-01

431

CGMgraph/CGMlib: Implementing and Testing CGM Graph Algorithms on PC Clusters  

Microsoft Academic Search

In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on CGM algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC clusters. Our

Albert Chan; Frank K. H. A. Dehne

2003-01-01

432

GX-Means: A Model-Based Divide and Merge Algorithm for Geospatial Image Clustering.  

National Technical Information Service (NTIS)

One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this pap...

C. T. Symons; G. Jun; R. R. Vatsavai; V. Chandola

2011-01-01

433

CASCADE: a novel quasi all paths-based network analysis algorithm for clustering biological interactions  

PubMed Central

Background Quantitative characterization of the topological characteristics of protein-protein interaction (PPI) networks can enable the elucidation of biological functional modules. Here, we present a novel clustering methodology for PPI networks wherein the biological and topological influence of each protein on other proteins is modeled using the probability distribution that the series of interactions necessary to link a pair of distant proteins in the network occur within a time constant (the occurrence probability). Results CASCADE selects representative nodes for each cluster and iteratively refines clusters based on a combination of the occurrence probability and graph topology between every protein pair. The CASCADE approach is compared to nine competing approaches. The clusters obtained by each technique are compared for enrichment of biological function. CASCADE generates larger clusters and the clusters identified have p-values for biological function that are approximately 1000-fold better than the other methods on the yeast PPI network dataset. An important strength of CASCADE is that the percentage of proteins that are discarded to create clusters is much lower than the other approaches which have an average discard rate of 45% on the yeast protein-protein interaction network. Conclusion CASCADE is effective at detecting biologically relevant clusters of interactions.

Hwang, Woochang; Cho, Young-Rae; Zhang, Aidong; Ramanathan, Murali

2008-01-01

434

A Dynamic Cluster Formation Algorithm for Collaborative Information Processing in Wireless Sensor Networks  

Microsoft Academic Search

Clustering of sensor nodes has been shown to be an effective approach for distributed collaborative information processing in resource constrained wireless sensor networks to keep network traffic local in order to reduce energy dissipation of long-distance transmissions. Defining the range and topology of clusters to reduce energy consumption and retransmissions due to collisions on shared radio channels is an ongoing

Chia-Yen Shih; Stephen F. Jenks

2007-01-01

435

Reliability of Vegetation Community Information Derived using Decorana Ordination and Fuzzy c-means Clustering  

Microsoft Academic Search

Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes

Lucy Bastin; Peter Fisher; M. C. Bacon; Charles Arnot; M. J. Hughes

436

A Fast Fuzzy-C means based marker controlled watershed segmentation of clustered nuclei  

Microsoft Academic Search

Microscopy cell image analysis is a fundamental tool for biological research. This analysis is used in studies of different aspects of cell cultures. The main challenges in segmenting nuclei in histometry are due to the fact that the specimen is a 2-D section of a 3-D tissue sample. The 2-D sectioning can result in partially imaged nuclei, sectioning of nuclei

M. Mohideen Fatima alias Niraimathi; V. Seenivasagam

2011-01-01

437

Fuzzy clustering in Intelligent Scissors.  

PubMed

In this study a modified Live-Wire approach is presented. A Fuzzy C-Means (FCM) clustering procedure has been implemented before the wavelet transform cost map function is defined. This shrinks the area to be searched resulting in a significant reduction of the computational complexity. The method has been employed to computed tomography (CT) and magnetic resonance (MR) studies. The 2D segmentation of lungs, abdominal structures and knee joint has been performed in order to evaluate the method. Significant numerical complexity reduction of the Live-Wire algorithm as well as improvement of the object delineation with a decreased number of user interactions have been obtained. PMID:22483373

Wieclawek, W; Pietka, E

2012-07-01

438

Treatment of ill-balanced datasets of fMRI with Modified Fuzzy c-means Method.  

PubMed

In an fMRI dataset, the population of activated voxels is always much smaller than the total voxel population, which produces an ill-balanced dataset. Some methods, such as limiting the analysis to the gray matter voxels where the BOLD signal is expected, or removing voxels that are clearly non-activated according to statistical criteria, have been used to treat the ill-balanced dataset. In this article, a new method, Modified Fuzzy c-means (MFc), is proposed to treat the ill-balanced dataset of fMRI. The main difference from other statistical methods is that it is data-driven. The MFc method is used to classify the voxels into two clusters with nearly the same population, with all activated voxels contained in one cluster. Thus only about half of the voxels remain to be analyzed, and the ill-balanced dataset can be treated. The efficiency of clustering analysis is also boosted. PMID:17282463

Gu, Jiebin; Cao, Zhitong; Zheng, Xi; Aihua, Cai

2005-01-01

439

MetaCluster 4.0: a novel binning algorithm for NGS reads and huge number of species.  

PubMed

Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering together of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (can be more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. PMID:22300323

Wang, Yi; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

2012-02-01
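
The notion of a 4-mer distribution used in the abstract above can be illustrated with a small sketch that computes the relative frequency of each length-4 substring in a read. This is not MetaCluster's estimation procedure, which the abstract does not describe; the function name `kmer_distribution` and the choice to skip windows containing non-ACGT characters are assumptions made for the example.

```python
from collections import Counter

def kmer_distribution(read, k=4):
    """Relative frequency of each k-mer in a DNA read (k=4 gives 4-mers).

    Windows containing characters other than A, C, G, T (for example N)
    are skipped; this handling is an assumption of the sketch.
    """
    read = read.upper()
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    kmers = [s for s in kmers if set(s) <= set("ACGT")]
    total = len(kmers)
    if total == 0:
        return {}
    return {s: c / total for s, c in Counter(kmers).items()}

dist = kmer_distribution("ACGTACGT")
```

Reads from the same genome tend to have similar k-mer profiles, which is why such distributions are a common feature for grouping reads without reference genomes.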

440

Parallel SOR Iterative Algorithms and Performance Evaluation on a Linux Cluster.  

National Technical Information Service (NTIS)

The successive over-relaxation (SOR) iterative method is an important solver for linear systems. In this paper, a parallel algorithm for the red-black SOR method with domain decomposition is investigated. The parallel SOR algorithm is designed by combinin...

C. Zhang; H. Lan; Y. Ye; B. D. Estrade

2005-01-01
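
The red-black ordering named in the abstract above can be sketched in serial form on a one-dimensional model problem. This is an illustration only, not the report's parallel algorithm or its domain decomposition: the model equation -u'' = f with fixed end values, the function name `red_black_sor`, and the relaxation factor 1.5 are all assumptions chosen to keep the example small.

```python
def red_black_sor(f, left, right, h, omega=1.5, sweeps=500):
    """Red-black SOR for -u'' = f on a uniform 1-D grid with fixed end values.

    Points are coloured by index parity. All points of one colour depend only
    on points of the other colour, so each half-sweep could be updated in
    parallel; here the two half-sweeps are simply run one after the other.
    """
    n = len(f)
    u = [0.0] * (n + 2)
    u[0], u[-1] = left, right
    for _ in range(sweeps):
        for parity in (1, 0):  # "red" = odd indices, then "black" = even
            for i in range(1, n + 1):
                if i % 2 == parity:
                    gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
                    u[i] += omega * (gauss_seidel - u[i])  # over-relaxed update
    return u

# With f = 0 the exact solution is the straight line between the end values.
n = 9
solution = red_black_sor([0.0] * n, left=0.0, right=1.0, h=1.0 / (n + 1))
```

Because the updates within one colour are independent of each other, the inner loop is the part a parallel implementation would distribute across processes.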